In late 2019, security researchers Vinny Troia and Bob Diachenko stumbled upon an open server hosted on Google Cloud Services. The server required no password or authentication, meaning anyone with a web browser could download the entire dataset.
We requested only “current job title” and “company.” PDL returned past employers, personal email hashes, and even inferred seniority scores. As a result, our downstream CRM users were exposed to data we never asked for, which raises compliance questions under GDPR and CCPA, starting with whether we had a lawful basis for processing it at all.
Data enrichment is a double-edged sword. While it provides the context businesses crave, it increases the "blast radius" of any potential data breach. For PDL customers, the responsibility lies in recognizing that the more they enrich their data, the more valuable—and dangerous—that data becomes to attackers.
Because PDL enriches so aggressively, our own customer records became a liability. We accidentally exposed inferred data (e.g., “likely income range”) to sales reps who had no business seeing it. Worse, PDL doesn’t offer granular field-level suppression. You either accept their full enrichment payload or build a custom middleware filter yourself.
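The "custom middleware filter" mentioned above can be sketched as a simple allowlist applied to each enrichment response before it reaches the CRM. This is a minimal illustration, not PDL's API: the field names (`job_title`, `company`, `inferred_income_range`, and so on) and the `filter_enrichment` helper are hypothetical and would need to be mapped onto the actual payload schema.

```python
from typing import Any, Dict, FrozenSet, Mapping

# Hypothetical allowlist: the only fields the CRM actually asked for.
# Real field names depend on the provider's response schema.
ALLOWED_FIELDS: FrozenSet[str] = frozenset({"job_title", "company"})


def filter_enrichment(
    payload: Mapping[str, Any],
    allowed: FrozenSet[str] = ALLOWED_FIELDS,
) -> Dict[str, Any]:
    """Return a new dict holding only the allowlisted top-level fields."""
    return {key: value for key, value in payload.items() if key in allowed}


# Illustrative payload (made-up values) with more fields than were requested.
sample_payload = {
    "job_title": "Account Executive",
    "company": "Example Corp",
    "past_employers": ["Old Co"],
    "personal_email_hash": "0000",
    "inferred_income_range": "unknown",
}

filtered = filter_enrichment(sample_payload)
dropped = sorted(set(sample_payload) - set(filtered))
print(filtered)
print("dropped:", dropped)
```

An allowlist is used here rather than a blocklist so that any new field the provider starts returning is dropped by default instead of flowing silently into downstream systems.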
The phrase refers to a massive October 2019 security incident where 1.2 billion personal data records—totaling 4 terabytes—were discovered on an unprotected Elasticsearch server.
Data enrichment is the process of combining existing customer data with external data sources to enhance its value and usability. This can include appending demographic, firmographic, or behavioral data to customer records, providing a more complete and accurate picture of the customer. Data enrichment can be used to update existing customer data, validate data quality, and even identify new business opportunities.
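As a rough illustration of the definition above, enrichment can be pictured as a keyed merge between an existing record and one or more external lookups. The sketch below assumes records are plain dictionaries keyed by email; the `enrich` function, the source names, and all values are hypothetical.

```python
from typing import Any, Dict, Iterable, Mapping


def enrich(
    record: Mapping[str, Any],
    sources: Iterable[Mapping[str, Mapping[str, Any]]],
    key: str = "email",
) -> Dict[str, Any]:
    """Append external attributes to a record without overwriting existing values."""
    enriched = dict(record)
    for source in sources:
        # Look up this customer in the external source; no match yields nothing.
        extra = source.get(record.get(key), {})
        for field, value in extra.items():
            # setdefault keeps first-party data when a field already exists.
            enriched.setdefault(field, value)
    return enriched


customer = {"email": "ada@example.com", "name": "Ada"}

# Hypothetical external datasets, keyed by email.
firmographic = {"ada@example.com": {"company": "Example Corp", "industry": "Software"}}
demographic = {"ada@example.com": {"name": "A. Lovelace", "region": "EMEA"}}

enriched_customer = enrich(customer, [firmographic, demographic])
print(enriched_customer)
```

Every field appended this way also widens what a breach of the enriched store would expose, so it is worth merging only the attributes a given use case needs.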