How the data broker industry collects, packages, and sells everything about you — and why the system is designed to make sure you never get it back
Cybercrime & Digital Privacy Series
Say you wake up tomorrow morning and check the weather on your phone.
The app is free. The forecast is accurate. You never think about what you handed over in exchange for knowing whether to bring an umbrella. Something did change hands, though. Something more valuable than anything you'd find in your wallet, and you gave it away with the casual indifference most people reserve for signing a restaurant receipt.
This is where the story begins. Not with hackers in dark rooms. Not with foreign intelligence agencies running operations that required court orders and congressional hearings to expose. This story begins with a Tuesday morning weather check and an industry so ordinary, so embedded in daily commercial life, that most people pass through it the way they pass through automatic doors — never registering it at all.
The industry is called data brokerage. According to multiple market research estimates, including those from Grand View Research, Market Research Future, and Maximize Market Research, it generates between $270 and $325 billion globally each year. Those figures should be read as approximations: different firms define the industry's borders differently, with some counting broader advertising technology and marketing analytics revenue alongside the core business of collecting, packaging, and selling personal dossiers. Even at the low end, the number dwarfs the global music industry and most of professional sport.
You have never heard of most of the companies doing it. They have heard of you.
The Roots: Mailing Lists and Credit Files, 1960s–1980s
The data broker industry did not spring from the internet age. It has roots going back to at least the 1960s, when American companies began maintaining computerised records of consumers — first for credit purposes, then for direct mail marketing. Credit bureaus like Equifax, Experian, and TransUnion existed long before email, let alone smartphones. They compiled financial histories: who paid their bills, who didn't, who had defaulted on a loan.
That was the beginning. The information was supposed to stay in a narrow lane — credit decisions, nothing else. It didn't.
By the 1980s, companies like Acxiom had built what the industry called "marketing databases": vast collections of information about American households, drawn from magazine subscriptions, mail-order catalogues, warranty cards, sweepstakes entries, and whatever public records they could get their hands on. Acxiom, founded in Arkansas in 1969, would eventually claim the largest commercial database of consumer information in the world. The concept was simple and enormously profitable. Marketers want to reach specific kinds of people. Brokers can find them.
The public records pipeline opened early and stayed open. Property deeds, tax assessments, court filings, marriage and divorce records, bankruptcy petitions — all of it publicly filed in the United States, all of it available for bulk purchase. States began selling voter registration rolls to political campaigns and commercial buyers. The federal government, through the Driver's Privacy Protection Act of 1994, permitted fourteen categories of "permissible use" for state DMV data (names, addresses, dates of birth, vehicle information), and states took the revenue gladly. At least twenty-three American states collected a combined $282 million in a single fiscal year from selling driver records. Georgia earned $53 million. Florida collected $77 million. Michigan brought in $81 million. The buyers were data brokers: LexisNexis Risk Solutions, Experian, Acxiom, Thomson Reuters.
The government was selling its citizens, openly and legally, and nobody much noticed.
The Internet Changes the Scale, 1990s–2000s
The internet did not create the data broker industry. It made it incomprehensibly larger.
By the mid-1990s, websites were tracking visitors through small text files called cookies — identifiers stored in a browser that allowed a site to recognise a returning user. Advertisers saw the potential immediately. If you could follow a person from one website to another, noting what they looked at, what they lingered over, what they clicked on, you could build a behavioural profile without ever asking their name.
Third-party cookies, placed not by the website you were visiting but by the advertising technology embedded within it, spread across the web. By the time most people understood what a cookie was, their browsing history had already been assembled, categorised, and sold dozens of times over.
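The mechanics can be sketched in a few lines. This is an illustrative toy model, not real ad-server code: the `AdNetwork` class, the site names, and the page paths are all invented for the example. The point it demonstrates is that one identifier, set by a server the user never chose to visit, links browsing across unrelated sites.

```python
# Toy model of third-party cookie tracking (illustrative only).
import uuid

class AdNetwork:
    """Simulates an ad server whose tag is embedded on many sites."""
    def __init__(self):
        self.profiles = {}  # cookie id -> list of (site, page) visits

    def serve_tag(self, cookie_id, site, page):
        # If the browser presents no cookie, mint one. It is "third-party"
        # because this server is not the site the user chose to visit.
        if cookie_id is None:
            cookie_id = str(uuid.uuid4())
            self.profiles[cookie_id] = []
        self.profiles[cookie_id].append((site, page))
        return cookie_id  # echoed back as the cookie on every site

network = AdNetwork()
cookie = None
# The same embedded tag fires on three unrelated sites the user visits.
for site, page in [("news.example", "/mortgages"),
                   ("health.example", "/back-pain"),
                   ("shop.example", "/strollers")]:
    cookie = network.serve_tag(cookie, site, page)

print(len(network.profiles))        # 1: a single identifier for all three sites
print(network.profiles[cookie])     # the cross-site browsing trail
```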
Then came the targeting. Advertisers had always wanted specific audiences. Television, radio, and print were blunt instruments — you broadcast to everyone and hoped the right person was watching. The internet offered something surgical. You didn't need to reach everyone; you needed the forty-five-year-old homeowner in a mid-sized city who had recently searched for refinancing options and whose purchase history suggested above-average income. The data to make that possible came from brokers, and the market for it was enormous.
The Smartphone Arrives, 2007–Present
The shift from desktop to mobile computing changed the data collection business in a way that most people still haven't fully absorbed.
A smartphone is not a telephone that also surfs the web. It is a tracking device that also makes calls. It knows where you are, down to several metres, every moment you carry it. It knows when you're moving and when you're still, what speed you're travelling, which building you entered and how long you stayed. It connects to Wi-Fi networks and Bluetooth devices that help triangulate position indoors, in places where GPS alone struggles. It contains sensors for motion, ambient light, and magnetic fields. It carries a microphone and two cameras.
And you gave all of that to the apps on your phone. For free.
The mechanism by which app data became broker data is called a software development kit, or SDK. Think of an SDK as a piece of ready-made code that an app developer drops into their product to save time — code that handles analytics, advertising, or location services. Some of these SDKs also quietly send data to a third party. The developer gets a useful feature, or a small payment. The third party gets a continuous stream of location data from every user who downloads the app.
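A minimal sketch of that bargain, with invented class names and an invented record format, shows how small the integration surface is for the developer and how complete the feed is for the vendor. This is a toy model, not any real SDK's API.

```python
# Toy model of an analytics SDK that doubles as a location data pipe.
# All names and the record format are hypothetical.
import json
import time

class AnalyticsSDK:
    """Free analytics for the developer; a location feed for the vendor."""
    def __init__(self):
        self.outbox = []  # stands in for the vendor's ingestion servers

    def log_event(self, name, lat, lon):
        record = {"event": name, "lat": lat, "lon": lon,
                  "device_id": "ad-id-1234", "ts": time.time()}
        self.outbox.append(json.dumps(record))  # in reality: POST to the vendor

class WeatherApp:
    def __init__(self, sdk):
        self.sdk = sdk

    def show_forecast(self, lat, lon):
        # One line of integration for the developer:
        self.sdk.log_event("forecast_viewed", lat, lon)
        return f"Forecast for {lat:.2f},{lon:.2f}: rain likely"

sdk = AnalyticsSDK()
app = WeatherApp(sdk)
app.show_forecast(45.50, -73.57)
# The user saw a forecast; the vendor received a timestamped location point.
print(len(sdk.outbox))  # 1
```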
Per FTC enforcement filings, one prominent company identified as the second-largest location data collector in the United States paid app developers roughly three cents per user per month for the right to embed its tracking code. In return, it received over ten billion location data points every day: GPS co-ordinates, timestamps, device identifiers. These it sold on to clients including defence contractors. The apps carrying that code included weather applications, prayer-time tools, flashlight utilities, and Muslim religious apps with tens of millions of downloads combined. Users had no idea. Why would they? The apps worked exactly as advertised.
Another company embedded its tracking code in over 3,400 publisher integrations, offering free analytics tools to developers in exchange for the right to commercialise the collected data. A third offered app developers between $12,000 and $1 million annually for continuous server-to-server location transfers — an arrangement that circumvented the restrictions Apple and Google had built into their app stores to limit such collection.
The category of apps most likely to be carrying hidden tracking code is not what most people would guess. Not games. Not social media, where the data-for-service exchange feels somehow intuitive. The biggest offenders are utilities. Weather apps. Flashlight apps. Prayer tools. Period trackers. Things you use for sixty seconds and forget you've installed. Things you never audit.
That's not an accident. The most quietly used apps are the hardest to scrutinise.
What Your Grocery Purchases Reveal
Every time you tap your loyalty card at the supermarket, you are making a transaction that has nothing to do with the discount on your cereal.
Kroger, the largest supermarket chain in the United States, runs a loyalty programme with roughly 63 million members. Ninety-six percent of all purchases at its stores are linked to a loyalty card. A May 2025 Consumer Reports investigation found that Kroger collects not only your complete purchase history, retained indefinitely, but also your precise in-store location via GPS and Bluetooth beacons, your credit and debit card numbers, and your mobile advertising ID. That last item is the same identifier used to track your phone across apps. The company then purchases additional data from third-party brokers to enrich those profiles with race and ethnicity estimates, financial indicators, employment status, and online browsing activity.
Per the same Consumer Reports investigation, Kroger's internal marketing division, called 84.51°, generates approximately $527 million annually from this data. "Alternative profit" businesses of this kind account for more than 35 percent of Kroger's net income. The food is almost beside the point.
CVS, Walgreens, and Target run similar programmes. Target became well-known in the early 2010s for using purchase pattern analysis to infer likely pregnancies from specific product combinations (unscented lotions, certain vitamins, cotton balls) and began delivering pregnancy-related advertising before those customers had told anyone they were pregnant.
None of this data is covered by health privacy laws, because it was collected by a retailer, not a healthcare provider. The law has a gap wide enough to drive a truck through, and the industry drives through it every day.
The Real-Time Bidding Machine
Here is where the scale becomes genuinely hard to comprehend.
Every time you visit a website that carries advertising, something happens in the milliseconds between your browser requesting the page and the page appearing on your screen. Your device sends out a bid request containing information about you: your approximate location, your device type, your browsing history, interests inferred from prior behaviour, and a unique identifier tied to your device or browser. That signal goes to an auction platform. Advertisers bid for the right to show you their ad. One wins.
But all of the bidders received your data, whether or not they won.
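The mechanics above can be sketched as a toy auction. The structure loosely echoes the shape of an OpenRTB-style bid request, but the bidder names, bid logic, and fields are all invented for the example. What it demonstrates is the asymmetry: one bidder wins the impression, yet every bidder receives the request.

```python
# Toy real-time bidding auction (illustrative only).
import random

def run_auction(bid_request, bidders):
    bids = {}
    for name, bidder in bidders.items():
        bidder["seen"].append(bid_request)    # every bidder receives the data
        bids[name] = bidder["bid"](bid_request)
    return max(bids, key=bids.get)            # only one wins the impression

request = {
    "geo": {"lat": 43.65, "lon": -79.38},
    "device_id": "ad-id-1234",
    "interests": ["refinancing", "home_improvement"],
}
bidders = {name: {"seen": [], "bid": lambda r: random.random()}
           for name in ["dsp_a", "dsp_b", "dsp_c", "dsp_d"]}

winner = run_auction(request, bidders)
losers_with_data = [n for n in bidders if n != winner and bidders[n]["seen"]]
print(len(losers_with_data))  # 3: the losing bidders kept the data too
```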
This system is called real-time bidding, or RTB. According to the Irish Council for Civil Liberties' 2022 analysis, which drew on data provided by Google and Microsoft and which explicitly excludes Facebook and Amazon (making its figures conservative floor estimates), the average American consumer's information is exposed through these auctions 747 times per day. That's 294 billion individual data broadcasts, daily, in the United States alone. Google's system routes each one to 4,698 companies.
The Irish Council called RTB "the biggest data breach ever recorded." What makes that framing striking is the implication — a breach usually means something went wrong, that security failed, that someone broke in. RTB is a breach in which nothing went wrong. It functions exactly as designed. The data flows out because the system was built to make it flow. There is no security to fail, because there was never intended to be any.
How Scattered Pieces Become a Complete Portrait
What a data broker does with all of this raw material is not simply store it. The business is to make it useful. Scattered pieces about millions of people have to be assembled into coherent, searchable portraits of individuals. This process is called identity resolution.
There are two approaches. Deterministic matching connects pieces of data using exact identifiers: the same email address appearing in a loyalty programme and a shopping app, or the same phone number appearing in a court record and a credit application. It's precise but limited to cases where the same identifier genuinely appears in multiple sources.
Probabilistic matching fills the gaps. Statistical models analyse patterns (same device type, same IP address block, same time zone, similar browsing behaviour at consistent hours) and calculate the likelihood that two separate data points belong to the same person. Less certain, but far wider in reach. Commercial identity resolution platforms combine both approaches, building what the industry calls "identity graphs" — connection maps linking billions of identifiers into unified individual profiles: email addresses, phone numbers, postal addresses, browser cookies, device IDs, all threaded together under a single persistent record.
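The two matching approaches can be illustrated with a toy example. The weights, records, and threshold logic below are invented for the sketch; real identity graphs combine far more signals at vastly greater scale.

```python
# Toy identity resolution: deterministic vs probabilistic matching.
# Weights and records are invented for illustration.

def deterministic_match(a, b):
    """Exact-identifier join: a shared email or phone means same person."""
    return (bool(a.get("email")) and a["email"] == b.get("email")) or \
           (bool(a.get("phone")) and a["phone"] == b.get("phone"))

def probabilistic_score(a, b):
    """Weighted similarity over soft signals. A higher score means the two
    records are more likely the same person, even with no hard identifier."""
    weights = {"device_type": 0.2, "ip_block": 0.4, "timezone": 0.1, "city": 0.3}
    return sum(w for key, w in weights.items()
               if a.get(key) is not None and a.get(key) == b.get(key))

loyalty = {"email": "jane@example.com", "city": "Toronto"}
app     = {"email": "jane@example.com", "device_type": "iPhone"}
browser = {"device_type": "iPhone", "ip_block": "203.0.113",
           "timezone": "EST", "city": "Toronto"}
mystery = {"device_type": "iPhone", "ip_block": "203.0.113",
           "timezone": "EST", "city": "Toronto"}

print(deterministic_match(loyalty, app))      # True: shared email
print(probabilistic_score(browser, mystery))  # every soft signal matches
```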
One of the dominant identity resolution companies claims a 98.2 percent match rate globally, based on its own marketing materials.
Device fingerprinting works differently still. Instead of relying on any stored identifier, it constructs a unique signature from the characteristics of your device: the specific combination of graphics card model, installed fonts, screen resolution, browser plugins, audio processing behaviour, and time zone. Canvas fingerprinting, which exploits tiny differences in how different graphics hardware renders images, can uniquely identify roughly 60 percent of users on its own. Combined with audio fingerprinting and graphics API data, identification accuracy exceeds 99 percent. This fingerprint survives clearing your cookies. It survives incognito mode. It survives a VPN.
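The idea can be sketched in a few lines: no identifier is ever stored, because the identifier is recomputed from the device's own characteristics each time. The attribute values below are invented for the example.

```python
# Toy device fingerprint: a hash over attributes the browser readily exposes.
# Attribute values are invented for illustration.
import hashlib
import json

def fingerprint(attrs):
    # Stable serialisation so the same device always yields the same hash.
    canonical = json.dumps(attrs, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

device = {
    "gpu": "Apple M1",
    "fonts": ["Avenir", "Helvetica", "Menlo"],
    "screen": "2560x1600",
    "timezone": "America/Toronto",
    "canvas_hash": "a91f0c2e",  # stands in for this hardware's rendering quirks
}

fp_before = fingerprint(device)
# Clearing cookies or opening incognito changes nothing the hash depends on:
fp_after = fingerprint(device)
print(fp_before == fp_after)  # True: the "identifier" is the device itself
```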
The industry also relies heavily on hashed email addresses — a technique in which your email is converted to a scrambled string of characters before being passed between platforms, creating an appearance of anonymity. The Federal Trade Commission stated in 2024 that hashed email addresses are not anonymous. Researchers at Princeton's Centre for Information Technology Policy demonstrated that commercial services can reverse-identify hashed emails for four cents each. The hash looks private. It isn't.
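Why hashing fails as anonymisation can be shown in a few lines. The addresses below are invented; the point is that anyone holding a plain-text email list can hash every entry and join it against the "anonymous" identifier.

```python
# Toy demonstration that a hashed email is a stable identifier, not an
# anonymous one. All addresses are invented.
import hashlib

def h(email):
    # Normalise before hashing, as matching platforms commonly do.
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

# A platform shares this "anonymous" identifier with a partner:
shared_hash = h("Jane.Doe@example.com")

# The partner already holds a plain-text list (its own customers, a purchased
# file, a leaked database) and simply hashes every entry:
known_emails = ["bob@example.com", "jane.doe@example.com", "ann@example.com"]
lookup = {h(e): e for e in known_emails}

print(lookup.get(shared_hash))  # jane.doe@example.com: re-identified
```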
The Consumer-Facing Shopfront, Early 2000s–Present
The technical infrastructure described above is the wholesale side of the business. The retail end, where anyone with a browser and a credit card can shop, is the people-search industry.
Sites operating under names like Spokeo, BeenVerified, TruePeopleSearch, Whitepages, and Intelius serve as consumer-facing shopfronts for data collected and resold through the broker supply chain. Several of the most prominent return a person's name, approximate age, address history, known relatives, and phone numbers entirely for free. No account required. No stated purpose. No verification of who is asking or why.
Trial memberships at the major services run from under one dollar to a few dollars, and unlock court records, marriage and divorce filings, estimated income and net worth, property ownership, vehicle records, and social media accounts. A complete dossier on a private individual (home address, family members' names and ages, estimated household income, vehicle registrations, criminal history, prior addresses) can be assembled in under thirty minutes for less than thirty dollars.
Acxiom's core consumer database documents over 10,000 unique data attributes per person across 260 million Americans and 2.6 billion people globally. TransUnion's investigative platform holds over 100 billion data points covering more than 95 percent of the American population. These aren't outliers. They are the mainstream.
The Federal Trade Commission fined one prominent people-search company $800,000 in 2012 for marketing personal profiles to employers and recruiters in violation of the Fair Credit Reporting Act, which requires specific legally permissible purposes before consumer reports can be used in employment decisions. The company paid the fine. It continues to operate. It continues to sell profiles.
How This Data Gets Used Against You
Up to this point, this has been a story about commerce. Legal commerce, largely unregulated, conducted openly and at enormous scale.
Now it becomes something else.
The same infrastructure that lets a shampoo company reach consumers with fine hair also lets anyone who wants to find someone's home address find it. The same profiling that tells a pharmaceutical company which consumers are most likely to respond to a drug advertisement also tells a fraud operation which elderly consumers are most likely to respond to a fake prize notice.
The fraudulent use of data broker infrastructure is documented. The United States Department of Justice secured criminal convictions against executives at data brokerage firms who knowingly sold lists of vulnerable consumers, primarily elderly Americans, to mail fraud scammers. The scheme operated for nearly a decade. The brokers involved had developed algorithms specifically to identify which consumers had previously responded to fraudulent solicitations, then resold those lists as optimised targeting tools for new scammers. Thirty million people's information was sold to fraud operators over that period. Victims were defrauded repeatedly. Some were defrauded more than twenty times by the same scheme.
The data that made those schemes work: age estimates, inferred health conditions, purchasing behaviour suggesting financial vulnerability, prior responsiveness to unsolicited offers. All of it is still collected and sold today. Without restriction.
The doxxing risk operates through the same pipeline, just at a smaller individual scale. Personal information pulled from people-search sites can identify where someone lives, who their family members are, what vehicles they drive, and what their property looks like. The barrier to obtaining this information is low enough to clear in an afternoon. It flows from government records and commercial databases to broker profiles to consumer-facing search sites, with no meaningful controls on who accesses it or why.
Domestic violence organisations have flagged this problem for years. Address confidentiality programmes, which allow survivors to register a substitute address on government documents, are systematically undermined by data brokers that continuously harvest property records, utility company filings, and postal change-of-address notifications. The re-aggregated information reappears within weeks of any removal request.
The Opt-Out That Doesn't Work
Most data broker companies offer an opt-out mechanism. This is the industry's standard response to privacy concerns, and it deserves a straightforward assessment.
It doesn't work.
Not because the forms are broken, though compliance varies. Not because the companies are outright lying about processing removals. It doesn't work because of a structural feature of how these databases are maintained.
Data brokers refresh their records every two to four weeks. The sources (public records, government databases, partner data feeds from other brokers) continuously generate new information. When you opt out, the most a broker typically does is suppress your current record. The underlying source data remains. The next refresh cycle pulls a new address from a property filing, a new phone number from a voter registration update, a new workplace from a professional directory, and rebuilds your profile from scratch. This cycle, called re-aggregation, repeats within three to twelve months of most removal requests. Sometimes within weeks.
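The suppress-versus-refresh dynamic can be sketched as a toy model. The broker class, the source feeds, and the person are all hypothetical; the structural point is that opting out clears a record while the upstream pipelines keep flowing.

```python
# Toy model of re-aggregation: opt-out suppresses the record,
# the next refresh rebuilds it from upstream sources. All names invented.

class Broker:
    def __init__(self, sources):
        self.sources = sources   # upstream feeds keep flowing regardless
        self.profiles = {}

    def refresh(self):
        # Runs every few weeks: pull everything upstream currently holds.
        for source in self.sources:
            for person, record in source().items():
                self.profiles.setdefault(person, {}).update(record)

    def opt_out(self, person):
        # Suppresses the current record. The source feeds are untouched.
        self.profiles.pop(person, None)

property_feed = lambda: {"jane": {"address": "12 Elm St"}}
voter_feed    = lambda: {"jane": {"phone": "555-0137"}}

broker = Broker([property_feed, voter_feed])
broker.refresh()
broker.opt_out("jane")
print("jane" in broker.profiles)   # False: the removal "worked"
broker.refresh()                   # the next scheduled refresh cycle
print("jane" in broker.profiles)   # True: the profile is back
```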
Commercial removal services (DeleteMe, Optery, Kanary, Incogni, Privacy Bee) automate the submission process across hundreds of brokers and resubmit requests on regular schedules. They're not useless. They reduce the density of information available about a person. They don't make it disappear. The industry is too large, the data pipelines too numerous, and the re-aggregation cycle too fast for any service to achieve permanent removal.
There are more than 750 data broker companies operating in the United States. Contacting each one manually and completing each company's specific opt-out process requires an estimated forty to sixty hours of initial effort, then five to ten hours per month of ongoing monitoring. That burden falls entirely on the individual. The companies that created it face no obligation to simplify it.
What the Law Does and Doesn't Cover
The United States has no comprehensive federal privacy law governing data brokers.
The Fair Credit Reporting Act of 1970 regulates credit reporting agencies (Equifax, Experian, TransUnion) and requires that consumer reports be used only for specifically permitted purposes. Data brokers avoid FCRA coverage by including contractual clauses prohibiting buyers from using data for employment, credit, or insurance decisions, then marketing their products as being for "marketing purposes only." The underlying data is often identical. The legal classification is not.
Only four states require data brokers to register. Vermont passed its law in 2018. California enacted the Delete Act in 2023, which also created the Delete Request and Opt-Out Platform — a centralised system allowing Californians to submit a single deletion request reaching all registered brokers simultaneously. That platform opened to consumers in January 2026. Texas's SB 2105 took effect in September 2023. Oregon's HB 2052 became operative in January 2024.
The Consumer Financial Protection Bureau proposed a rule in December 2024 that would have reclassified many data broker activities under the FCRA, subjecting them to the same consumer protections as credit reports. The rule was withdrawn in May 2025.
The Federal Trade Commission has brought enforcement actions against individual operators. In 2024 and 2025, it issued first-of-kind orders banning the sale of sensitive location data, and prohibited the collection of personal data from real-time bidding auctions for non-advertising purposes. In a settlement finalised in January 2026, following an action first announced in January 2025, it banned General Motors and its OnStar division from selling driver behaviour data. That data had been collected from up to eight million vehicles: precise GPS co-ordinates every three seconds, hard braking events, speeding instances, seatbelt usage. It was sold to insurance analytics companies without drivers' knowledge or meaningful consent. One driver discovered her insurance premiums had increased by 80 percent after her vehicle's data was shared 603 times.
Every one of those enforcement actions was the first of its kind. That matters. It means none of it was happening inside an existing regulatory framework. The FTC was writing the rules as it went, one company at a time, in an industry with hundreds of companies.
Can It Still Be Done Today?
Yes.
The techniques described throughout this article (SDK-based location harvesting, loyalty card profiling, real-time bidding data collection, identity resolution, government records aggregation, people-search site compilation) are operational now. Some have been constrained at the margins. Apple's App Tracking Transparency framework, introduced in 2021, required explicit opt-in consent before apps could access the iPhone's advertising identifier. Initial opt-in rates fell to roughly 11 percent before recovering to approximately 50 percent by 2025. A handful of the most egregious location data operators have faced FTC enforcement orders.
The core infrastructure has not changed. The industry's total scale has not decreased.
Nearly 300 companies that should have registered under California's data broker law had not done so, according to a 2025 analysis by the Electronic Frontier Foundation and the Privacy Rights Clearinghouse. The penalties for non-registration in California are $200 per day — an amount treated as a cost of business. A single data licensing contract can easily outweigh years of accumulated fines.
The connected vehicle data pipeline, which extends across dozens of manufacturers beyond the one singled out by regulators, continues generating location and behaviour data at the vehicle level. Smart television manufacturers face active litigation for automated content recognition systems: software that captures screenshots of everything on screen as frequently as every 500 milliseconds, regardless of source (streaming, cable, game console), and uses that data for advertising targeting without meaningful disclosure. The real-time bidding system continues broadcasting personal information to thousands of companies, per user, per day.
The data broker industry was built on a single idea: that information collected for one purpose can be repurposed and sold for any other. That idea has not been legally repudiated in the United States. It remains the operating principle of a $300-billion-a-year industry.
Your Tuesday morning weather check is still on file. So is every store you've visited this year, every route you drove, every search you made, every medication-adjacent purchase you thought was private, and the names and approximate locations of everyone in your household.
That file exists. It's being updated right now. And someone, somewhere, is selling it.
Note for Canadian readers: Data brokers operating in the United States collect information on Canadians whenever they use services delivered through American platforms. Canada's federal privacy law, the Personal Information Protection and Electronic Documents Act, provides a framework for commercial data handling within Canada, but enforcement against foreign data brokers — companies holding Canadian data in American databases — remains limited in practice. The proposed successor legislation, the Consumer Privacy Protection Act, has not yet been enacted.
Behind the Story
Data brokerage is one of those subjects that resists a simple entry point. There is no single dramatic event to build from, no arrest that crystallises the problem. The industry is enormous, diffuse, mostly legal, and completely invisible to the people it documents most thoroughly. That invisibility is, in a way, the whole story.
Research for this piece drew on a body of source material that itself spans decades. The Federal Trade Commission's 2014 report on the data broker industry, Data Brokers: A Call for Transparency and Accountability, remains one of the most detailed official portraits of how these companies operate and what they hold. The FTC's subsequent enforcement actions — against Spokeo, X-Mode/Outlogic, Kochava, Mobilewalla, Gravy Analytics, InMarket, and General Motors — were each reviewed against the original commission filings and press releases rather than secondary accounts.
The industry scale figures ($270–325 billion globally) are drawn from multiple independent market research reports published between 2023 and 2025, including analyses from Grand View Research, Market Research Future, and Maximize Market Research. Because different firms define the boundary of "data broker" differently — some include the full adtech and martech ecosystems, others apply narrower definitions — the range is presented as an approximation rather than a single authoritative figure.
The real-time bidding exposure figures (747 broadcasts per day per American user; 294 billion daily broadcasts in the United States) originate from the Irish Council for Civil Liberties' May 2022 report, The Scale of RTB Data Broadcasts in the US and Europe. That report drew on operational data provided directly by Google and Microsoft, with the explicit caveat that Facebook and Amazon broadcasts were excluded — meaning the figures represent a conservative lower bound. The ICCL is an advocacy organisation, and that context is noted. The figures themselves, however, are based on disclosed industry data rather than estimates.
The Kroger loyalty programme figures, including the 63 million members, 96 percent purchase linkage rate, and the $527 million in annual revenue attributed to the 84.51° marketing division, are drawn from a May 2025 Consumer Reports investigation. That investigation represents some of the most detailed recent reporting on how a major retailer monetises loyalty data, and its findings were verified against Kroger's own public financial disclosures where possible.
The FTC characterisation of X-Mode Social as the second-largest location data company in the United States, and the figure of ten billion-plus daily location data points, comes from the commission's own enforcement filings rather than from the company's marketing materials.
DMV revenue figures for individual states are drawn from an InvestigateTV analysis of public fiscal records published in October 2025, which was itself grounded in state-level freedom of information responses.
The forensic detail on device fingerprinting — canvas fingerprinting identification rates, audio fingerprinting accuracy, the persistence of fingerprints through VPN and incognito mode — is consistent across multiple peer-reviewed and industry sources reviewed during research, including technical documentation from browser fingerprinting research projects and published academic papers on cross-browser tracking.
One editorial decision worth noting: this article does not name the companies that faced criminal prosecution for selling consumer data to fraud operators, or the specific individuals convicted. Those cases are documented in the public record and in the longer research dossier maintained for this series. The decision to omit names here reflects the article's focus on mechanism and structure rather than individual culpability — the system that made those prosecutions necessary is more instructive than the actors within it.
The opt-out figures (forty to sixty hours initial effort, five to ten hours monthly, 750-plus broker companies) are consistent across multiple consumer privacy research sources, including reports from the Privacy Rights Clearinghouse, DeleteMe's published research, and the Electronic Frontier Foundation's 2025 state registry compliance analysis.
Canadian readers will find the legal landscape materially different from what is described for the American market, but the data flows themselves do not respect borders. Any Canadian using an American platform, application, or loyalty programme is generating data that enters the American broker ecosystem under American rules. PIPEDA provides some framework for commercial data handling within Canada; it provides limited practical recourse for data held offshore.
This article is part of The Media Glen's ongoing Cybercrime & Digital Privacy series.
THE MEDIA GLEN | themediaglen.com