Five hundred thousand volunteers handed their health data to British science. Last week, someone listed it on Alibaba.
The medical records of every participant in UK Biobank — the world’s most comprehensive biomedical dataset — were advertised for sale on the Chinese e-commerce platform, technology minister Ian Murray confirmed to MPs on Wednesday. Three separate listings. Zero purchases. All removed, with cooperation from Alibaba and the Chinese government.
This was not a hack. Nobody breached a firewall or exploited a zero-day vulnerability. Legitimate, accredited research institutions downloaded the data through UK Biobank’s approved access process, and then someone tried to sell it.
What was listed
The data did not include names, addresses, phone numbers, or dates of birth. UK Biobank has emphasized this repeatedly, and it matters. What it did include: gender, age, month and year of birth, socioeconomic status, lifestyle habits, and biological sample measurements. De-identified, certainly. Useless, absolutely not.
UK Biobank chief executive Professor Sir Rory Collins told participants the listings represented “a clear breach of the contract signed by these academic institutions.” The three institutions, along with the individuals involved, have had their access suspended.
The open-access contradiction
UK Biobank is one of the great success stories of British science. Founded in 2003 with roughly £200 million in taxpayer funding, it recruited 500,000 volunteers aged 40 to 69 between 2006 and 2010. Their genome sequences, brain and body scans, blood samples, and lifestyle information have fueled breakthroughs in dementia, cancer, and Parkinson’s research. Thousands of scientists worldwide have accessed the dataset since it opened in 2012.
The model is deliberately open. Researchers apply, pass a review process, sign contracts, and receive data. Until late 2024, they could download it directly to their own systems. That was the design. It was also the vulnerability.
A Guardian investigation published in March revealed that Biobank data had been inadvertently posted online on dozens of occasions — typically because researchers accidentally uploaded datasets to GitHub alongside their analysis code. Between July and December 2025 alone, UK Biobank issued 80 legal takedown notices to GitHub. Some files contained hospital diagnoses and dates for more than 400,000 participants.
The Alibaba incident is different in degree, not in kind. The same open-access architecture that makes UK Biobank valuable to global research also makes its data portable — and once portable data leaves a controlled environment, no contract or takedown notice can guarantee where it ends up.
Re-identification is not theoretical
UK Biobank maintains that its data cannot be linked back to individuals. The evidence is less clear-cut. The Guardian’s March investigation demonstrated that with only a volunteer’s month and year of birth and knowledge of a major surgery, an external data scientist could pinpoint a specific person’s record, and the match was corroborated by five additional diagnoses the volunteer had not disclosed.
Dr Luc Rocher of the Oxford Internet Institute noted that simply knowing a birthday and the date someone broke a leg might be enough to identify a record. “Once identified, that record could reveal sensitive information such as a psychiatric diagnosis, an HIV test result, or a history of drug abuse,” they said.
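To see why a couple of coarse facts can be enough, consider a minimal sketch of the matching logic. Everything in it is an assumption made for illustration: the six records are invented, the field layout is hypothetical, and the function names `matching_records` and `unique_fraction` are not drawn from UK Biobank or from the Guardian investigation. It only shows how the pool of candidate records shrinks as an outsider adds one more known fact.

```python
from collections import Counter

# Entirely synthetic, hypothetical records (not real data):
# (record_id, birth_year, birth_month, surgery_year_month or None)
records = [
    ("r1", 1955, 3, "2014-06"),
    ("r2", 1955, 3, "2011-09"),
    ("r3", 1955, 3, None),
    ("r4", 1962, 11, "2014-06"),
    ("r5", 1962, 11, "2014-06"),
    ("r6", 1948, 7, None),
]


def matching_records(records, birth_year, birth_month, surgery=None):
    """Return the ids of records consistent with what an outsider knows.

    If `surgery` is None, the outsider knows only the birth month and year.
    """
    return [
        rid
        for rid, year, month, surg in records
        if year == birth_year
        and month == birth_month
        and (surgery is None or surg == surgery)
    ]


def unique_fraction(records):
    """Share of records whose (birth year, birth month, surgery) combination
    appears exactly once, i.e. records that those three facts single out."""
    counts = Counter((year, month, surg) for _, year, month, surg in records)
    unique = sum(
        1 for _, year, month, surg in records if counts[(year, month, surg)] == 1
    )
    return unique / len(records)


# Birth month and year alone leave three candidates in this toy dataset.
print(matching_records(records, 1955, 3))
# Adding the month of a known surgery narrows the pool to a single record.
print(matching_records(records, 1955, 3, "2014-06"))
# Fraction of toy records that the three facts together make unique.
print(unique_fraction(records))
```

In this toy dataset the first query leaves three candidates and the second leaves one. A real dataset is vastly larger, but each record also carries far more attributes, which is why researchers warn that combinations of unremarkable facts can single a person out.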
UK Biobank’s response has been to tell participants not to share health information on public websites — a stance that Prof Felix Ritchie of the University of the West of England called “entirely unreasonable.” As AI tools make cross-referencing fragmented data increasingly trivial, the gap between institutional reassurance and technical reality widens.
The geopolitical undertone
Reform UK deputy leader Richard Tice branded the incident a “China data theft scandal” and demanded that Chinese researchers be excluded from UK Biobank. Murray pushed back, noting that thousands of Chinese researchers have worked with the dataset safely since 2012 and calling Tice’s framing inconsistent with “the seriousness of this particular issue.”
But the underlying tension is real. UK Biobank data was listed on a Chinese platform, the Chinese government helped remove it, and the UK has no enforcement mechanism beyond institutional contracts and diplomatic goodwill. If a researcher anywhere in the world downloads de-identified health data and decides to resell it, the remedies are takedown requests and suspended access — reactive measures after the data has already moved.
What comes next
UK Biobank has temporarily suspended all access to its research platform. It is implementing strict file-size limits on data exports and will monitor file transfers daily. An automated system designed to prevent de-identified data from leaving the platform entirely is expected by the end of 2026.
These are serious steps. They also concede that the previous safeguards were insufficient. The Biobank model of trusting researchers, giving them data, and relying on contracts assumed good faith and competent data hygiene. The Alibaba listings are what happens when that assumption fails.
The volunteers, it should be noted, are taking this in stride. Guardian columnist and Biobank participant Polly Toynbee told the BBC she was not worried: the data is anonymized, the cause is important, and most volunteers understand the trade-off.
She is probably right that individual harm from this specific incident is minimal. The larger question is structural. Half a million people gave their health data to a system designed to share it widely, under protections that turned out to be reversible. The data is de-identified but detailed, portable but precious, global in reach but governed by local contracts in a borderless digital economy.
UK Biobank is building better locks. The horse is already out.