Google and NHS test AI for breast cancer screening — here’s what they found

Google Research has been working on AI for breast cancer screening for a while now, and their latest results — published this month in Nature Cancer — come from a serious collaboration with the UK’s National Health Service. The two companion studies look at how an AI system performs both as a standalone reader and as part of the double-read workflow that NHS screening programs rely on.

Let me be upfront: the UK is in a bind with breast cancer screening. The NHS Breast Screening Programme uses a double-read system — two human readers look at each mammogram, and if they disagree, an arbitration panel steps in. It’s thorough, but there’s a 30% shortfall of clinical radiologists, projected to hit 40% by 2028. That’s not sustainable. So the question isn’t whether AI can help — it’s whether it can help safely and effectively within existing workflows.

The first study is a two-phase affair. Phase 1 is a retrospective evaluation of the AI system’s standalone performance using mammograms from 125,000 women across five NHS screening services. That’s a big dataset, and importantly, the ground truth included a 39-month follow-up window to catch interval cancers and next-round cancers that wouldn’t have been visible at screening time. That’s a rigorous bar, and I appreciate that they didn’t just compare the AI to the initial human read — they looked at lesion-level localization too, making sure the AI was actually identifying the right spot in the breast, not just picking up on spurious correlations.

The results? The AI matched or exceeded the sensitivity and specificity of the first human reader across all five services. That’s impressive, but it’s retrospective data. Phase 2 was a prospective deployment study — actually plugging the live AI system into real clinical workflows at three NHS sites to see if the integration worked without breaking anything. No surprise: it was technically feasible, but the paper notes challenges around data governance, IT infrastructure, and workflow integration. Anyone who’s worked in a hospital knows these are the real bottlenecks, not the model performance.

The second study is where things get more interesting for actual practice. It’s an end-to-end reader study comparing the current double-read-plus-arbitration process to one where the AI acts as the second reader. They used 4,832 screening cases from one NHS service and had 12 radiologists read them both ways. The AI-as-second-reader approach was non-inferior on cancer detection rate and actually reduced the arbitration rate — meaning fewer cases needed that third review. That’s a workload win. The false positive rate was also lower, which is good because false positives cause unnecessary anxiety and follow-up procedures.
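To make the arbitration point concrete, here is a minimal sketch of the double-read logic, assuming the simple rule described earlier: two readers give a recall opinion, and a third review happens only when they disagree. The names (`Outcome`, `double_read`, `arbitration_rate`) are mine, not the study's, and real NHS services may apply more nuanced arbitration rules. The second reader can be a human or an AI; the logic doesn't change.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    recalled: bool    # final decision: call the woman back for assessment
    arbitrated: bool  # whether a third review was needed


def double_read(first: bool, second: bool, arbitrator: bool) -> Outcome:
    """Combine two recall opinions; the arbitrator decides only on disagreement."""
    if first == second:
        return Outcome(recalled=first, arbitrated=False)
    return Outcome(recalled=arbitrator, arbitrated=True)


def arbitration_rate(cases) -> float:
    """Fraction of (first, second, arbitrator) cases that went to arbitration."""
    outcomes = [double_read(*case) for case in cases]
    return sum(o.arbitrated for o in outcomes) / len(outcomes)
```

Under this rule, the arbitration rate is just the disagreement rate between the two readers, so a second reader that agrees with the first more often (without missing cancers) directly cuts the number of third reviews.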

Now, there are caveats. The reader study was a simulation: the AI wasn’t making real-time decisions during actual screening. And the sample size, while decent, isn’t large enough to prove the system works in prospective clinical practice. The authors are honest about this: “additional work is needed.”

But here’s my take: this is one of the more rigorous evaluations I’ve seen for AI in screening. The multi-site design, the long follow-up for ground truth, the lesion-level analysis — they checked the boxes that a lot of earlier studies skipped. The fact that the AI could reduce arbitration workload is a big deal, because arbitration is slow and expensive. If you can keep detection rates the same while reducing the number of cases that need a third human reader, that’s a direct path to easing the radiologist shortage.

I also like that they didn’t try to sell the AI as a replacement. It’s a second reader. That’s a smart framing. Radiologists are skeptical of black-box systems that claim to replace them, but a tool that helps with the boring, repetitive work? That’s something they’ll actually use.

Will this get deployed across the NHS? The tech side seems ready, but the organizational side is a mess. Different screening services have different workflows, different IT systems, different local protocols. The study had to set separate AI operating points for each service just to account for population differences. That’s not a dealbreaker, but it means scaling up isn’t a one-size-fits-all operation.
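For readers wondering what a per-service operating point looks like in practice, here is a rough sketch. I'm assuming the AI outputs a suspicion score per case and that a case is flagged when its score exceeds a threshold; the article doesn't say how the study actually chose its operating points, so the rule below (match a target specificity on each service's known-negative cases) is just one common approach, and `pick_threshold` is a hypothetical helper.

```python
def pick_threshold(negative_scores, target_specificity):
    """Return the lowest threshold whose specificity meets the target.

    A case is flagged when score > threshold, so a negative case is
    correctly cleared when score <= threshold.
    """
    n = len(negative_scores)
    candidates = sorted(set(negative_scores))
    for threshold in candidates:
        cleared = sum(score <= threshold for score in negative_scores)
        if cleared / n >= target_specificity:
            return threshold
    # Unreachable for targets <= 1.0, since the highest score clears every case.
    return candidates[-1]


# Illustrative, made-up scores: service B's negatives score higher overall,
# so it needs a higher threshold to reach the same specificity.
service_a_negatives = [0.1, 0.2, 0.3, 0.4, 0.5]
service_b_negatives = [0.3, 0.4, 0.5, 0.6, 0.7]

threshold_a = pick_threshold(service_a_negatives, target_specificity=0.75)
threshold_b = pick_threshold(service_b_negatives, target_specificity=0.75)
```

The point of the toy data is that the same model, applied to populations with different score distributions, needs different thresholds to behave the same way. That is the scaling headache in a nutshell.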

Still, this is progress. The AIMS study (Artificial Intelligence in Mammography Screening) is ongoing, and I expect we’ll see more real-world data in the next year or two. If the results hold, AI-assisted screening could become standard in the UK within a decade. That’s not bad for a field that was mostly academic speculation five years ago.
