We Actually Tried a Diagnostic AI on Real Patients. Here's What Happened.

For years, we’ve been hearing about AI that can diagnose you better than a doctor—in theory. Simulated patients, scripted conversations, controlled environments. But the gap between a lab demo and a real clinic is massive. That’s why I was genuinely interested when Google Research and Beth Israel Deaconess Medical Center (BIDMC) published results from a prospective, real-world feasibility study of AMIE, their conversational medical AI.

This isn’t another press release about how AI passed some multiple-choice exam. This is a first-of-its-kind study where AMIE actually talked to real patients, in a real clinic, before they saw their doctor. And the results are worth digging into.

How the Study Worked

AMIE was deployed to take pre-visit clinical histories from patients with new, non-emergency complaints. Patients booked for ambulatory primary care—either in-person or via telehealth—were invited to participate during the booking process. They got plenty of time to review the IRB-approved protocols and were explicitly told that opting out wouldn’t affect their care. That’s the kind of transparency you want to see.

The interaction happened via a secure web-link. Patients chatted with AMIE in a text-based conversation. But here’s the key: a physician was watching the whole thing live via video call with screen-sharing, ready to intervene based on predefined safety criteria. Think of it like a resident doctor taking a history under supervision—except the resident is an AI.

After the chat, AMIE generated a transcript and a summary, which were shared with the patient’s primary care provider (with consent). The idea was to give the doctor a head start on the visit, not to replace them.

What They Found: The Good, The Meh, and The Honest

The study was small—that’s the first thing to note. It’s a feasibility study, not a definitive trial. But the data is still informative.

Patients generally found the AI interaction acceptable. Most reported that the system was easy to use and that they felt comfortable sharing information. That’s a better reception than I expected, honestly. I’ve seen enough clunky chatbots to be skeptical about patient acceptance, but the feedback here suggests that when it’s done right, people are open to it.

Clinicians who reviewed the summaries found them reasonably accurate. Not perfect, but useful as a starting point. The summaries didn’t miss critical information in most cases, and when they did, the supervising physician caught it. That’s the whole point of the safety net.

There were also some interesting nuances. The AI was better at structured history-taking—symptoms, duration, medications—than at picking up on the more subtle social or emotional cues that a human doctor might notice. That’s not surprising, but it’s a reminder that AI is a tool, not a replacement.

The Safety Net Was Essential

One thing that stood out to me is how seriously they took safety. The overseeing physician wasn’t just a rubber stamp. They had a structured set of criteria for when to intervene: if the AI asked something inappropriate, if the patient seemed distressed, if the conversation went off track. And they did intervene in some cases. The paper doesn’t sugarcoat this—supervision wasn’t just for show.

This approach has been tried before in other clinical AI deployments, but it’s good to see it baked into the study design from the start. If you’re going to put an AI in front of patients, you need a human in the loop who can actually stop the car if it’s about to drive off a cliff.

What This Means for the Future

This study is a step, not a leap. It shows that conversational diagnostic AI can work in a real clinical workflow, at least in a limited, supervised setting. But there are still huge questions: How does it scale? What happens when the supervising physician is managing multiple AI interactions at once? How does it handle patients with complex comorbidities or language barriers?

The paper itself is refreshingly honest about the limitations. It’s a single-center study of a specific patient population in a very controlled environment. Real-world deployment will be messier.

Still, I’d rather see teams do this kind of rigorous, incremental work than rush to market with half-baked products. The AI hype train has derailed enough times already. This is how you build something that might actually stick.

For now, AMIE stays in the research lane. But it’s a promising sign that we’re moving beyond the simulation stage. The next step? Bigger studies, more diverse populations, and maybe—just maybe—a world where your doctor actually has time to listen because the AI already handled the paperwork.
