We've Been Calling It Clinical Judgment. It's Actually a Workaround
The evidence cycle was optimized for liability, not the patient in the room
The Evidence Cycle Isn’t Broken. It Was Built for Someone Else.
She was 67, Somali-American, managing T2D and CKD on a fixed income with food insecurity that made half the standard recommendations irrelevant the moment I opened my mouth. The guideline sat in my head like a script written for a different patient.
I followed it anyway. Mostly.
That tension between what the evidence says and what the person in front of you actually needs is something every clinician carries. We’ve learned to call it “clinical judgment.” What we don’t say out loud is that we’re compensating for a system that was never designed to answer our actual question.
The RCT Was Always Answering Something Else
Randomized controlled trials are extraordinary instruments. They’re also answering a question that has almost nothing to do with your 4:30 pm appointment.
The RCT asks: does this intervention produce a statistically meaningful effect in a defined population, on average, over a defined period? That’s genuinely useful for regulators, payers, and guideline committees. It is not the question a clinician is asking when they’re trying to figure out whether metformin makes sense for this woman, in this context, with these constraints.
The average treatment effect in a trial population is a fiction made useful. It smooths across outliers, enrollment criteria, dropout rates, and populations that look nothing like the patients most of us actually see. The gap between trial eligibility and real-world patients, particularly those with multiple chronic conditions, low income, or non-dominant cultural backgrounds, is not a footnote. It’s the job.
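To make the smoothing concrete, here is a toy sketch of how a positive average treatment effect can coexist with subgroups who get no benefit or are harmed. Every number and subgroup label is invented for illustration; nothing here comes from a real trial.

```python
# Hypothetical illustration: a trial's average treatment effect (ATE) can
# mask subgroups for whom the intervention does nothing, or harms.
# All fractions and effect sizes below are invented.

subgroups = {
    # name: (fraction of trial population, mean effect on the outcome)
    "trial-typical, single condition": (0.70, +1.2),
    "multi-morbid":                    (0.20, -0.3),
    "food-insecure":                   (0.10, -0.5),
}

# The ATE is the enrollment-weighted average across subgroups.
ate = sum(frac * effect for frac, effect in subgroups.values())

print(f"Average treatment effect: {ate:+.2f}")  # positive on average
for name, (frac, effect) in subgroups.items():
    print(f"  {name} ({frac:.0%} of trial): {effect:+.1f}")
```

The headline number is positive, so the guideline says "treat" — yet 30% of the invented population sees a negative effect. The patient in the room may well be in that 30%.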
We’ve known this for decades. The machine kept running anyway, because it was never really optimized for the clinician’s question. It was optimized for institutional defensibility.
Why the Cycle Is Slow (And It’s Not About Science)
The standard critique of the evidence cycle is that it’s too slow. AI will fix this, the argument goes, by compressing timelines and accelerating synthesis.
Maybe. But speed doesn’t fix a direction problem.
The evidence cycle moves slowly not because the science is hard, though it is. It moves slowly because the institutions producing it (journals, funding bodies, guideline panels) are optimized for consensus and liability management, not clinical responsiveness. Giving AI to that machine is a little like handing a faster horse to someone headed in the wrong direction.
What clinicians actually need isn’t faster guidelines. It’s a learning system that treats the encounter itself as evidence.
What AI Actually Makes Possible
Here’s where this gets genuinely interesting rather than just frustrating. ⚙️
The traditional evidence cycle produces population-level truth and then asks clinicians to translate it downward to individual patients. That translation is where most of the error lives, not in the science itself, but in applying averaged findings to people who weren’t in the study.
AI, particularly the continuous monitoring and adaptive systems coming online, can invert this. Instead of asking “what did we learn from a cohort?”, it can ask “what is this patient’s body telling us right now, and how does that compare to what worked for similar presentations in the past?”
Individual response data fed back in real time into care decisions is a fundamentally different kind of evidence. It doesn’t replace the RCT. It addresses the question the RCT was never designed to answer.
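One minimal way to picture "the encounter as evidence": treat the trial's average effect as a starting belief, then update it with this patient's own measured responses. The sketch below uses a standard normal-normal Bayesian update; the drug name is deliberately absent and every number (the prior, the observations, the variances) is hypothetical.

```python
# A minimal sketch of individualized evidence: start from the
# population-level estimate (the RCT prior) and update it with this
# patient's observed responses. Standard normal-normal conjugate update;
# all numbers are hypothetical.

def individualize(prior_mean, prior_var, observations, obs_var):
    """Return (posterior_mean, posterior_var) after each observed response."""
    for y in observations:
        gain = prior_var / (prior_var + obs_var)   # weight on the new data
        prior_mean = prior_mean + gain * (y - prior_mean)
        prior_var = (1.0 - gain) * prior_var
    return prior_mean, prior_var

# Suppose the trial says the intervention improves the outcome by ~1.0
# on average (prior variance 0.25), but this patient's three measured
# responses are far smaller.
mean, var = individualize(prior_mean=1.0, prior_var=0.25,
                          observations=[0.1, 0.2, 0.0], obs_var=0.2)
print(f"individualized estimate: {mean:.2f} (variance {var:.3f})")
```

After three encounters, the estimate has drifted well below the trial average, and the uncertainty has shrunk. That is the whole point: the population prior is where you start, not where you stop.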
This is why I keep returning to the architecture question. In The Physician Agent Needs a World Model, I argued that without a patient-level model of how an individual’s health evolves over time, physician agents are just workflow compression: useful, but not clinical reasoning. The evidence question lives downstream of exactly that gap. A system that can model this patient rather than this population doesn’t just move faster through the existing evidence pipeline. It routes around the parts that were always answering someone else’s question.
In The Immortal Ten: Why Healthcare’s Moat Is Finally Cracking, I traced how world models trained on longitudinal data are beginning to make individualized prediction practical at scale. That’s the infrastructure side. This is the epistemological side. What kind of truth are we trying to produce, and for whom?
The Implication Nobody Wants to Say
If real-world, continuous, individualized learning systems are the future of evidence, and I think they are, then the guideline committee is not the right endpoint for AI to report to.
Guideline committees exist to protect institutions. That’s not cynical; it’s structural. Their job is to synthesize evidence into defensible recommendations that can be applied at scale. That’s genuinely valuable work. It’s just not the same as optimizing for the patient in the room.
Building AI tools that report upward to consensus panels rather than to the clinician or patient is a design choice. We should name it as one.
In The Long Tail Isn’t an Edge Case in Healthcare — It’s the Job, I wrote that clinical AI needs to handle the rare and atypical because that’s where the work actually lives. Population-level evidence fails hardest exactly there, in the complex, multi-morbid, socially constrained patients who were never in the trial to begin with. The long tail problem and the evidence problem are the same problem wearing different clothes.
The clinician sitting across from that 67-year-old woman needs a tool that learns from her response to treatment. Not one that tells her, once again, what worked on average for someone else in 2019.
The evidence cycle isn’t broken. It’s doing exactly what it was designed to do. The problem is that what it was designed to do and what medicine actually needs have quietly diverged, and we’ve been papering over that gap with clinical judgment for so long that we’ve stopped noticing it.
AI gives us the opportunity to close that gap rather than keep bridging it.
If the RCT was designed to answer “does this work on average” and AI can answer “does this work for her,” why are we still pointing these tools at the guideline committee instead of the patient in front of you?
Leave a comment. Genuinely curious where you think the line is.