AI in Healthcare · 12 min read

What AI in healthcare actually looks like right now.

Between the hype of autonomous clinicians and the reality of locked‑down hospital IT is a much narrower, more interesting corridor. A grounded look at what is shipping, what is FDA‑cleared, and where the useful work sits.

Apr 23, 2026 · Sridharan Gopalsamy Ramaswamy

I practiced medicine in India for a year, then left the clinic for an MPH/MBA at Washington University. I work on cancer epidemiology at Siteman, build a consumer health product, and run an entirely local AI pipeline that produces long‑form health explainer videos. That last one still startles people.

So when I say AI in healthcare, I mean something very specific. I do not mean an autonomous doctor. I do not mean chatbots replacing nurses. I mean the narrower, harder, more useful work that is actually getting shipped and cleared, the work that sits between what a trained clinician used to do by hand and what an algorithm can now do reproducibly at scale.

What is actually shipping

The FDA maintains a public list of authorized AI/ML‑enabled medical devices. As of 2025 it sat above 1,000 cleared devices, and it is growing. Most of them are not what you see in a demo on Twitter: they are narrow, single‑task tools, concentrated in radiology and cardiology, flagging a suspected bleed, quantifying a measurement, prioritizing a worklist.

The pattern: successful AI in healthcare tends to replace a narrow, repetitive cognitive task a clinician already did, not the clinician's judgment. It is boring on purpose.

Why the hyped version keeps failing

The canonical failure mode is the sepsis prediction model. Epic's sepsis model was the poster child for embedded EHR AI for years. In 2021, a JAMA Internal Medicine validation study found it performed substantially worse than Epic's own reported numbers at a large academic health system. It missed real cases and flagged many false positives. The model was still there. It just did not work the way the product page said it did.

That is not a one‑off. The BMJ's 2020 review of COVID‑19 prediction models looked at 232 published models and concluded that essentially none were fit for clinical use. The failure modes were reproducible: non‑representative training data, data leakage, no external validation, calibration drift once deployed.

So when a vendor says "our model has 94% accuracy," the correct next questions are: accuracy on what population, at what prevalence, calibrated when, validated where, drift monitored how. If any of those answers is vague, the number is a marketing number.
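To make the prevalence question concrete, here is a minimal sketch with hypothetical numbers (a sensitivity of 0.90 and a specificity of 0.94, chosen for illustration, not taken from any real product). Hold the model fixed and the positive predictive value collapses as the deployment population gets healthier than the validation cohort.

```python
# Minimal sketch: why a headline accuracy number means little without prevalence.
# Sensitivity and specificity are hypothetical; only prevalence changes below.

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

for prev in (0.20, 0.05, 0.01):  # validation cohort vs. two healthier deployment populations
    print(f"prevalence {prev:>4.0%}: PPV = {ppv(0.90, 0.94, prev):.2f}")

# prevalence  20%: PPV = 0.79
# prevalence   5%: PPV = 0.44
# prevalence   1%: PPV = 0.13
```

Same model, same threshold, and most of its positive calls are wrong in the low‑prevalence setting. That is why the population and prevalence questions come before the headline number.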

Where the useful work actually sits

If I had to draw the map, I would put useful healthcare AI in four regions:

1. Clinician augmentation, not replacement

Ambient scribes are the single biggest live example. A clinician still sees the patient, still makes the decisions, still signs the note. The model just drafts the note from the conversation. This pattern fits because it removes a universally hated documentation burden, it is low‑stakes per interaction, and the clinician always signs off before it becomes part of the record. It is a force multiplier that respects the existing authority structure of medicine.

The same pattern works for coding assistance, draft discharge summaries, and draft patient‑portal replies. The model produces a first pass, the clinician edits and signs.
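The mechanics are simple enough to sketch. This is not any vendor's API, just the shape of the draft‑then‑sign loop: the model's output is staged, and only a clinician's sign‑off moves it toward the record.

```python
# Minimal sketch of the draft-then-sign pattern. Names and fields are illustrative.
from dataclasses import dataclass, field

@dataclass
class DraftNote:
    text: str                       # model's first pass, e.g. drafted from the visit conversation
    status: str = "draft"           # draft -> signed
    signed_by: list[str] = field(default_factory=list)

def sign_off(note: DraftNote, edited_text: str, clinician_id: str) -> DraftNote:
    """Only a reviewed, signed note is eligible to be written to the record."""
    note.text = edited_text         # the clinician's edits always win
    note.signed_by.append(clinician_id)
    note.status = "signed"
    return note

draft = DraftNote(text="Patient presents with ...")
signed = sign_off(draft, "Patient presents with ... (corrected by clinician)", "dr_x")
assert signed.status == "signed"    # the gate before anything enters the chart
```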

2. Triage at the top of the funnel

Retinal cameras that screen for diabetic retinopathy in primary care, without an ophthalmologist in the room, are a live example of AI expanding access rather than replacing expertise. IDx‑DR was the first FDA‑authorized autonomous diagnostic in 2018 and the category has grown since. The trick is that these are screening tools, explicitly designed to route patients to humans when uncertainty is high.
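A rough sketch of that routing, with made‑up thresholds and labels rather than IDx‑DR's actual decision rules: the system only returns a few pre‑specified outputs, and anything ungradable or positive goes to a human.

```python
# Minimal sketch of an autonomous screening tool's output space.
# The threshold and wording are hypothetical, for illustration only.
def screening_result(p_more_than_mild_dr: float, image_gradable: bool) -> str:
    if not image_gradable:
        # Uncertainty is routed to a human, not smoothed over by the model.
        return "insufficient image quality: repeat exam or refer to eye care"
    if p_more_than_mild_dr >= 0.5:   # hypothetical operating point
        return "positive screen: refer to an eye care professional"
    return "negative screen: rescreen in 12 months"

print(screening_result(0.72, image_gradable=True))
```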

3. Population‑level risk stratification

This is the quiet segment. Models predicting 30‑day readmission, inpatient deterioration, sepsis onset, no‑show risk. They rarely make headlines but they are where health systems are saving money and occasionally lives. The honest version ships with calibration plots by subgroup and a plan for revalidation every 6 to 12 months.
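What "calibration plots by subgroup" means in practice is something like the sketch below, using scikit‑learn's calibration_curve on a hypothetical export of readmission predictions. The file name and column names are assumptions.

```python
# Minimal sketch: check calibration within each subgroup, not just overall.
import pandas as pd
from sklearn.calibration import calibration_curve

# Hypothetical export with columns: y_true (0/1 readmitted), y_prob, subgroup
df = pd.read_csv("readmission_predictions.csv")

for name, grp in df.groupby("subgroup"):
    prob_true, prob_pred = calibration_curve(grp["y_true"], grp["y_prob"], n_bins=10)
    # A well-calibrated model keeps observed risk close to predicted risk in every
    # subgroup; large gaps here are exactly what a 6-to-12-month revalidation should catch.
    worst_gap = max(abs(t - p) for t, p in zip(prob_true, prob_pred))
    print(f"{name}: worst calibration gap across bins = {worst_gap:.2f}")
```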

4. Research and evidence synthesis

This is where general‑purpose LLMs are most useful right now, and it is the region I use most. Literature review, clinical guideline synthesis across sources, competitive teardowns of health apps, data quality audits of public datasets, building reference documents grounded in primary sources. This is not medical device territory. It is knowledge work, done faster.

The discipline is what matters. Every claim should cite the primary source, not a secondary summary. Every ambiguity should be flagged, not smoothed over. Evidence grades matter when they exist (USPSTF, GRADE). I have built long‑form reference documents with Claude on topics ranging from hypertension management thresholds to FDA General Wellness guidance and the outputs are only valuable because they are grounded and cited, not because they are produced fast.

An AI‑generated clinical reference without citations is worse than no reference. It is a reference you cannot check.
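One way to hold that line is to make citations structural rather than optional: a claim without a primary source either gets flagged or does not ship. A minimal sketch, with an illustrative schema rather than a real one:

```python
# Minimal sketch: every claim travels with its primary source and evidence grade.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    statement: str                  # the sentence that will appear in the reference
    primary_source: Optional[str]   # citation to the primary source, not a summary
    evidence_grade: Optional[str]   # e.g. a USPSTF letter grade or GRADE rating, if one exists
    flagged_ambiguous: bool = False # ambiguity is surfaced, not smoothed over

def usable(claim: Claim) -> bool:
    """A claim ships only if it cites a primary source or is explicitly flagged."""
    return claim.primary_source is not None or claim.flagged_ambiguous

claims = [
    Claim("Stage 1 hypertension begins at 130/80 mm Hg under the 2017 ACC/AHA guideline.",
          "Whelton et al., 2017 ACC/AHA hypertension guideline", "Guideline"),
    Claim("This supplement reverses arterial stiffness.", None, None),  # no source, not flagged
]
print([usable(c) for c in claims])  # [True, False]
```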

The regulatory edge nobody wants to talk about

A consumer wellness app that says "this food is good for your heart" sits on one side of a line. The same app saying "this food is good for your hypertension" sits on the other side. The first is FDA General Wellness, unregulated. The second is a medical device claim. The distinction is the difference between a shippable product and an enforcement letter.

The same thing applies to AI features. A model that suggests what a doctor might consider is decision support. A model that tells the doctor what to do or, worse, bypasses the doctor entirely, is Software as a Medical Device (SaMD). The FDA has specific guidance here: the final Clinical Decision Support Software guidance, issued in 2022, spells out when software falls under device regulation.

If you are building, this is not a small detail. It determines whether you can ship in 6 weeks or whether you need a 510(k) submission. My working rule: read the FDA warning letter archive for your category before you ship. They are public. The specific language the FDA objects to is instructive in a way that no summary document is.

What actually makes AI in healthcare work

Across the examples above, the shipped and working ones share a few traits: a narrow, well‑defined task, a human who reviews and signs before anything enters the record, external validation on the population where the model is actually deployed, calibration and drift monitoring after launch, and a clear answer to the regulatory question before the first release.

Those are boring requirements. They are the requirements. The shortest path to a working AI product in healthcare is to take them seriously from the first commit, not the first audit.

Where I think this goes next

Three bets, at decreasing confidence.

Ambient documentation becomes table stakes. By the end of 2026 I would bet that most large academic and community health systems have a scribe product contract in place. The remaining question is which vendor, not whether. The clinician time recovered is too large to ignore and the competitive pressure on the health systems without it is real.

LLMs become a layer in the EHR, not a feature. Not a chatbot in the corner. A substrate the EHR uses to surface what a clinician needs: relevant prior notes, relevant lab trends, relevant guideline reminders. Epic's partnership with Microsoft is a signal of that direction; the product has to get quieter and more useful for it to actually stick.

Consumer health AI stays legally careful or gets letters. Condition‑aware, evidence‑grounded consumer tools can ship under General Wellness if the claims are disciplined. Most of the ones I see in app stores are not that disciplined. Expect the enforcement signal to rise with the noise level.

What to do about it

If you are a clinician: learn to evaluate an AI claim the way you evaluate a drug trial. What was the population. What was the prevalence. How is it calibrated. What is the monitoring plan. If those answers are missing, the product is not ready for your workflow.

If you are a builder: pick a narrow problem with a labeled dataset and a human in the loop. Ship something boring that works before you try to ship something interesting.

If you are a buyer at a health system: ask for the external validation, the calibration plot, the drift monitoring plan, and the subgroup performance. If those are not offered, the answer is probably no.

The corridor where AI in healthcare actually works is narrower than the pitch decks suggest. It is also wider than the cynics think. The useful part is figuring out which side of each specific question you are on.


Building at the intersection of AI and healthcare? I'd like to hear about it.

Get in touch