BlitzLearnAI
Scenario 1 of 10 · AIGP Scenarios

Healthcare AI Bias Audit — Racial Disparities in Patient Triage

In 2019, a landmark study published in Science revealed that an algorithm used by Optum to manage the care of roughly 200 million patients in the United States systematically discriminated against Black patients. The algorithm used healthcare spending as a proxy for health need — but because Black patients historically had less access to care and lower spending, the system scored them as healthier than equally sick white patients. At a given risk score, Black patients were considerably sicker than white patients. This scenario places you inside a similar crisis.

[Diagram: a hospital AI triage system with diverging patient pathways based on biased risk scoring]
When proxy variables encode historical inequities, AI systems can perpetuate and amplify discrimination at scale.

The Situation

You are the AI governance lead at MedFirst Health System, a network of 12 hospitals across the Southeastern United States. Six months ago, MedFirst deployed TriageAI, a machine learning system that scores incoming emergency department patients on a 1-100 acuity scale. The score determines how quickly a patient is seen and what resources are allocated.

A routine internal audit has uncovered alarming findings: Black patients receive acuity scores that are on average 8.3 points lower than white patients presenting with identical symptoms and vital signs. The disparity means Black patients wait an average of 23 minutes longer to be seen. Two adverse patient outcomes in the past quarter may be linked to delayed triage. A journalist from STAT News has contacted your communications team requesting comment.

The AI system was developed by a third-party vendor, HealthScore Analytics, and trained on five years of historical patient data from MedFirst's own electronic health records. The vendor claims the model passed all standard performance benchmarks during validation.

Governance Failures — What Went Wrong

Several governance breakdowns enabled this outcome:

No pre-deployment bias testing across protected classes. MedFirst's procurement process evaluated TriageAI on aggregate accuracy metrics (AUC, sensitivity, specificity) but never required disaggregated performance analysis across racial groups. The AIGP Body of Knowledge emphasizes that aggregate metrics can mask significant subgroup disparities: a model can look strong overall while performing far worse for particular groups.
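The missing disaggregated check is straightforward to run. The sketch below (pure Python on toy data; the group labels, scores, and the Mann-Whitney formulation of AUC are illustrative assumptions, not the vendor's actual validation setup) shows how a model can post a respectable overall AUC while failing badly for one group:

```python
from collections import defaultdict

def auc(y_true, y_score):
    """Mann-Whitney AUC: probability a random positive case outranks
    a random negative case (ties count as half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def disaggregated_auc(records):
    """records: list of (group, y_true, y_score) tuples.
    Returns overall AUC plus AUC broken out per group."""
    by_group = defaultdict(list)
    for g, y, s in records:
        by_group[g].append((y, s))
    out = {"overall": auc([y for _, y, _ in records],
                          [s for _, _, s in records])}
    for g, pairs in by_group.items():
        ys, ss = zip(*pairs)
        out[g] = auc(ys, ss)
    return out

# Toy data: the model ranks group "W" perfectly but group "B"
# worse than chance, yet the overall AUC still looks moderate.
records = [
    ("W", 1, 0.9), ("W", 1, 0.8), ("W", 0, 0.3), ("W", 0, 0.2),
    ("B", 1, 0.4), ("B", 1, 0.3), ("B", 0, 0.5), ("B", 0, 0.6),
]
print(disaggregated_auc(records))
```

Here the overall AUC of roughly 0.72 would pass many procurement checklists, while the per-group breakdown reveals an AUC of 0.0 for group "B".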

Proxy variable contamination. The model used ZIP code, insurance type, and historical visit frequency as features. These variables are strongly correlated with race due to residential segregation and disparities in insurance coverage. The vendor did not conduct a proxy analysis or document known correlations in the model card.
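A basic proxy analysis need not be elaborate. The sketch below (toy data; the association score is a simple expected total-variation measure, one of several reasonable choices) estimates how strongly knowing a feature shifts the distribution of a protected attribute:

```python
from collections import Counter, defaultdict

def proxy_strength(feature_values, protected_values):
    """Crude proxy check: expected total-variation distance between the
    protected-attribute distribution within each feature value and the
    overall distribution. 0 means no association; larger values mean
    the feature carries more information about the protected attribute."""
    n = len(feature_values)
    base = Counter(protected_values)
    groups = defaultdict(list)
    for f, p in zip(feature_values, protected_values):
        groups[f].append(p)
    score = 0.0
    for vals in groups.values():
        local = Counter(vals)
        tv = 0.5 * sum(abs(local[k] / len(vals) - base[k] / n)
                       for k in base)
        score += (len(vals) / n) * tv
    return score

# Toy data: ZIP code fully determines race here, while insurance
# type shows a weaker association.
zips      = ["30301", "30301", "30301", "30310", "30310", "30310"]
insurance = ["private", "medicaid", "private",
             "medicaid", "private", "medicaid"]
race      = ["white", "white", "white", "black", "black", "black"]
print(proxy_strength(zips, race))       # strong association
print(proxy_strength(insurance, race))  # weaker association
```

A check like this, run over the candidate feature set before deployment, would have flagged ZIP code and insurance type for the disparate-impact testing the vendor never performed.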

Inadequate vendor due diligence. MedFirst did not require HealthScore Analytics to provide a model card, datasheets for the training data, or documentation of bias testing methodology. The vendor contract contained no provisions for algorithmic auditing or performance guarantees across demographic groups.

No ongoing monitoring for drift or disparate impact. Once deployed, no mechanism existed to continuously monitor TriageAI's scores for demographic disparities. The bias was only discovered during a scheduled internal audit — six months after deployment.
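A minimal continuous check could take the form below (a sketch only; the two-group labels, window size, and 5-point alert threshold are illustrative assumptions, not any real MedFirst configuration):

```python
from collections import deque
from statistics import mean

class DisparityMonitor:
    """Rolling check on the mean acuity-score gap between two groups.
    Fires an alert when the gap in recent scores exceeds a threshold."""

    def __init__(self, window=500, threshold=5.0):
        # Fixed two-group setup for illustration; a real monitor would
        # track every demographic group of interest.
        self.windows = {"A": deque(maxlen=window), "B": deque(maxlen=window)}
        self.threshold = threshold

    def record(self, group, score):
        self.windows[group].append(score)

    def check(self):
        a, b = self.windows["A"], self.windows["B"]
        if not a or not b:
            return None  # not enough data yet
        gap = mean(a) - mean(b)
        return {"gap": gap, "alert": abs(gap) > self.threshold}

monitor = DisparityMonitor(window=100, threshold=5.0)
for s in [70, 72, 68]:
    monitor.record("A", s)
for s in [61, 63, 59]:
    monitor.record("B", s)
print(monitor.check())
```

Even a crude rolling gap like this, wired to an alert, would have surfaced the 8.3-point disparity within days of deployment instead of six months later.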

Knowledge Check
The TriageAI system used ZIP code and insurance type as features, which are strongly correlated with race. In AI governance terminology, these variables are best described as:
A
Proxy variables that can encode protected characteristics and require careful analysis
B
Redundant features that reduce model accuracy
C
Confounding variables that should be removed from all models
D
Legitimate business variables that are always permissible to use
Proxy variables are features that correlate with protected characteristics like race, even when those characteristics are not directly included. The AIGP exam tests the understanding that simply removing race from a model does not prevent discrimination — proxy variables can reproduce the same disparities. The correct governance approach is to conduct proxy analysis and test for disparate impact, not necessarily to remove all correlated variables.

The Remediation Plan

As the AI governance lead, you must now lead a cross-functional response. Here is the framework:

Immediate actions (24-48 hours):

- Suspend TriageAI and revert to the previous manual triage protocol

- Notify the Chief Medical Officer and legal counsel of the adverse findings

- Preserve all model artifacts, logs, and audit data for potential regulatory or legal proceedings

- Prepare a holding statement for the STAT News inquiry

Short-term remediation (1-4 weeks):

- Commission an independent third-party algorithmic audit (for example, from a specialist firm such as ORCAA, O'Neil Risk Consulting &amp; Algorithmic Auditing)

- Conduct a root cause analysis on the two adverse patient outcomes

- Review all vendor contracts for algorithmic accountability provisions

- File any required incident reports with relevant state health regulators

Long-term governance improvements:

- Implement mandatory disaggregated performance testing before any clinical AI deployment

- Require model cards and bias documentation from all AI vendors

- Establish continuous fairness monitoring dashboards for deployed AI systems

- Create an AI ethics review board with clinical, technical, legal, and patient advocacy representation

Knowledge Check
After discovering racial bias in the triage AI, the governance team's FIRST priority should be:
A
Retrain the model with debiased data
B
Negotiate stronger contractual terms with the vendor
C
Suspend the system and revert to the previous triage process to prevent ongoing harm
D
Commission an independent algorithmic audit to determine root cause
When an AI system is actively causing harm — particularly in a healthcare setting where patient safety is at risk — the immediate priority is to stop the harm. Suspending the system and reverting to the prior process prevents further disparate impact while the root cause is investigated. Auditing, retraining, and contract renegotiation are all important but come after the immediate risk is mitigated.
Knowledge Check
MedFirst evaluated TriageAI using aggregate accuracy metrics (AUC, sensitivity, specificity) that showed strong performance. This approach failed because:
A
Aggregate metrics can mask significant performance disparities across demographic subgroups
B
These metrics are unreliable for healthcare applications
C
The vendor intentionally manipulated the test results
D
The metrics were calculated on too small a sample size
This is a core AIGP concept. A model can achieve excellent aggregate performance while performing very differently across subgroups. Disaggregated evaluation — breaking down metrics by demographic group — is essential for detecting bias. The NIST AI RMF and EU AI Act both emphasize the need for subgroup analysis in high-risk AI systems.
🤖
ChatGPT
I'm the AI governance lead at a hospital network. We just discovered our triage AI has an 8.3-point racial bias gap affecting Black patients. Two adverse outcomes may be linked. A journalist is asking questions. What framework should I use to structure our response — and what are our legal exposures?
This is a serious situation that requires immediate, structured action across multiple dimensions. Here's a framework:

**1. Immediate Harm Mitigation (NIST AI RMF GOVERN and MANAGE functions)**

- Suspend the system immediately. Patient safety is the non-negotiable priority.
- Document everything — model artifacts, decision logs, audit findings. Litigation holds may be necessary.

**2. Legal Exposure Analysis**

- **Title VI of the Civil Rights Act**: Hospitals receiving federal funding (Medicare/Medicaid) are prohibited from discrimination. Algorithmic bias causing disparate impact likely triggers Title VI liability.
- **State medical malpractice**: If adverse outcomes are causally linked to delayed triage due to biased scoring, malpractice claims are plausible.
- **Section 1557 of the ACA**: Prohibits discrimination in healthcare programs receiving federal financial assistance.
- **FTC Act Section 5**: If the vendor marketed the system as unbiased, there may be deceptive-practices exposure.

**3. Regulatory Reporting**

- Check state-specific incident reporting requirements for patient safety events.
- If CMS Conditions of Participation are implicated, report through appropriate channels.

**4. Stakeholder Communication**

- For the journalist: acknowledge awareness, describe corrective actions taken, avoid admitting legal liability. Have legal counsel review all statements.
- For patients: consider notification obligations, especially for the two adverse outcome cases.

**5. Governance Remediation (ISO 42001 aligned)**

- Root cause analysis → updated procurement standards → continuous monitoring → independent audit cycle.

The key AIGP principle here: governance is not just prevention — it's also preparedness and response.
Scenario 1 Complete
"Aggregate accuracy metrics can mask dangerous subgroup disparities. AI governance in healthcare requires pre-deployment bias testing across protected classes, proxy variable analysis, strong vendor accountability, and continuous fairness monitoring — because when AI fails in clinical settings, patients are harmed."
Tomorrow — Day 2
EU AI Act Compliance for a Chatbot Company
A European SaaS company must classify its customer service chatbot under the EU AI Act risk tiers and implement compliance measures before the enforcement deadline.