In November 2019, tech entrepreneur David Heinemeier Hansson tweeted that Apple Card, issued by Goldman Sachs, offered him 20 times the credit limit given to his wife — despite the couple filing joint tax returns and his wife having a higher credit score. Steve Wozniak reported a similar experience. The New York State Department of Financial Services launched an investigation. This incident exposed how algorithmic lending decisions, even when they do not explicitly use gender as a variable, can produce discriminatory outcomes. This scenario reconstructs that crisis from the inside.
You are the AI governance manager at Summit National Bank, a regional bank with 3 million customers that launched the Summit Rewards Card eight months ago. The credit limit decisioning is handled by LimitEngine, a gradient-boosted decision tree model developed by Summit's internal data science team.
This morning, a viral social media thread from a customer went public: Dr. Sarah Chen and her husband, both physicians at the same hospital with identical salaries, filed joint tax returns. Dr. Chen received a $12,000 credit limit; her husband received a $48,000 limit. The thread has 2.3 million views and is trending. A Bloomberg reporter is working on a story. The Consumer Financial Protection Bureau (CFPB) has sent a preliminary inquiry letter.
LimitEngine was trained on 5 years of Summit's historical lending data. It does not use gender as an input variable. The model uses 47 features including income, credit score, debt-to-income ratio, credit history length, number of accounts, ZIP code, spending patterns from existing Summit accounts, and industry classification code of the applicant's employer.
Your data science team's investigation reveals the mechanism:
Historical lending data encoded past discrimination. Summit's historical data reflected years of lending decisions made by human underwriters who, consciously or unconsciously, approved higher limits for men. The model learned these patterns as predictive signals.
Proxy variable effects. Several features act as gender proxies:
- Industry classification code: Historically, certain industries with higher female representation (education, nursing, social work) were associated with lower credit limits in the training data, even when salaries were equivalent.
- Credit history length: Women are statistically more likely to have shorter credit histories due to historical barriers to independent credit access (prior to the Equal Credit Opportunity Act of 1974, women often needed a male cosigner).
- Spending patterns from existing accounts: Different spending categories correlated with gender and were weighted by the model.
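A first-pass screen for proxy features like those above can be run on an audit sample where the protected attribute is known. The sketch below (all names, thresholds, and data are hypothetical; this is illustration, not Summit's actual tooling) flags features whose correlation with the protected attribute exceeds a cutoff — a crude check that would be followed by stronger tests, such as trying to predict the protected attribute from the full feature set.

```python
import numpy as np

def proxy_screen(X, feature_names, protected, threshold=0.3):
    """Flag features whose correlation with a protected attribute exceeds
    a threshold -- a first-pass proxy screen on a labeled audit sample.
    X: (n_samples, n_features); protected: 0/1 group labels."""
    flagged = {}
    for j, name in enumerate(feature_names):
        r = np.corrcoef(X[:, j], protected)[0, 1]
        if abs(r) >= threshold:
            flagged[name] = round(float(r), 3)
    return flagged

# Synthetic audit sample: credit history length correlates with the
# protected attribute (shorter histories in group 1); income does not.
rng = np.random.default_rng(0)
n = 5000
protected = rng.integers(0, 2, n)                # group label, audit data only
income = rng.normal(90_000, 15_000, n)           # independent of group
history = rng.normal(10, 3, n) - 4 * protected   # group-correlated proxy

X = np.column_stack([income, history])
print(proxy_screen(X, ["income", "credit_history_length"], protected))
```

On this synthetic sample only `credit_history_length` is flagged; a real screen would also examine joint effects, since several individually weak features can combine into a strong proxy.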
The fairness testing gap. Summit's model validation team tested LimitEngine for overall accuracy, default prediction rates, and compliance with existing fair lending regulations. They did not conduct disparate impact testing across gender because gender was not an input to the model. This is the critical governance failure: the team assumed that excluding gender as a feature prevented gender discrimination, when correlated proxy features can reproduce it.
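The disparate impact test the validation team skipped is mechanically simple. One common metric is the adverse impact ratio, informally checked against the EEOC's "four-fifths" guideline. A minimal sketch, using hypothetical data and an assumed favorable-outcome cutoff:

```python
import numpy as np

def adverse_impact_ratio(favorable, protected):
    """Ratio of favorable-outcome rates: lower group rate / higher group rate.
    The EEOC 'four-fifths' guideline flags ratios below 0.80 for review.
    favorable: 1 if the applicant got a favorable outcome; protected: 0/1."""
    rate0 = favorable[protected == 0].mean()
    rate1 = favorable[protected == 1].mean()
    return min(rate0, rate1) / max(rate0, rate1)

# Hypothetical audit: share of applicants granted a limit >= $25,000
limits = np.array([48_000, 30_000, 26_000, 12_000, 14_000, 27_000,
                   52_000, 11_000, 13_000, 25_000])
group = np.array([0, 0, 0, 1, 1, 1, 0, 1, 1, 1])
favorable = (limits >= 25_000).astype(int)

air = adverse_impact_ratio(favorable, group)
print(f"Adverse impact ratio: {air:.2f}")  # prints "Adverse impact ratio: 0.33"
```

A ratio of 0.33 is far below 0.80 and would have triggered investigation long before a viral thread did. In practice the test is run per protected class, with statistical significance checks and continuous rather than binary outcomes.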
Immediate regulatory response to the CFPB:
- Acknowledge receipt of the inquiry within the required timeframe
- Engage outside counsel with fair lending expertise (firms like Ballard Spahr or Buckley LLP)
- Preserve all model artifacts, training data, validation reports, and decisioning logs
- Prepare for a potential fair lending examination under the Equal Credit Opportunity Act (ECOA) and its implementing Regulation B (the Fair Housing Act governs housing-related credit, not card lending)
Model remediation options:
1. Adversarial debiasing: Modify the training objective to penalize the model for outcomes that correlate with protected characteristics. This can reduce disparate impact but may affect overall model accuracy.
2. Feature removal with proxy analysis: Identify and remove or re-weight features that act as gender proxies. Requires careful analysis to avoid removing legitimate predictive signals.
3. Post-processing calibration: Adjust output credit limits to equalize outcomes across demographic groups. Must be done carefully: explicitly conditioning outcomes on protected class carries its own disparate treatment risk.
4. Retraining on curated data: Remove historical lending data that reflects known discriminatory patterns. Create a training dataset that represents fair lending outcomes.
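To make option 3 concrete, one post-processing technique is quantile alignment: map each group's limit distribution onto a reference group's quantiles so that equally-ranked applicants receive equal limits. The sketch below is a simplified illustration on hypothetical data (the helper name and reference-group choice are assumptions); a real remediation would condition on qualifications and require legal review for the disparate treatment risk noted above.

```python
import numpy as np

def quantile_align(limits, protected, reference_group=0):
    """Post-processing sketch: map each non-reference group's limits onto
    the reference group's empirical quantiles, so an applicant ranked at
    the k-th percentile of her group gets the reference group's k-th
    percentile limit. Hypothetical helper, not a production remedy."""
    ref = np.sort(limits[protected == reference_group])
    out = limits.astype(float).copy()
    for g in np.unique(protected):
        if g == reference_group:
            continue
        mask = protected == g
        ranks = np.argsort(np.argsort(limits[mask]))  # 0..m-1 within group
        q = (ranks + 0.5) / mask.sum()                # mid-rank quantiles
        out[mask] = np.quantile(ref, q)               # linear interpolation
    return out

limits = np.array([48_000, 30_000, 26_000, 12_000, 14_000, 27_000,
                   52_000, 11_000, 13_000, 25_000], dtype=float)
group = np.array([0, 0, 0, 1, 1, 1, 0, 1, 1, 1])
adjusted = quantile_align(limits, group)
print(adjusted)
```

After alignment the two groups' limit distributions match by construction; the open question, and the reason this needs counsel in the room, is that the adjustment itself uses group membership.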
Governance improvements:
- Mandate disparate impact testing across all protected classes before any credit model deployment
- Implement continuous fairness monitoring with automated alerts when disparities exceed defined thresholds
- Require adverse action reason codes to be auditable and explainable to consumers
- Establish a fair lending AI committee with compliance, legal, data science, and business representation
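The continuous-monitoring item above can be sketched as a batch job over each day's decisions: recompute a fairness metric and raise an alert when it breaches a defined threshold. All names, cutoffs, and data here are assumptions for illustration.

```python
import numpy as np

AIR_THRESHOLD = 0.80  # assumed alert threshold (four-fifths guideline)

def monitor_batch(limits, protected, favorable_cutoff=25_000):
    """Continuous-monitoring sketch: compute the adverse impact ratio on a
    batch of decisions and flag it when it breaches the threshold."""
    favorable = limits >= favorable_cutoff
    rates = [favorable[protected == g].mean() for g in np.unique(protected)]
    air = min(rates) / max(rates)
    return {"air": round(float(air), 3), "alert": bool(air < AIR_THRESHOLD)}

# Skewed hypothetical batch: every group-0 applicant clears the cutoff,
# no group-1 applicant does -- the alert fires.
skewed = monitor_batch(np.array([48_000, 52_000, 30_000, 12_000, 13_000, 11_000]),
                       np.array([0, 0, 0, 1, 1, 1]))
print(skewed)  # prints {'air': 0.0, 'alert': True}

# Balanced batch: no alert.
balanced = monitor_batch(np.array([30_000, 26_000, 28_000, 27_000]),
                         np.array([0, 0, 1, 1]))
print(balanced)  # prints {'air': 1.0, 'alert': False}
```

A production version would monitor every protected class, use smoothed windows rather than single batches, and route alerts to the fair lending AI committee described above.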