In November 2019, tech entrepreneur David Heinemeier Hansson tweeted that Apple Card, issued by Goldman Sachs, offered him 20 times the credit limit given to his wife — despite the couple filing joint tax returns and his wife having a higher credit score. Steve Wozniak reported a similar experience. The New York State Department of Financial Services launched an investigation. This incident exposed how algorithmic lending decisions, even when they do not explicitly use gender as a variable, can produce discriminatory outcomes. This scenario reconstructs that crisis from the inside.
You are the AI governance manager at Summit National Bank, a regional bank with 3 million customers that launched the Summit Rewards Card eight months ago. The credit limit decisioning is handled by LimitEngine, a gradient-boosted decision tree model developed by Summit's internal data science team.
This morning, a viral social media thread from a customer went public: Dr. Sarah Chen and her husband, both physicians at the same hospital with identical salaries, filed joint tax returns. Dr. Chen received a $12,000 credit limit; her husband received a $48,000 limit. The thread has 2.3 million views and is trending. A Bloomberg reporter is working on a story. The Consumer Financial Protection Bureau (CFPB) has sent a preliminary inquiry letter.
LimitEngine was trained on 5 years of Summit's historical lending data. It does not use gender as an input variable. The model uses 47 features including income, credit score, debt-to-income ratio, credit history length, number of accounts, ZIP code, spending patterns from existing Summit accounts, and industry classification code of the applicant's employer.
Your data science team's investigation reveals the mechanism:
Historical lending data encoded past discrimination. Summit's historical data reflected years of lending decisions made by human underwriters who, consciously or unconsciously, approved higher limits for men. The model learned these patterns as predictive signals.
Proxy variable effects. Several features act as gender proxies:
- Industry classification code: Historically, certain industries with higher female representation (education, nursing, social work) were associated with lower credit limits in the training data, even when salaries were equivalent.
- Credit history length: Women are statistically more likely to have shorter credit histories due to historical barriers to independent credit access (prior to the Equal Credit Opportunity Act of 1974, women often needed a male cosigner).
- Spending patterns from existing accounts: Different spending categories correlated with gender and were weighted by the model.
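A first-pass screen for proxy features like those above can be run on an audit sample where the protected attribute is known. The sketch below (all names, thresholds, and data are hypothetical; this is illustration, not Summit's actual tooling) flags features whose correlation with the protected attribute exceeds a cutoff — a crude check that would be followed by stronger tests, such as trying to predict the protected attribute from the full feature set.

```python
import numpy as np

def proxy_screen(X, feature_names, protected, threshold=0.3):
    """Flag features whose correlation with a protected attribute exceeds
    a threshold -- a first-pass proxy screen on a labeled audit sample.
    X: (n_samples, n_features); protected: 0/1 group labels."""
    flagged = {}
    for j, name in enumerate(feature_names):
        r = np.corrcoef(X[:, j], protected)[0, 1]
        if abs(r) >= threshold:
            flagged[name] = round(float(r), 3)
    return flagged

# Synthetic audit sample: credit history length correlates with the
# protected attribute (shorter histories in group 1); income does not.
rng = np.random.default_rng(0)
n = 5000
protected = rng.integers(0, 2, n)                # group label, audit data only
income = rng.normal(90_000, 15_000, n)           # independent of group
history = rng.normal(10, 3, n) - 4 * protected   # group-correlated proxy

X = np.column_stack([income, history])
print(proxy_screen(X, ["income", "credit_history_length"], protected))
```

On this synthetic sample only `credit_history_length` is flagged; a real screen would also examine joint effects, since several individually weak features can combine into a strong proxy.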
The fairness testing gap. Summit's model validation team tested LimitEngine for overall accuracy, default prediction rates, and compliance with existing fair lending regulations. They did not conduct disparate impact testing across gender because gender was not an input to the model. This is the critical governance failure: the team assumed that excluding gender as a feature prevented gender discrimination, when correlated proxy features can reproduce it.
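The disparate impact test the validation team skipped is mechanically simple. One common metric is the adverse impact ratio, informally checked against the EEOC's "four-fifths" guideline. A minimal sketch, using hypothetical data and an assumed favorable-outcome cutoff:

```python
import numpy as np

def adverse_impact_ratio(favorable, protected):
    """Ratio of favorable-outcome rates: lower group rate / higher group rate.
    The EEOC 'four-fifths' guideline flags ratios below 0.80 for review.
    favorable: 1 if the applicant got a favorable outcome; protected: 0/1."""
    rate0 = favorable[protected == 0].mean()
    rate1 = favorable[protected == 1].mean()
    return min(rate0, rate1) / max(rate0, rate1)

# Hypothetical audit: share of applicants granted a limit >= $25,000
limits = np.array([48_000, 30_000, 26_000, 12_000, 14_000, 27_000,
                   52_000, 11_000, 13_000, 25_000])
group = np.array([0, 0, 0, 1, 1, 1, 0, 1, 1, 1])
favorable = (limits >= 25_000).astype(int)

air = adverse_impact_ratio(favorable, group)
print(f"Adverse impact ratio: {air:.2f}")  # prints "Adverse impact ratio: 0.33"
```

A ratio of 0.33 is far below 0.80 and would have triggered investigation long before a viral thread did. In practice the test is run per protected class, with statistical significance checks and continuous rather than binary outcomes.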
Immediate regulatory response to the CFPB:
- Acknowledge receipt of the inquiry within the required timeframe
- Engage outside counsel with fair lending expertise (firms like Ballard Spahr or Buckley LLP)
- Preserve all model artifacts, training data, validation reports, and decisioning logs
- Prepare for a potential fair lending examination under the Equal Credit Opportunity Act (ECOA) and its implementing Regulation B (the Fair Housing Act governs housing-related credit, not card lending)
Model remediation options:
1. Adversarial debiasing: Modify the training objective to penalize the model for outcomes that correlate with protected characteristics. This can reduce disparate impact but may affect overall model accuracy.
2. Feature removal with proxy analysis: Identify and remove or re-weight features that act as gender proxies. Requires careful analysis to avoid removing legitimate predictive signals.
3. Post-processing calibration: Adjust output credit limits to equalize outcomes across demographic groups. Must be done carefully: explicitly conditioning outcomes on protected class carries its own disparate treatment risk.
4. Retraining on curated data: Remove historical lending data that reflects known discriminatory patterns. Create a training dataset that represents fair lending outcomes.
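To make option 3 concrete, one post-processing technique is quantile alignment: map each group's limit distribution onto a reference group's quantiles so that equally-ranked applicants receive equal limits. The sketch below is a simplified illustration on hypothetical data (the helper name and reference-group choice are assumptions); a real remediation would condition on qualifications and require legal review for the disparate treatment risk noted above.

```python
import numpy as np

def quantile_align(limits, protected, reference_group=0):
    """Post-processing sketch: map each non-reference group's limits onto
    the reference group's empirical quantiles, so an applicant ranked at
    the k-th percentile of her group gets the reference group's k-th
    percentile limit. Hypothetical helper, not a production remedy."""
    ref = np.sort(limits[protected == reference_group])
    out = limits.astype(float).copy()
    for g in np.unique(protected):
        if g == reference_group:
            continue
        mask = protected == g
        ranks = np.argsort(np.argsort(limits[mask]))  # 0..m-1 within group
        q = (ranks + 0.5) / mask.sum()                # mid-rank quantiles
        out[mask] = np.quantile(ref, q)               # linear interpolation
    return out

limits = np.array([48_000, 30_000, 26_000, 12_000, 14_000, 27_000,
                   52_000, 11_000, 13_000, 25_000], dtype=float)
group = np.array([0, 0, 0, 1, 1, 1, 0, 1, 1, 1])
adjusted = quantile_align(limits, group)
print(adjusted)
```

After alignment the two groups' limit distributions match by construction; the open question, and the reason this needs counsel in the room, is that the adjustment itself uses group membership.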
Governance improvements:
- Mandate disparate impact testing across all protected classes before any credit model deployment
- Implement continuous fairness monitoring with automated alerts when disparities exceed defined thresholds
- Require adverse action reason codes to be auditable and explainable to consumers
- Establish a fair lending AI committee with compliance, legal, data science, and business representation
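The continuous-monitoring item above can be sketched as a batch job over each day's decisions: recompute a fairness metric and raise an alert when it breaches a defined threshold. All names, cutoffs, and data here are assumptions for illustration.

```python
import numpy as np

AIR_THRESHOLD = 0.80  # assumed alert threshold (four-fifths guideline)

def monitor_batch(limits, protected, favorable_cutoff=25_000):
    """Continuous-monitoring sketch: compute the adverse impact ratio on a
    batch of decisions and flag it when it breaches the threshold."""
    favorable = limits >= favorable_cutoff
    rates = [favorable[protected == g].mean() for g in np.unique(protected)]
    air = min(rates) / max(rates)
    return {"air": round(float(air), 3), "alert": bool(air < AIR_THRESHOLD)}

# Skewed hypothetical batch: every group-0 applicant clears the cutoff,
# no group-1 applicant does -- the alert fires.
skewed = monitor_batch(np.array([48_000, 52_000, 30_000, 12_000, 13_000, 11_000]),
                       np.array([0, 0, 0, 1, 1, 1]))
print(skewed)  # prints {'air': 0.0, 'alert': True}

# Balanced batch: no alert.
balanced = monitor_batch(np.array([30_000, 26_000, 28_000, 27_000]),
                         np.array([0, 0, 1, 1]))
print(balanced)  # prints {'air': 1.0, 'alert': False}
```

A production version would monitor every protected class, use smoothed windows rather than single batches, and route alerts to the fair lending AI committee described above.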