Understand where bias enters the ML pipeline — from data collection to model output
When people say an AI system is 'biased', they're usually describing one of several distinct phenomena that have different causes and require different fixes. Understanding the taxonomy is the first step to addressing the problem.
Historical bias. The world as it is reflects historical inequalities. A model trained on past hiring decisions will learn that certain demographics were hired less often, and it will replicate that pattern. Here the data is an accurate reflection of the world; the world is the problem.
Representation bias. The training dataset doesn't represent the population the model will be applied to. For example, facial recognition systems trained primarily on lighter-skinned faces perform worse on darker-skinned faces. The data under-samples part of the population the model has to serve.
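One way to spot this kind of gap before training is to compare group shares in the training set with the shares expected at deployment. The following is a minimal sketch under stated assumptions: the function name `representation_gaps`, the group labels, and all the numbers are hypothetical, and it presumes a group attribute is available for every training example.

```python
from collections import Counter

def representation_gaps(train_groups, population_shares):
    """Return (share in training data) minus (share in target population) per group."""
    counts = Counter(train_groups)
    total = len(train_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }

# Hypothetical data: 80% of training images show lighter-skinned faces,
# while the assumed deployment population is split evenly.
train_groups = ["lighter"] * 80 + ["darker"] * 20
population_shares = {"lighter": 0.5, "darker": 0.5}

gaps = representation_gaps(train_groups, population_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

A negative gap marks a group that is under-represented relative to the assumed target population. Matching the shares does not guarantee equal performance, so per-group evaluation is still needed.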
Measurement bias. The proxy variable used as a label doesn't measure the true outcome equally well across groups. Using arrest records as a proxy for 'criminality' embeds racial disparities in policing into the label itself.
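A small arithmetic sketch can make the proxy problem concrete. It assumes, purely for illustration, that two groups have the same true rate of the underlying behaviour but are policed at different intensities, and that a positive label is only recorded when the behaviour is detected. The function name and every number here are hypothetical.

```python
def observed_label_rate(true_rate, detection_rate):
    """Expected share of a group that receives a positive proxy label.

    Simplifying assumption: a label is recorded only when the true
    behaviour occurs AND is detected (no false arrests).
    """
    return true_rate * detection_rate

# Hypothetical numbers: identical true rate, different detection rates.
true_rate = 0.10
detection = {"group_a": 0.60, "group_b": 0.30}

label_rates = {
    group: observed_label_rate(true_rate, rate)
    for group, rate in detection.items()
}
ratio = label_rates["group_a"] / label_rates["group_b"]

print(label_rates)
print(f"label-rate ratio: {ratio:.1f}")
```

Under these assumptions the proxy label is twice as common in `group_a` even though the underlying behaviour is identical, so a model trained on the label learns the difference in detection, not a difference in behaviour.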
Aggregation bias. A single model is fit to the population as a whole even though the relationship between features and outcome differs across subgroups. A diabetes risk model trained on general-population data may perform poorly for specific ethnic groups whose physiology or lifestyle differs.
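The effect can be reproduced with a toy example. The sketch below uses synthetic data with a hypothetical biomarker `x`: in the majority group the outcome is positive when `x >= 6`, in the minority group when `x >= 4`. The "model" is just a single threshold chosen to maximise accuracy, which is an assumption made for brevity and not a realistic risk model.

```python
def accuracy(rows, threshold):
    """Share of (x, y) rows where the rule 'x >= threshold' matches label y."""
    return sum((x >= threshold) == y for x, y in rows) / len(rows)

def best_threshold(rows):
    """Return the candidate threshold with the highest accuracy on rows."""
    candidates = sorted({x for x, _ in rows})
    return max(candidates, key=lambda t: accuracy(rows, t))

# Synthetic, hypothetical data: biomarker x takes the values 0..9.
group_a = [(x, x >= 6) for x in range(10)] * 9  # majority: positive when x >= 6
group_b = [(x, x >= 4) for x in range(10)]      # minority: positive when x >= 4

# One threshold fit to everyone is dominated by the majority group.
pooled_t = best_threshold(group_a + group_b)
# A threshold fit to the minority group alone.
own_t = best_threshold(group_b)

print("pooled threshold:", pooled_t)
print("group_a accuracy:", accuracy(group_a, pooled_t))
print("group_b accuracy:", accuracy(group_b, pooled_t))
print("group_b own threshold:", own_t, "accuracy:", accuracy(group_b, own_t))
```

With this data the pooled threshold settles on the majority group's cut-off, so the minority group is scored with a rule that does not describe it. Fitting per-group models is only one possible response; it needs enough data per group and raises its own questions about using group membership as an input.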
COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) was a commercial tool used by US courts to predict recidivism risk and inform sentencing decisions. In 2016, ProPublica published an analysis showing:

- Black defendants who did not go on to reoffend were nearly twice as likely as white defendants to be falsely flagged as future criminals (a higher false positive rate)
- White defendants who did reoffend were more likely to be incorrectly flagged as low risk (a higher false negative rate)

The company (Northpointe) responded that COMPAS was calibrated: a given risk score corresponded to the same probability of reoffending across races. Both claims were mathematically correct, but the two fairness criteria cannot be satisfied simultaneously when base rates differ across groups.

This case is the defining illustration of the impossibility theorem for fairness: except in the degenerate case of a perfect predictor, you cannot have equal false positive rates, equal false negative rates, and calibration across groups with different base rates.
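The trade-off follows from simple confusion-matrix algebra. For a binary high-risk flag, the base rate p, the positive predictive value (PPV), the false negative rate (FNR), and the false positive rate (FPR) are tied together by the identity FPR = p / (1 - p) × (1 - PPV) / PPV × (1 - FNR). The sketch below uses PPV as a stand-in for calibration at a single threshold, which is a simplification, and all numbers are hypothetical and not taken from the COMPAS data.

```python
def false_positive_rate(base_rate, ppv, fnr):
    """FPR implied by the base rate, PPV and FNR of a binary classifier.

    Follows from the confusion-matrix identity
        FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR).
    """
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)

# Hypothetical setting: both groups get the same PPV and the same FNR,
# but their base rates of reoffending differ.
ppv, fnr = 0.6, 0.3
fpr_a = false_positive_rate(base_rate=0.5, ppv=ppv, fnr=fnr)
fpr_b = false_positive_rate(base_rate=0.3, ppv=ppv, fnr=fnr)

print(f"group A false positive rate: {fpr_a:.3f}")
print(f"group B false positive rate: {fpr_b:.3f}")
```

With PPV and FNR held equal, the group with the higher base rate necessarily ends up with the higher false positive rate (about 0.467 versus 0.200 here). Equalising the false positive rates would force PPV or FNR to differ between the groups.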