⏱️ 60 min

Fairness Metrics & Trade-offs

Measure model fairness mathematically — and understand why you cannot have all metrics at once

The Four Core Fairness Definitions

There are dozens of proposed fairness metrics, but four are most commonly used in practice. Understanding what each measures — and what it doesn't — is essential.

Demographic parity (statistical parity)

P(Ŷ=1 | A=0) = P(Ŷ=1 | A=1)

The positive prediction rate should be equal across the groups defined by a sensitive attribute A (e.g., race, gender). This is easy to measure, but it ignores the true label Y, so it does not account for whether base rates of the outcome differ between groups. It is appropriate when you believe qualification rates should be equal across groups.
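As a minimal sketch of the definition above, the per-group positive prediction rate can be computed directly from a list of predictions. The helper name `selection_rates` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def selection_rates(y_pred, groups):
    """Return the share of positive predictions, P(Y_hat=1 | A=a), per group a."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for pred, g in zip(y_pred, groups):
        counts[g][0] += int(pred == 1)
        counts[g][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}

# Toy data (hypothetical): 1 = positive prediction, A takes values 0 and 1
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

rates = selection_rates(y_pred, groups)
dp_gap = max(rates.values()) - min(rates.values())
print(rates)   # {0: 0.75, 1: 0.25}
print(dp_gap)  # 0.5
```

A gap of 0 would mean both groups receive positive predictions at the same rate. Note that the true labels never enter the calculation.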

Equalized odds

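P(Ŷ=1 | Y=1, A=0) = P(Ŷ=1 | Y=1, A=1) AND P(Ŷ=1 | Y=0, A=0) = P(Ŷ=1 | Y=0, A=1)

Both the true positive rate and the false positive rate should be equal across groups. Unlike demographic parity, this conditions on the true outcome Y, so it constrains how errors are distributed across groups rather than how often positives are predicted. This is the standard ProPublica applied in its analysis of COMPAS.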
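To make the two conditions above concrete, the following sketch computes the true positive rate and false positive rate per group from labels and predictions. The function name `rates_by_group` and the toy data are hypothetical, and the code assumes every group contains at least one positive and one negative example.

```python
def rates_by_group(y_true, y_pred, groups):
    """Return {group: (TPR, FPR)} computed from binary labels and predictions."""
    out = {}
    for g in sorted(set(groups)):
        rows = [(t, p) for t, p, a in zip(y_true, y_pred, groups) if a == g]
        tp = sum(1 for t, p in rows if t == 1 and p == 1)
        fp = sum(1 for t, p in rows if t == 0 and p == 1)
        pos = sum(1 for t, _ in rows if t == 1)
        neg = len(rows) - pos
        # Assumption: pos > 0 and neg > 0 for every group
        out[g] = (tp / pos, fp / neg)
    return out

# Toy data (hypothetical)
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

by_group = rates_by_group(y_true, y_pred, groups)
tpr_gap = abs(by_group[0][0] - by_group[1][0])
fpr_gap = abs(by_group[0][1] - by_group[1][1])
print(by_group)          # {0: (1.0, 0.5), 1: (0.5, 0.0)}
print(tpr_gap, fpr_gap)  # 0.5 0.5
```

Equalized odds holds only when both gaps are 0. In this toy data, group 0 gets more correct positives and also more false alarms than group 1.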

Calibration

P(Y=1 | Ŷ=p, A=0) = P(Y=1 | Ŷ=p, A=1) = p

Here Ŷ is the predicted score rather than a binary label. A prediction of 70% risk should correspond to a 70% probability of the outcome, regardless of group membership. This is the standard Northpointe applied in defending COMPAS.
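One rough way to check the condition above is to bin predicted scores and compare the mean score in each bin with the observed outcome rate, separately per group. This is only a sketch: the helper `calibration_table`, the equal-width binning, and the toy data are assumptions made for illustration, not a prescribed procedure.

```python
from collections import defaultdict

def calibration_table(y_true, scores, groups, n_bins=2):
    """Return {(group, bin): (mean score, observed positive rate, count)}."""
    cells = defaultdict(list)
    for y, s, g in zip(y_true, scores, groups):
        b = min(int(s * n_bins), n_bins - 1)  # equal-width bins on [0, 1]
        cells[(g, b)].append((s, y))
    table = {}
    for key, rows in sorted(cells.items()):
        mean_score = sum(s for s, _ in rows) / len(rows)
        observed = sum(y for _, y in rows) / len(rows)
        table[key] = (mean_score, observed, len(rows))
    return table

# Toy data (hypothetical): scores are predicted probabilities of the outcome
scores = [0.25] * 4 + [0.75] * 4 + [0.25] * 4 + [0.75] * 4
y_true = [1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0]
groups = [0] * 8 + [1] * 8

table = calibration_table(y_true, scores, groups)
for (g, b), (mean_score, observed, n) in table.items():
    print(f"group={g} bin={b} mean_score={mean_score:.2f} observed={observed:.2f} n={n}")
```

In this toy data, group 0 is calibrated in both bins, while group 1's low-score bin has an observed rate of 0.50 against a mean score of 0.25, so a score of 25% means something different for the two groups.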

Individual fairness

Similar individuals should receive similar predictions. Requires a similarity metric over individuals — which is itself contentious.
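One common way to formalize this idea is a Lipschitz condition: the difference between two individuals' predictions should not exceed a constant L times their distance under the chosen similarity metric. The sketch below lists the pairs that violate it. The function name `lipschitz_violations`, the Euclidean distance, the constant `L=1.0`, and the toy data are all assumptions; choosing the metric is exactly the contentious step.

```python
import math
from itertools import combinations

def lipschitz_violations(features, scores, distance, L=1.0):
    """Return index pairs (i, j) where |score_i - score_j| > L * d(x_i, x_j)."""
    violations = []
    for i, j in combinations(range(len(scores)), 2):
        if abs(scores[i] - scores[j]) > L * distance(features[i], features[j]):
            violations.append((i, j))
    return violations

# Toy data (hypothetical): two nearly identical individuals and one distant one
features = [(0.0, 0.0), (0.0, 0.1), (1.0, 1.0)]
scores = [0.2, 0.8, 0.9]

# Assumption: Euclidean distance stands in for the task-specific similarity metric
bad_pairs = lipschitz_violations(features, scores, math.dist, L=1.0)
print(bad_pairs)  # [(0, 1)]
```

Individuals 0 and 1 are only 0.1 apart yet their scores differ by 0.6, so the pair is flagged. A different distance function or a different L would flag different pairs.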

Measuring Fairness with Fairlearn

python
from fairlearn.metrics import MetricFrame, demographic_parity_difference, equalized_odds_difference
from sklearn.metrics import accuracy_score, precision_score

# Evaluate model by demographic group.
# Assumes y_test, y_pred, and sensitive_features already exist as aligned 1-D arrays.
mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "precision": precision_score},
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=sensitive_features,  # e.g., gender column
)

print("By group:")
print(mf.by_group)
print()
print(f"Overall accuracy: {mf.overall['accuracy']:.3f}")
print(f"Accuracy difference across groups: {mf.difference()['accuracy']:.3f}")

# Specific fairness metrics
dpd = demographic_parity_difference(y_test, y_pred, sensitive_features=sensitive_features)
eod = equalized_odds_difference(y_test, y_pred, sensitive_features=sensitive_features)
print(f"Demographic parity difference: {dpd:.3f}  (0 = parity on this metric)")
print(f"Equalized odds difference: {eod:.3f}     (0 = parity on this metric)")