Automate the testing, validation, and deployment of ML models with GitHub Actions
Traditional software CI/CD checks code correctness through linting, unit tests, and integration tests. ML CI/CD adds another dimension: model quality. A code change can be syntactically correct and pass every unit test while still degrading model accuracy by 5%. ML CI/CD pipelines therefore need to check three things: code quality, pipeline integrity, and model performance against a held-out evaluation set.
An ML CI pipeline typically runs these steps on every push or pull request:

- Lint and type-check all Python code
- Run unit tests for data transformations
- Run data validation on a sample of the training data
- Train a model on a small training subset (smoke test)
- Evaluate the model against the golden evaluation set
- Compare metrics to the current production model (regression check)
- If metrics regress beyond a threshold, block the PR
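The last two steps, the regression check and PR blocking, can be sketched in Python. This is a minimal sketch under stated assumptions: the candidate's and the production model's metrics are stored as flat JSON objects such as `{"auc": 0.91, "accuracy": 0.87}`, every metric is higher-is-better, and `max_drop` is an absolute tolerance. The names `find_regressions` and `regression_gate` are hypothetical and not part of the pipeline described here.

```python
from __future__ import annotations

import json
from pathlib import Path


def find_regressions(candidate: dict[str, float],
                     production: dict[str, float],
                     max_drop: float = 0.01) -> dict[str, tuple[float, float | None]]:
    """Return metrics where the candidate is worse than production by more than max_drop.

    Assumes higher is better for every metric (true for AUC and accuracy;
    loss-style metrics would need the comparison flipped).
    """
    regressions: dict[str, tuple[float, float | None]] = {}
    for name, prod_value in production.items():
        cand_value = candidate.get(name)
        if cand_value is None:
            # A metric missing from the candidate's report is treated as a regression.
            regressions[name] = (prod_value, None)
        elif prod_value - cand_value > max_drop:
            regressions[name] = (prod_value, cand_value)
    return regressions


def regression_gate(candidate_path: str, production_path: str,
                    max_drop: float = 0.01) -> int:
    """Hypothetical helper: load two JSON metric files, report regressions, return an exit code."""
    candidate = json.loads(Path(candidate_path).read_text())
    production = json.loads(Path(production_path).read_text())
    regressions = find_regressions(candidate, production, max_drop)
    for name, (prod, cand) in regressions.items():
        print(f"REGRESSION {name}: production={prod} candidate={cand}")
    # Non-zero signals "block the PR" once the calling script passes it to sys.exit().
    return 1 if regressions else 0
```

In a workflow step, a small script would call `sys.exit(regression_gate("candidate_metrics.json", "production_metrics.json"))` (both file names hypothetical). GitHub Actions marks a step as failed on any non-zero exit code; making that job a required status check in branch protection is what actually prevents the merge.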
```yaml
name: ML CI Pipeline
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - name: Lint
        run: ruff check . && mypy src/
      - name: Unit tests
        run: pytest tests/unit/ -v
      - name: Data validation
        run: python scripts/validate_data.py --sample-size 1000
      - name: Smoke train
        run: python scripts/train.py --max-samples 500 --epochs 1
      - name: Evaluate
        # Fails if AUC < 0.85 (exit code 1)
        run: python scripts/evaluate.py --threshold 0.85
```
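The Evaluate step relies on `scripts/evaluate.py` exiting with code 1 when AUC falls below `--threshold`. That script isn't shown, so the following is a minimal sketch of the gate, assuming the golden evaluation set has already been scored into parallel lists of 0/1 labels and model scores. The function names are hypothetical. To keep the example dependency-free, AUC is computed directly as the fraction of positive/negative pairs the model ranks correctly (ties count half); a real script might call `sklearn.metrics.roc_auc_score` instead.

```python
from __future__ import annotations

import argparse


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate_gate(labels: list[int], scores: list[float], threshold: float) -> int:
    """Hypothetical gate: return 0 if AUC >= threshold, else 1 (fails the CI step)."""
    auc = roc_auc(labels, scores)
    print(f"AUC={auc:.4f} threshold={threshold}")
    return 0 if auc >= threshold else 1


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse the --threshold flag used by the workflow's Evaluate step."""
    parser = argparse.ArgumentParser(description="Evaluate a model on the golden set")
    parser.add_argument("--threshold", type=float, default=0.85,
                        help="minimum acceptable AUC; below this the script exits 1")
    return parser.parse_args(argv)
```

Wired into a script, the entry point would be `sys.exit(evaluate_gate(labels, scores, parse_args().threshold))`, so the Evaluate step fails, and the PR check turns red, whenever AUC < 0.85. The pairwise loop is O(P·N), which is fine for a small golden set but too slow for millions of examples.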