Build real-time dashboards and alerts to keep your ML system healthy
ML systems need monitoring at three levels: infrastructure, model behavior, and business impact. Most teams monitor only infrastructure (CPU, memory), but the more important signals are model-specific.
Infrastructure metrics:
- Prediction latency (p50, p95, p99)
- Request throughput and error rates
- GPU/CPU utilization and memory
- Queue depth (for async serving)
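To illustrate why latency is tracked as percentiles rather than as an average, here is a minimal sketch using only the Python standard library. The function name `latency_percentiles` and the sample data are made up for illustration; in a Prometheus setup these values would normally be estimated from histogram buckets rather than from raw samples.

```python
import statistics


def latency_percentiles(samples_s):
    """Return p50, p95 and p99 (in seconds) for a list of latency samples."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(samples_s, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


# Hypothetical traffic: 98 fast requests (20 ms) and 2 slow ones (1.5 s).
samples = [0.020] * 98 + [1.500] * 2
stats = latency_percentiles(samples)
mean = statistics.fmean(samples)

print(stats, mean)
```

With this data the median and p95 stay near 20 ms and the mean is about 50 ms, while p99 exposes the 1.5 s tail that a small share of users actually experience.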
Model and business metrics:
- Prediction distribution: Are predictions shifting over time? Accuracy alone can hide this: if only 0.5% of transactions are fraudulent, a model that always predicts 'not fraud' is 99.5% accurate but has zero utility.
- Feature distribution: Which input features are drifting?
- Business KPI: Actual business impact (click-through rate, fraud caught, etc.)
- Confidence distribution: Is the model becoming less certain over time?
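One common way to put a number on "is this distribution drifting?" is the Population Stability Index (PSI), which compares the binned distribution of a reference sample (for example, training data) with a recent production sample. The sketch below is one possible implementation, not the only drift measure; the function name `psi`, the bin edges, and the data are assumptions chosen for illustration. The same function can be applied to a feature, to prediction scores, or to confidences.

```python
import math


def psi(reference, current, edges):
    """Population Stability Index between two samples over shared bin edges."""

    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges at or below v.
            idx = sum(1 for e in edges if v >= e)
            counts[idx] += 1
        eps = 1e-6  # floor for empty bins, avoids log(0)
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Hypothetical feature values: the production sample is shifted upward by 0.2.
edges = [0.25, 0.5, 0.75]
training = [i / 100 for i in range(100)]
production = [(i + 20) / 100 for i in range(100)]

no_drift = psi(training, training, edges)
drift = psi(training, production, edges)
print(no_drift, drift)
```

A widely used rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but these thresholds are conventions, not guarantees, and should be tuned per feature.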
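import time

from fastapi import FastAPI
from prometheus_client import Counter, Histogram, make_asgi_app
from pydantic import BaseModel

# Metrics
PREDICTIONS = Counter("model_predictions_total", "Total predictions", ["label"])
LATENCY = Histogram("prediction_latency_seconds", "Prediction latency")
CONFIDENCE = Histogram("prediction_confidence", "Model confidence", buckets=[0.5, 0.7, 0.8, 0.9, 0.95, 1.0])

app = FastAPI()
app.mount("/metrics", make_asgi_app())  # endpoint scraped by Prometheus

class PredictionRequest(BaseModel):
    # Assumed request schema; adjust to your model's inputs.
    features: list[float]

# `model` is assumed to be loaded elsewhere (e.g. at startup) and to expose
# predict(features) -> (prediction, confidence).

@app.post("/predict")
def predict(request: PredictionRequest):
    # Plain `def`: FastAPI runs it in a thread pool, so blocking inference
    # does not stall the event loop.
    start = time.perf_counter()  # monotonic clock, safer for durations than time.time()
    prediction, confidence = model.predict(request.features)
    # Record metrics
    LATENCY.observe(time.perf_counter() - start)
    PREDICTIONS.labels(label=str(prediction)).inc()
    CONFIDENCE.observe(confidence)
    return {"prediction": prediction, "confidence": confidence}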
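The `CONFIDENCE` histogram above is easier to read once you know that Prometheus histogram buckets are cumulative: each bucket counts every observation less than or equal to its upper bound. The following standard-library sketch mimics that counting so you can see how a "share of low-confidence predictions" falls out of the bucket counts. The helper name `cumulative_bucket_counts` and the confidence values are made up, and the implicit `+Inf` bucket that `prometheus_client` adds is omitted because confidences here never exceed 1.0.

```python
from bisect import bisect_left

BUCKETS = [0.5, 0.7, 0.8, 0.9, 0.95, 1.0]  # same upper bounds as CONFIDENCE


def cumulative_bucket_counts(values, bounds=BUCKETS):
    """Count observations <= each upper bound, as a Prometheus histogram does."""
    counts = [0] * len(bounds)
    for v in values:
        # First bucket whose upper bound is >= v; all later buckets include v too.
        first = bisect_left(bounds, v)
        for i in range(first, len(bounds)):
            counts[i] += 1
    return dict(zip(bounds, counts))


confidences = [0.42, 0.65, 0.85, 0.97, 0.99]  # hypothetical model outputs
counts = cumulative_bucket_counts(confidences)

# Share of predictions with confidence <= 0.7.
low_confidence_share = counts[0.7] / len(confidences)
print(counts, low_confidence_share)
```

Tracking that share over time is one way to answer the earlier question of whether the model is becoming less certain: a rising fraction in the low buckets is a signal worth alerting on.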