Move beyond manual tuning with systematic, efficient hyperparameter optimization
It is easy to burn hundreds of trials on ad hoc manual or random tuning before landing on a good configuration. A better approach is to work up the tuning ladder: start cheap, get a directional signal, then spend compute on the most promising region of the search space.
- **Grid search**: Exhaustive but exponentially expensive. Only for 1-2 hyperparameters with few options.
- **Random search**: 60-70% of grid search quality at 10% of the cost. Good default starting point.
- **Bayesian optimization**: Builds a probabilistic model of the objective and samples intelligently. 3-5x more efficient than random for expensive training runs.
- **Population-based training (PBT)**: Evolves multiple training runs in parallel. State-of-the-art for very long training jobs.
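To make the cost difference between the first two rungs concrete, here is a minimal sketch in plain Python. The `toy_objective` function is a hypothetical stand-in for a real validation score, and the grid values and trial budget are illustrative assumptions, not recommendations.

```python
import itertools
import random


def toy_objective(learning_rate, max_depth):
    """Hypothetical validation score; peaks at learning_rate=0.05, max_depth=6."""
    return 1.0 - (learning_rate - 0.05) ** 2 * 10 - (max_depth - 6) ** 2 * 0.01


def grid_search(grid):
    """Evaluate every combination; the trial count is the product of the list lengths."""
    names = list(grid)
    best = None
    n_trials = 0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = toy_objective(**params)
        n_trials += 1
        if best is None or score > best[0]:
            best = (score, params)
    return best, n_trials


def random_search(n_trials, seed=0):
    """Sample a fixed budget of configurations instead of enumerating a grid."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            # Log-uniform draw: every order of magnitude is equally likely.
            "learning_rate": 10 ** rng.uniform(-4, -0.5),
            "max_depth": rng.randint(3, 10),
        }
        score = toy_objective(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best


grid = {
    "learning_rate": [0.001, 0.01, 0.05, 0.1, 0.3],
    "max_depth": [3, 4, 5, 6, 7, 8, 9, 10],
}
grid_best, grid_trials = grid_search(grid)  # 5 * 8 = 40 evaluations
random_best = random_search(n_trials=10)    # fixed budget of 10 evaluations

print(f"Grid search:   {grid_trials} trials, best score {grid_best[0]:.4f}")
print(f"Random search: 10 trials, best score {random_best[0]:.4f}")
```

Adding a third hyperparameter with five options would multiply the grid to 200 evaluations, while the random search budget stays wherever you set it.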
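Optuna is a widely used library for Bayesian-style hyperparameter optimization; its default sampler is the Tree-structured Parzen Estimator (TPE). It integrates with PyTorch, scikit-learn, XGBoost, and more. The example below tunes a gradient boosting classifier by maximizing cross-validated ROC AUC.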
```python
import optuna
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# X_train and y_train are assumed to be defined already
# (training features and binary labels).


def objective(trial):
    # Each trial samples one configuration from the search space.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        # log=True samples the learning rate uniformly in log space.
        "learning_rate": trial.suggest_float("learning_rate", 1e-4, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "min_samples_split": trial.suggest_int("min_samples_split", 2, 20),
    }
    clf = GradientBoostingClassifier(**params, random_state=42)
    # Mean 3-fold cross-validated ROC AUC is the value Optuna maximizes.
    score = cross_val_score(clf, X_train, y_train, cv=3, scoring="roc_auc").mean()
    return score


study = optuna.create_study(direction="maximize")
# n_jobs=-1 runs trials in parallel threads, one per CPU core.
study.optimize(objective, n_trials=50, n_jobs=-1)

print(f"Best AUC: {study.best_value:.4f}")
print(f"Best params: {study.best_params}")
```

```text
[I 2024-01-15 10:23:41] Trial 0 finished with value: 0.8834
[I 2024-01-15 10:23:42] Trial 1 finished with value: 0.8912
...
Best AUC: 0.9241
Best params: {'n_estimators': 287, 'learning_rate': 0.0412, 'max_depth': 6, ...}
```
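For intuition about what "builds a probabilistic model of the objective and samples intelligently" means, the following is a heavily simplified, TPE-like sketch in plain Python. It is not Optuna's implementation: the function names, the fixed kernel bandwidth, the 1-D `objective`, and the `gamma` split are all illustrative assumptions. The idea is to split past trials into "good" and "bad" groups, then prefer candidates that look likely under the good group and unlikely under the bad one.

```python
import math
import random


def objective(x):
    """Hypothetical 1-D objective to maximize; its peak is at x = 2.0."""
    return -(x - 2.0) ** 2


def kde(x, points, bandwidth):
    """Density estimate at x: average of Gaussian kernels centered on the points."""
    if not points:
        return 0.0
    total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
    return total / (len(points) * bandwidth * math.sqrt(2 * math.pi))


def tpe_like_search(n_trials=40, n_startup=10, gamma=0.25,
                    low=-5.0, high=5.0, seed=0):
    rng = random.Random(seed)
    bandwidth = (high - low) / 10  # fixed bandwidth, an assumption for simplicity
    history = []                   # list of (x, score) pairs
    for i in range(n_trials):
        if i < n_startup:
            # Warm-up phase: plain random search to collect initial evidence.
            x = rng.uniform(low, high)
        else:
            ranked = sorted(history, key=lambda item: item[1], reverse=True)
            n_good = max(1, int(gamma * len(ranked)))
            good = [px for px, _ in ranked[:n_good]]
            bad = [px for px, _ in ranked[n_good:]]
            # Propose candidates near previously good points, clipped to bounds.
            candidates = [
                min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
                for _ in range(24)
            ]
            # Keep the candidate with the highest good-to-bad density ratio.
            x = max(
                candidates,
                key=lambda c: kde(c, good, bandwidth) / (kde(c, bad, bandwidth) + 1e-12),
            )
        history.append((x, objective(x)))
    return history


history = tpe_like_search()
best_x, best_score = max(history, key=lambda item: item[1])
print(f"Best x: {best_x:.3f}, best score: {best_score:.4f}")
```

The later trials concentrate around regions that scored well during warm-up, which is where the efficiency gain over pure random search comes from when each trial is expensive.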