⏱️ 60 min

Advanced Training Tricks

Practical techniques to make training faster, more stable, and more reproducible

The High-Impact Tricks

These techniques are standard practice in most serious ML projects. They are not exotic research ideas but engineering discipline for reliable training.

Mixed Precision Training (AMP)

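Run most of the forward and backward computation in float16 (half precision) while keeping the master copy of the weights, and the optimizer update, in float32. Because very small float16 gradients can underflow to zero, the loss is multiplied by a scale factor before the backward pass, and the gradients are divided by it again before the update. On GPUs with Tensor Cores this can substantially increase throughput (often up to about 2x) and reduce activation memory, usually with little or no loss in accuracy. In PyTorch, `autocast` handles the precision casting and `GradScaler` handles the loss scaling. Wrap the forward pass and loss computation in `with autocast(...)`, then route the backward pass and the optimizer step through the scaler: `scaler.scale(loss).backward()`, `scaler.step(optimizer)`, `scaler.update()`.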
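
Mixed precision pairs `autocast` with `GradScaler` because float16 cannot represent very small values. The sketch below uses only Python's `struct` module (its `"e"` format is IEEE 754 half precision) to show the effect; `to_fp16` is a hypothetical helper, and the gradient value of `1e-8` is an illustrative assumption, not a measured number. The scale of 2**16 matches `GradScaler`'s default initial scale.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision (float16) and convert back to a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

grad = 1e-8          # hypothetical tiny gradient value
scale = 2.0 ** 16    # 65536, GradScaler's default initial scale

unscaled_fp16 = to_fp16(grad)          # below float16's smallest subnormal (~6e-8): underflows
scaled_fp16 = to_fp16(grad * scale)    # ~6.55e-4, well inside float16's range
recovered = scaled_fp16 / scale        # unscale in higher precision before the update

print(unscaled_fp16)   # 0.0
print(recovered)       # close to 1e-8
```

Without scaling, the gradient is silently lost; with scaling, it survives the trip through float16 with only a small rounding error. `GradScaler` also lowers the scale automatically when scaled gradients overflow to inf.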

Gradient Clipping

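Rescale the gradients whenever their combined L2 norm, taken across all parameters, exceeds a threshold (commonly 1.0). This keeps the gradient direction but caps the size of the update, which guards against exploding gradients in RNNs and transformers. Without it, a single batch with unusually large gradients can produce a huge update that sends the loss to NaN or derails a long training run. Call `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)` after `loss.backward()` and before `optimizer.step()`. With mixed precision, call `scaler.unscale_(optimizer)` first so the threshold applies to the true, unscaled gradients.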
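
The rescaling that `clip_grad_norm_` performs can be sketched in plain Python. This is a simplified illustration on lists of floats, not PyTorch's implementation; `clip_by_global_norm` is a hypothetical name, and the small `eps` guarding against division by zero is an assumption similar to what PyTorch does.

```python
import math

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale every gradient vector in `grads` (a list of lists of floats) in place so that
    their combined L2 norm is at most max_norm. Returns the norm measured before clipping."""
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:  # only shrink large gradients, never enlarge small ones
        for vec in grads:
            for i in range(len(vec)):
                vec[i] *= clip_coef
    return total_norm

# Two "parameters" whose gradients have a combined norm of 5.0
grads = [[3.0, 0.0], [4.0]]
norm_before = clip_by_global_norm(grads, max_norm=1.0)
print(norm_before)  # 5.0
print(grads)        # roughly [[0.6, 0.0], [0.8]]: same direction, norm ~1.0
```

Because every gradient is multiplied by the same factor, the update still points in the same direction; only its length is capped.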

Learning Rate Scheduling

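A fixed learning rate is rarely ideal: larger steps early on make fast progress, while smaller steps later let the model settle into a good minimum. Common schedules:
- Cosine annealing: smoothly decays the LR along a cosine curve; widely used, including for fine-tuning
- Warmup + decay: ramps the LR from 0 to its peak over the first few percent of steps (e.g. 5%), then decays it; standard for training from scratch, especially transformers
- ReduceLROnPlateau: cuts the LR by a factor when a validation metric stops improving; useful when you don't know in advance how training will behave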
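
The warmup + decay schedule can be written as a plain function of the step count. This is a minimal sketch assuming linear warmup over the first 5% of steps followed by cosine decay to `min_lr`; `warmup_cosine_lr` and its parameters are hypothetical names, and other warmup shapes and decay curves are also common.

```python
import math

def warmup_cosine_lr(step, total_steps, peak_lr, warmup_frac=0.05, min_lr=0.0):
    """Learning rate at `step`: linear warmup from 0 to peak_lr, then cosine decay to min_lr."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        # Linear ramp: 0 at step 0, reaching peak_lr at step == warmup_steps
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, from 0.0 to 1.0
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine period: peak_lr at progress 0, min_lr at progress 1
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

total_steps = 1000
for step in (0, 25, 50, 525, 1000):
    print(step, warmup_cosine_lr(step, total_steps, peak_lr=1e-3))
```

One way to use this in PyTorch is `torch.optim.lr_scheduler.LambdaLR(optimizer, lambda s: warmup_cosine_lr(s, total_steps, peak_lr=1.0))`: `LambdaLR` multiplies the optimizer's base learning rate by the returned value, so `peak_lr=1.0` turns the function into a multiplier. Since the schedule is defined per step, call `scheduler.step()` after every batch rather than once per epoch.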

Mixed Precision, Gradient Clipping, and LR Scheduling in PyTorch

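```python
import torch
from torch.amp import GradScaler, autocast  # PyTorch >= 2.3; older versions use torch.cuda.amp
from torch.optim.lr_scheduler import CosineAnnealingLR

# Assumes model (already on the GPU), optimizer, criterion, train_loader,
# and num_epochs are defined elsewhere.
device = "cuda"
scaler = GradScaler(device)
scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs)  # stepped once per epoch

for epoch in range(num_epochs):
    model.train()
    for batch in train_loader:
        inputs = batch["input"].to(device, non_blocking=True)
        targets = batch["target"].to(device, non_blocking=True)
        optimizer.zero_grad(set_to_none=True)

        # Mixed precision forward pass: eligible ops run in float16
        with autocast(device_type=device):
            outputs = model(inputs)
            loss = criterion(outputs, targets)

        # Scale the loss so small float16 gradients don't underflow to zero
        scaler.scale(loss).backward()

        # Unscale before clipping so max_norm applies to the true gradients
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

        # Skips the update if gradients contain inf/NaN, then adjusts the scale
        scaler.step(optimizer)
        scaler.update()

    # Advance the cosine schedule once per epoch
    scheduler.step()
```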
Sharan Initiatives — AI, Finance, Photography & More