Retrain Stage 1 sweep with all bug fixes applied #2

Open
opened 2026-05-01 14:01:26 +02:00 by sleepy · 0 comments
sleepy commented 2026-05-01 14:01:26 +02:00 (Migrated from localhost:18431)

Goal

Run the full Stage 1 hyperparameter sweep using the fixed modules.

Config (from original sweep)

  • d_model=128, n_layers=4
  • 2M tokens, seq_len=256, batch=64
  • Grid: 6 LR (0.001-0.032) x 2 weight_decay (0/0.1) = 12 trials
  • Checkpoint dir: /mnt/e/ternary-checkpoints
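
To make the sweep size concrete, here is a minimal sketch of the grid and the per-trial step count implied by the config above. It assumes the six learning rates double from 0.001 to 0.032 (the issue gives only the range and the count) and that "2M tokens" means 2,000,000. The variable names are illustrative and not taken from the repository.

```python
from itertools import product

# Values taken from the config list in this issue.
SEQ_LEN = 256
BATCH_SIZE = 64
TOKEN_BUDGET = 2_000_000  # assumption: "2M" read as 2 * 10**6

# Assumption: six learning rates, doubling from 0.001 up to 0.032.
LEARNING_RATES = [0.001 * 2**i for i in range(6)]
WEIGHT_DECAYS = [0.0, 0.1]

# Every (lr, weight_decay) pair is one trial.
grid = list(product(LEARNING_RATES, WEIGHT_DECAYS))

# One optimizer step consumes seq_len * batch tokens.
tokens_per_step = SEQ_LEN * BATCH_SIZE
steps_per_trial = TOKEN_BUDGET // tokens_per_step

print(f"{len(grid)} trials, {tokens_per_step} tokens/step, "
      f"{steps_per_trial} full steps per trial")
```

Under these assumptions each trial runs only about 122 optimizer steps, so the loss curves will be short and a single logged value per step is cheap to keep.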

Original Results (broken)

All 12 trials produced an identical, flat loss of ~22.3, far above the ~10.4 expected at the start of training, so the model was worse than random.
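
As a rough illustration of what "worse than random" means here: a model that guesses uniformly over a vocabulary of V tokens has cross-entropy ln(V). The sketch below assumes the loss is measured in nats and uses a hypothetical vocabulary size of 32,768, chosen only because ln(32768) is about 10.4; the issue does not state the actual vocabulary size.

```python
import math

# Hypothetical: the issue does not give the vocabulary size.
VOCAB_SIZE = 32_768

# Cross-entropy (in nats) of a uniform guess over the vocabulary.
uniform_loss = math.log(VOCAB_SIZE)

# Flat loss reported for the broken sweep.
observed_loss = 22.3

# exp(-loss) is the geometric-mean probability assigned to the correct token.
implied_prob = math.exp(-observed_loss)

print(f"uniform baseline: {uniform_loss:.2f} nats")
print(f"implied p(correct) at loss {observed_loss}: {implied_prob:.1e}")
print(f"uniform p(correct): {1 / VOCAB_SIZE:.1e}")
```

A loss that sits well above the uniform baseline and never moves suggests the outputs were confidently wrong in the same way regardless of LR or weight decay, which is consistent with a bug in the modules rather than a bad hyperparameter choice.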

Expected (fixed)

  • Loss should start ~10.4 and decrease during training
  • Different LR/weight_decay configs should show varied convergence
  • Best config should reach loss < 8 within 2M tokens
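
The three expectations above could be turned into an automated check along these lines. This is a sketch only: the function names, the tolerance on the starting loss, and the spread threshold are hypothetical, and the loss values in the usage lines are synthetic placeholders, not results from any run.

```python
def check_trial(losses, start_target=10.4, start_tol=0.5):
    """Return (starts_near_target, decreases) for one trial's loss curve."""
    starts_near_target = abs(losses[0] - start_target) <= start_tol
    decreases = losses[-1] < losses[0]
    return starts_near_target, decreases


def check_sweep(final_losses, goal=8.0, min_spread=1e-3):
    """Return (best_below_goal, configs_differ) for the final loss of each trial."""
    best_below_goal = min(final_losses) < goal
    # Identical final losses across all configs was the symptom of the broken sweep.
    configs_differ = (max(final_losses) - min(final_losses)) > min_spread
    return best_below_goal, configs_differ


# Synthetic placeholder curves, for illustration only.
print(check_trial([10.4, 9.1, 7.8]))     # healthy-looking curve
print(check_trial([22.3, 22.3, 22.3]))   # flat curve like the broken sweep
print(check_sweep([22.3] * 12))          # twelve identical trials
```

Running the per-trial check on the single verified run first, and the sweep-level check only afterwards, mirrors the order of the two Status items.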

Command

cd /home/sleepy/ternary && CUDA_VISIBLE_DEVICES=0 python fixes/train_fixed.py
# Then run the sweep once the individual training run is verified

Status

  • [ ] Individual training run verified
  • [ ] Full sweep completed with fixed modules