Implement a correct batched beam search decoder for autoregressive generation in pure NumPy.

Simulate a minimal language model:
- vocab_size = 1000
- d_model = 64
- Use random embeddings + 1 transformer block with random weights (correctness depends on the beam search logic, not model quality)

Requirements:

1. MULTI-BATCH SUPPORT:
   - Accept prompt_token_ids: list[list[int]] — one prompt per batch item
   - beam_width K per batch item
   - Each batch item's beams are INDEPENDENT (no cross-contamination between different prompts)

2. PER-STEP BEAM EXPANSION:
   - For each UNFINISHED beam, compute logits for the next token
   - Take the top-(2*K) candidates per beam (not just top-K, to preserve diversity)
   - Compute total logprob = accumulated_logprob + new_logprob
   - Pool all candidates across all beams, sort globally by total logprob (most negative = worst), and take the top K
   - These K become the active beams for the next step

3. LENGTH PENALTY (for ranking only, not for the accumulated score):
   - adjusted_score = accumulated_logprob / (generated_length ^ alpha)
   - alpha is a hyperparameter (default 0.6)
   - The accumulated logprob is NEVER modified by the length penalty — it stays the raw sum of logprobs
   - The length penalty is used ONLY when comparing beams for ranking/selection
   - generated_length = number of NEW tokens generated (NOT including prompt tokens — the prompt does not count toward the length penalty)

4. EOS HANDLING (the critical part — get this right):
   - When a beam produces token_id == eos_token:
     * Mark that beam as FINISHED
     * Freeze its accumulated_logprob and generated_length
     * The beam STAYS in the pool — it competes with unfinished beams
   - At each step, the top-K selection pool includes BOTH: (a) all FINISHED beams, and (b) all candidates from expanding UNFINISHED beams
   - If all K beams in a batch item are finished, that item stops expanding
   - If all batch items have K finished beams, stop early
   - IMPORTANT: Do NOT remove finished beams from the pool. They must compete against unfinished beams using their length-penalized scores. If you remove them, a short, high-confidence sequence that hit EOS early will be wrongly discarded in favor of a longer, lower-confidence sequence.

5. RETURN:
   - For each batch item: a list of K sequences (generated token IDs only, NOT including prompt tokens), sorted by length-penalized score descending (best/highest score first)
   - Each sequence's score = accumulated_logprob / (len(seq) ^ alpha)

6. EDGE CASES:
   - If max_new_tokens is reached before K beams finish, return the best K (finished + unfinished) by length-penalized score
   - A batch item may end with fewer than K finished beams (if max_new_tokens is hit). Return whatever is available.
   - Log-space accumulation: keep everything in log space and avoid unnecessary exp/log conversions. Don't let very negative numbers cause underflow.

Deliver:
- A class or function `batched_beam_search(prompts, beam_width, max_new_tokens, alpha, eos_token_id)` that returns the K best sequences per batch item (a minimal sketch of the per-step selection logic follows this spec)
- Test 1: Single batch item, K=1, short prompt, alpha=0 → verify this behaves identically to greedy decoding (always pick the argmax)
- Test 2: batch=2, beam_width=3, different prompt lengths [3, 5], alpha=0.6 → verify per-batch independence: beams from prompt 0 never interact with beams from prompt 1
- Test 3: THE EOS RETENTION TEST. Monkey-patch or modify the model's forward pass so that at step 1, one beam produces EOS with total logprob=-3.0 while another beam continues with logprob=-4.0. At step 2, the continuing beam has logprob=-5.0. Verify that the EOS beam with score=-3.0 is correctly returned as the winner (even though it stopped early). If you had removed EOS beams from the pool, the unfinished beam with score=-5.0 would wrongly win. This test distinguishes correct from buggy implementations. (The arithmetic behind this test is worked through after this spec.)
- Comments explaining why finished beams must NOT be removed from the pool

Use only NumPy. No PyTorch, JAX, TensorFlow, or autograd.
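For reference, a minimal sketch of the per-step selection for one batch item, under the assumptions above. The names `Beam`, `adjusted_score`, and `step_select` are hypothetical, introduced only for illustration, not a required API; the caller is assumed to supply a (num_unfinished_beams, vocab_size) array of log-softmax rows. Ranking uses the length-penalized score for the whole pool so finished and unfinished beams compare fairly, which is one way to satisfy requirement 4.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Beam:
    tokens: list             # generated token IDs only (prompt excluded)
    logprob: float = 0.0     # raw accumulated logprob; never length-penalized
    finished: bool = False   # set once the beam emits eos_token_id

def adjusted_score(beam, alpha):
    # Length penalty applies only at ranking time; generated_length excludes the prompt.
    return beam.logprob / (max(len(beam.tokens), 1) ** alpha)

def step_select(beams, log_probs, k, eos_token_id, alpha):
    """One selection step for a single batch item.

    log_probs: (num_unfinished_beams, vocab_size) log-softmax rows,
    one per UNFINISHED beam, in the same order as the unfinished beams.
    """
    # Finished beams stay in the pool: a short, high-confidence sequence that
    # hit EOS early must keep competing, otherwise it would be wrongly
    # discarded in favor of a longer, lower-confidence continuation.
    pool = [b for b in beams if b.finished]

    unfinished = [b for b in beams if not b.finished]
    for beam, row in zip(unfinished, log_probs):
        # Top-(2K) candidates per beam, to preserve diversity.
        cand_ids = np.argpartition(row, -2 * k)[-2 * k:]
        for tok in cand_ids:
            pool.append(Beam(tokens=beam.tokens + [int(tok)],
                             logprob=beam.logprob + float(row[tok]),  # stay in log space
                             finished=(int(tok) == eos_token_id)))

    # Rank the whole pool (finished beams + new candidates) by length-penalized score.
    pool.sort(key=lambda b: adjusted_score(b, alpha), reverse=True)
    return pool[:k]
```

A full implementation would wrap this in an outer loop over decoding steps and batch items, stopping a batch item once its top K beams are all finished.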
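And the arithmetic Test 3 hinges on, assuming the default alpha = 0.6 (the test description does not pin alpha down):

```python
alpha = 0.6

eos_beam_score  = -3.0 / (1 ** alpha)   # finished at step 1, generated_length = 1 -> -3.0
cont_beam_score = -5.0 / (2 ** alpha)   # still running at step 2, generated_length = 2 -> about -3.30

assert eos_beam_score > cont_beam_score  # the early-EOS beam must be ranked first
```

The ordering also holds at alpha = 0, since -3.0 > -5.0; the test fails only if the finished beam was dropped from the pool.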