2. Make changes, commit with `[area] description` conventions (see below)
3. Push branch: `git push gitea feature/<name>`
4. Create PR on Gitea targeting `master`
-5. Before merge: build, benchmark (record in BENCHMARKS.md), perplexity check if kernel changed
+5. Before merge: build, benchmark (record in BENCHMARKS.md), perplexity check if kernel changed, **coherence test** (see below)
6. Squash-merge to master
## Pre-Merge Coherence Tests
**Mandatory before any merge to master.** Run on both `master` and the PR branch to detect silent correctness regressions.
> **IMPORTANT:** macOS has no `timeout` command. Use `gtimeout` (from `brew install coreutils`).
### Quick Coherence Test (4B model, ~30s)
```bash
gtimeout 60 ./build-build/bin/llama-cli \
  -m ~/.llama/models/Qwen3.5-4B-Q4_0.gguf \
  -n 64 -p "Once upon a time" \
  --temp 0 -s 42 -st
```
### Perplexity Check (kernel changes only)
```bash
gtimeout 120 ./build-build/bin/llama-perplexity \
  -m ~/.llama/models/Qwen3.5-4B-Q4_0.gguf \
  -f /tmp/coherence_test.txt -t 1 --chunks 1 -c 128
```
### Verification
- **llama-cli**: PR output must be coherent text (not gibberish). It does not need to be bit-identical to master.
- **perplexity**: PR perplexity must match master within floating-point tolerance (<0.1% delta)
- **Gibberish output = block merge.** Re-dispatch with specific feedback.
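The <0.1% perplexity rule can be checked mechanically. The following is a minimal sketch, assuming the two perplexity values have already been read from the `llama-perplexity` output by hand; the function name `ppl_within_tolerance` and the example numbers are hypothetical and not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): succeeds when the PR perplexity
# differs from the master perplexity by less than max_pct percent.
ppl_within_tolerance() {
    local master="$1" pr="$2" max_pct="${3:-0.1}"
    awk -v m="$master" -v p="$pr" -v t="$max_pct" 'BEGIN {
        d = (p - m) / m * 100   # relative delta in percent
        if (d < 0) d = -d       # absolute value
        if (d < t) exit 0
        exit 1
    }'
}

# Example values are made up for illustration, not measured results.
if ppl_within_tolerance 12.3456 12.3460; then
    echo "perplexity OK"
else
    echo "perplexity regression: block merge"
fi
```

The delta is computed relative to master, so the same threshold applies regardless of the model's absolute perplexity.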
### Timeout Policy
**All test commands MUST use `gtimeout`** to prevent hangs:
- Inference/cli: `gtimeout 60` (60s)
- Perplexity: `gtimeout 120` (120s)
- Benchmark: `gtimeout 300` (5min)
- Build: `gtimeout 300` (5min)
A hung test is a test failure. Do not retry without investigating the hang.
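One way to make the "hung test is a failure" rule explicit is a small wrapper that reports timeouts separately from ordinary failures. This is a sketch under stated assumptions: `run_with_timeout` is a hypothetical name, and the fallback to plain `timeout` on systems without `gtimeout` is an assumption for non-macOS hosts. GNU `timeout` (and therefore `gtimeout`) exits with status 124 when the limit is reached.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the repo).
# Usage: run_with_timeout <seconds> <command> [args...]
run_with_timeout() {
    local limit="$1"; shift
    local tool=gtimeout
    # Assumption: fall back to `timeout` where gtimeout is absent (e.g. Linux).
    command -v "$tool" >/dev/null 2>&1 || tool=timeout

    local status=0
    "$tool" "$limit" "$@" || status=$?

    # GNU timeout reports an expired limit with exit status 124.
    if [ "$status" -eq 124 ]; then
        echo "HANG: '$*' exceeded ${limit}s; investigate before retrying" >&2
    fi
    return "$status"
}
```

A call such as `run_with_timeout 60 ./build-build/bin/llama-cli ... -st` would then fail loudly on a hang instead of being mistaken for a slow run.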
### IMPORTANT: llama-cli interactive mode
llama-cli enters interactive REPL after generating, flooding output with `>` prompts. This is NOT a correctness failure — it's the CLI waiting for input.
**Always use `--single-turn` (`-st`) flag to prevent this:**
```bash
gtimeout 60 ./build-build/bin/llama-cli -m ~/.llama/models/Qwen3.5-4B-Q4_0.gguf -n 64 -p "Once upon a time" --temp 0 -s 42 -st
```
Without `-st`, you will see `>` garbage and the process will hang. DO NOT attempt to "fix" the kernel because of this.
## Commit Messages
Format: `[area] short description (max 72 chars)`
@@ -77,15 +127,23 @@ When working autonomously, agents MUST:
2. **Create a branch** for any code change: `feature/<issue-number>-<short-desc>`
3. **Reference the issue** in commits: `[area] description (#123)`
4. **Run benchmarks** before/after kernel changes and record in BENCHMARKS.md
-5. **Run perplexity** to verify correctness after any kernel change:
+5. **Run perplexity** to verify correctness after any kernel change (with timeout):
@@ -3118,19 +3118,15 @@ int ggml_metal_op_bin(ggml_metal_op_t ctx, int idx) {
     int n_fuse = 1;
-    // c[0] = add(a, b[0])
-    // c[1] = add(c[0], b[1])
-    // c[2] = add(c[1], b[2])
+    // c[0] = op(a, b[0])
+    // c[1] = op(c[0], b[1])
+    // c[2] = op(c[1], b[2])
     // ...
     if (use_fusion) {
-        fops[0] = GGML_OP_ADD;
-        fops[1] = GGML_OP_ADD;
-        fops[2] = GGML_OP_ADD;
-        fops[3] = GGML_OP_ADD;
-        fops[4] = GGML_OP_ADD;
-        fops[5] = GGML_OP_ADD;
-        fops[6] = GGML_OP_ADD;
-        fops[7] = GGML_OP_ADD;
+        ggml_op cur_op = op->op;
+        for (int i = 0; i < 8; ++i) {
+            fops[i] = cur_op;
+        }
         // note: in metal, we sometimes encode the graph in parallel so we have to avoid fusing ops
         //       across splits. idx_end indicates the last node in the current split
@@ -3165,7 +3161,7 @@ int ggml_metal_op_bin(ggml_metal_op_t ctx, int idx) {
         ++n_fuse;
         if (debug_fusion > 1 && n_fuse > 1) {
-            GGML_LOG_DEBUG("%s: fuse: ADD x %d\n", __func__, n_fuse);
+            GGML_LOG_DEBUG("%s: fuse: %s x %d\n", __func__, ggml_op_name(cur_op), n_fuse);
         }
     }