pi-mono Repository Feedback Analysis

Date: April 9, 2026
Focus: Local model compatibility (Llama 3.1 8B, Mistral, Qwen 2.5)
Method: Codebase review + cross-reference with community feedback


Executive Summary

pi-mono is well-suited for local models overall, with a minimal system prompt design that aligns well with smaller models' constraints. However, several areas need attention for reliable local model operation, particularly around JSON parsing, tool calling, and context management.


What Works Well for Local Models

1. Minimal System Prompt Design STRONG

Evidence: repo/packages/coding-agent/src/core/system-prompt.ts

The system prompt builder creates concise prompts (~1000 tokens) that work well with local models:

// Lines 127-143: Base prompt structure
let prompt = `You are an expert coding assistant operating inside pi, a coding agent harness. You help users by reading files, executing commands, editing code, and writing new files.

Available tools:
${toolsList}

In addition to the tools above, you may have access to other custom tools depending on the project.

Guidelines:
${guidelines}

Why this works:

  • Under 1000 tokens total (confirmed by local-llm-feedback.md line 38)
  • Clear, direct language without excessive verbosity
  • Structured sections (tools, guidelines) that are easy to parse
  • Date and cwd appended at the end (lines 164-165)
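
A quick way to keep that budget honest is a rough character-based estimate. The sketch below assumes the common ~4 characters per token heuristic; actual counts vary by tokenizer, and this is not how pi itself measures prompt size.

const TOKEN_BUDGET = 1000;

// Rough heuristic: ~4 characters per token. Good enough to catch a prompt
// that has drifted well past the local-model budget, not an exact count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function warnIfOverBudget(systemPrompt: string): void {
  const estimate = estimateTokens(systemPrompt);
  if (estimate > TOKEN_BUDGET) {
    console.warn(`System prompt is ~${estimate} tokens, over the ${TOKEN_BUDGET} token budget`);
  }
}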

Confidence: Strong - directly confirmed by community feedback


2. Skills System with XML Format STRONG

Evidence: repo/packages/coding-agent/src/core/skills.ts (lines 339-365)

Skills use XML format per Agent Skills standard:

export function formatSkillsForPrompt(skills: Skill[]): string {
  // ...
  const lines = [
    "\n\nThe following skills provide specialized instructions for specific tasks.",
    "Use the read tool to load a skill's file when the task matches its description.",
    "",
    "<available_skills>",
  ];

  for (const skill of visibleSkills) {
    lines.push("  <skill>");
    lines.push(`    <name>${escapeXml(skill.name)}</name>`);
    lines.push(`    <description>${escapeXml(skill.description)}</description>`);
    lines.push(`    <location>${escapeXml(skill.filePath)}</location>`);
    lines.push("  </skill>");
  }

  lines.push("</available_skills>");
  return lines.join("\n");
}
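
Rendered, the section injected into the system prompt looks like this (the skill name, description, and path below are hypothetical):

The following skills provide specialized instructions for specific tasks.
Use the read tool to load a skill's file when the task matches its description.

<available_skills>
  <skill>
    <name>release-notes</name>
    <description>How to draft release notes for this project</description>
    <location>.pi/skills/release-notes.md</location>
  </skill>
</available_skills>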

Why this works:

  • XML structure is more parseable than free-form text
  • Clear delimiters help models identify skill boundaries
  • On-demand loading (line 137 in local-llm-feedback.md) prevents context bloat
  • disableModelInvocation flag allows explicit invocation without prompt bloat

Confidence: Strong - confirmed by community feedback on skills system


3. Tool Descriptions Are Clear and Actionable STRONG

Evidence: repo/packages/coding-agent/src/core/tools/read.ts (line 123), bash.ts (line 272)

Tool descriptions are concise and include actionable details:

// read.ts line 123
description: `Read the contents of a file. Supports text files and images (jpg, png, gif, webp). Images are sent as attachments. For text files, output is truncated to ${DEFAULT_MAX_LINES} lines or ${DEFAULT_MAX_BYTES / 1024}KB (whichever is hit first). Use offset/limit for large files. When you need the full file, continue with offset until complete.`,

// bash.ts line 272
description: `Execute a bash command in the current working directory. Returns stdout and stderr. Output is truncated to last ${DEFAULT_MAX_LINES} lines or ${DEFAULT_MAX_BYTES / 1024}KB (whichever is hit first). If truncated, full output is saved to a temp file. Optionally provide a timeout in seconds.`,

Why this works:

  • Explicit truncation limits help models understand constraints
  • Continuation instructions (offset, timeout) are clear
  • No ambiguous jargon
  • Practical usage guidance (continuation via offset, temp-file fallback) embedded directly in descriptions

Confidence: Strong - confirmed by community feedback on tool calling


4. Schema Definitions Use TypeBox MODERATE

Evidence: repo/packages/coding-agent/src/core/tools/read.ts (lines 17-21)

const readSchema = Type.Object({
  path: Type.String({ description: "Path to the file to read (relative or absolute)" }),
  offset: Type.Optional(Type.Number({ description: "Line number to start reading from (1-indexed)" })),
  limit: Type.Optional(Type.Number({ description: "Maximum number of lines to read" })),
});

Why this helps:

  • Schema is generated from TypeBox, ensuring consistency
  • Descriptions are embedded in schema, not separate
  • Optional fields are clearly marked

Caveat: Local models may still struggle with JSON schema compliance (see Issues section)
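
One mitigation is to validate parsed tool arguments against the schema before dispatching the tool, so malformed output from a local model fails fast with a message the model can act on. A minimal sketch using TypeBox's standard validation API (pi may wrap this differently):

import { Type } from "@sinclair/typebox";
import { Value } from "@sinclair/typebox/value";

const readSchema = Type.Object({
  path: Type.String(),
  offset: Type.Optional(Type.Number()),
  limit: Type.Optional(Type.Number()),
});

// Returns null if the arguments match the schema, otherwise a short,
// model-readable description of the first few violations.
function validateReadArgs(args: unknown): string | null {
  if (Value.Check(readSchema, args)) return null;
  const errors = [...Value.Errors(readSchema, args)].slice(0, 3);
  return errors.map((e) => `${e.path}: ${e.message}`).join("; ");
}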

Confidence: Moderate - schema design is good, but JSON compliance is a separate issue


Areas Needing Improvement for Local Models

1. JSON Compliance 🟡 MAJOR ISSUE

Evidence: local-llm-feedback.md lines 50-54

JSON Compliance (Major)

  • Description: Local models often produce malformed JSON initially
  • Impact: Requires retry mechanisms for tool calling
  • Retry Rate: ~1.6 retries per prompt

Code Analysis: The tool calling system relies on JSON parsing (implicit in @mariozechner/pi-agent-core), but local models struggle with:

  • Strict JSON syntax
  • Escaping special characters
  • Proper nesting of tool arguments

Recommendation:

  1. Implement JSON extraction before parsing (extract JSON block from text)
  2. Add retry loops with prompt refinement
  3. Consider schema relaxation for local models (e.g., allow unquoted keys)
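
A minimal sketch of recommendations 1 and 2 (this is not how pi-agent-core currently parses tool calls): extract the first balanced JSON object from free-form output, and if parsing still fails, retry with a corrective instruction.

// Extract the first balanced JSON object from model output, tolerating
// prose before and after it. Handles braces inside strings and escapes.
function extractJsonBlock(text: string): string | null {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) return text.slice(start, i + 1);
  }
  return null;
}

// Parse a tool call, asking the model to correct itself on failure.
// `retry` is a hypothetical callback that re-prompts the model.
async function parseToolCallWithRetry(
  output: string,
  retry: (hint: string) => Promise<string>,
  maxRetries = 2,
): Promise<unknown> {
  let text = output;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return JSON.parse(extractJsonBlock(text) ?? text);
    } catch {
      if (attempt === maxRetries) break;
      text = await retry("The previous tool call was not valid JSON. Reply with a single valid JSON object and nothing else.");
    }
  }
  throw new Error("Tool call JSON could not be parsed after retries");
}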

Confidence: Strong - confirmed by community feedback


2. Context Reading Limitations 🟡 MINOR ISSUE

Evidence: local-llm-feedback.md lines 55-58

Context Reading (Minor)

  • Description: Models trained to read partial files may miss important context
  • Impact: Potential for incomplete understanding of large files

Code Analysis: The read.ts tool (lines 189-236) implements truncation with offset/limit:

const allLines = textContent.split("\n");
const totalFileLines = allLines.length;
const startLine = offset ? Math.max(0, offset - 1) : 0;
// ... truncation logic

Issue:

  • Models may not understand they need to request multiple reads
  • Truncation hints (lines 222-224) are helpful but not always followed

Recommendation:

  1. Add explicit continuation prompts in truncation messages
  2. Consider summarization for very large files before reading
  3. Train models on multi-turn file reading patterns
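
A sketch of recommendation 1 (the field names are illustrative, not what read.ts currently emits): make the truncation notice spell out the exact next call.

// Append an explicit continuation instruction when output is truncated,
// so smaller models see the exact offset to use on the next read call.
function truncationNotice(shownStart: number, shownEnd: number, totalLines: number): string {
  if (shownEnd >= totalLines) return "";
  return (
    `\n[Truncated: showing lines ${shownStart}-${shownEnd} of ${totalLines}. ` +
    `Call read again with offset=${shownEnd + 1} to continue.]`
  );
}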

Confidence: Moderate - issue is real but less critical than JSON compliance


3. Session Hangs 🟠 CRITICAL

Evidence: local-llm-feedback.md lines 63-67

Session Hangs (Critical)

  • Description: After extended use, pi-coding-agent may stop responding
  • Issue: #2422
  • Impact: Requires session restart

Code Analysis: The agent-session.ts file (3059 lines) manages session state, but there's no explicit heartbeat or liveness check:

  • No default timeout on tool execution (the timeout parameter is optional; bash.ts line 29)
  • No session health monitoring
  • No automatic recovery from hung states

Recommendation:

  1. Add session health checks (periodic ping)
  2. Implement timeout recovery for stuck sessions
  3. Add graceful shutdown handlers
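
A minimal sketch of recommendations 1 and 2 (a hypothetical helper, not part of agent-session.ts today): race long-running work against a watchdog so a hung tool call surfaces as a recoverable error instead of a silent stall.

// Race a piece of work against a watchdog timeout. On timeout, the caller
// can log, abort, or restart the session instead of hanging indefinitely.
async function withWatchdog<T>(
  work: Promise<T>,
  timeoutMs: number,
  onTimeout: () => void,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const watchdog = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      onTimeout();
      reject(new Error(`Operation exceeded ${timeoutMs}ms; session may be hung`));
    }, timeoutMs);
  });
  try {
    return await Promise.race([work, watchdog]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Hypothetical usage (executeTool and log are stand-ins):
// await withWatchdog(executeTool(call), 120_000, () => log.warn("tool call stalled"));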

Confidence: Moderate - confirmed by GitHub issue #2422


4. Prompt Template Substitution 🟡 MINOR

Evidence: repo/packages/coding-agent/src/core/prompt-templates.ts (lines 67-101)

The prompt template system supports $1, $2, $@, $ARGUMENTS substitution:

export function substituteArgs(content: string, args: string[]): string {
  let result = content;
  
  // Replace $1, $2, etc. with positional args FIRST
  result = result.replace(/\$(\d+)/g, (_, num) => {
    const index = parseInt(num, 10) - 1;
    return args[index] ?? "";
  });
  
  // Replace $ARGUMENTS with all args joined (separator assumed to be a space here)
  const allArgs = args.join(" ");
  result = result.replace(/\$ARGUMENTS/g, allArgs);
  // ...
}
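
Assuming the space-joined allArgs shown above, a call expands like this (illustrative only; the real separator and $@ handling live in the elided part of the function):

// Hypothetical usage: $1 expands to the first argument, $ARGUMENTS to all of them.
const expanded = substituteArgs("Review $1, then run the checks for $ARGUMENTS", [
  "src/index.ts",
  "--fix",
]);
// => "Review src/index.ts, then run the checks for src/index.ts --fix"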

Issue:

  • Local models may not understand template syntax
  • Complex substitutions may confuse smaller models

Recommendation:

  1. Document template syntax clearly in system prompt
  2. Consider simpler syntax for local model modes
  3. Add template examples in prompt snippets

Confidence: Weak - this is more of a usability issue than a functional one


Combined Findings (My Analysis + Feedback)

1. Tool Calling Reliability 🟡

Feedback: local-llm-feedback.md lines 59-62

Tool Calling Reliability (Major)

  • Description: Less reliable tool calling compared to frontier models
  • Impact: More retries needed, occasional failures

My Analysis: The tool calling system uses @mariozechner/pi-agent-core which wraps tool definitions. The issue isn't the tool definitions themselves (which are well-structured), but rather:

  1. Model's ability to parse tool schemas - local models struggle with nested JSON
  2. Tool name matching - models may hallucinate tool names
  3. Argument structure - models may omit required fields or add extra ones

Recommendation:

  1. Add tool name validation before calling
  2. Implement argument defaults for optional fields
  3. Consider tool fallback (e.g., bash for simple file ops)
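
A sketch of recommendation 1 (a hypothetical helper, not existing pi code): reject hallucinated tool names before dispatch and tell the model which tools actually exist.

// Resolve a requested tool name against the registered tools, tolerating
// the case and punctuation drift that smaller models often introduce.
function resolveToolName(requested: string, registered: string[]): string {
  if (registered.includes(requested)) return requested;
  const normalize = (name: string) => name.toLowerCase().replace(/[^a-z0-9]/g, "");
  const match = registered.find((name) => normalize(name) === normalize(requested));
  if (match) return match;
  throw new Error(`Unknown tool "${requested}". Available tools: ${registered.join(", ")}`);
}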

Confidence: Moderate - combines feedback with code analysis


2. Context Management Strategy 🟡

Feedback: local-llm-feedback.md lines 146-149

Context Management

  • Compaction: Auto-summarization near context limits essential
  • Topic-Based: Topic-based compaction extensions recommended
  • Code-Aware: Code-aware summaries improve code context retention

Code Analysis: The compaction/compaction.ts file (823 lines) implements sophisticated compaction:

const SUMMARIZATION_PROMPT = `The messages above are a conversation to summarize. Create a structured context checkpoint summary that another LLM will use to continue the work.

Use this EXACT format:

## Goal
[What is the user trying to accomplish?...]

## Constraints & Preferences
- [Any constraints, preferences, or requirements...]

My Analysis: The compaction system is well-designed but may be too complex for local models:

  • Structured format is good, but may confuse smaller models
  • File operation tracking (lines 33-69) is sophisticated but may not be needed for all tasks
  • Token estimation (lines 232-290) uses conservative heuristics

Recommendation:

  1. Add simplified compaction mode for local models
  2. Test compaction prompts with local models
  3. Consider lighter summaries (fewer sections) for 8B models
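
As a sketch of recommendation 1, a trimmed-down summarization prompt for 8B models might look like this (hypothetical; not a constant that exists in compaction.ts):

const LOCAL_MODEL_SUMMARIZATION_PROMPT = `Summarize the conversation above so another model can continue the work.

Use exactly these four sections:

## Goal
One or two sentences on what the user wants.

## Decisions
Key choices made so far.

## Files Touched
File paths that were read or edited, one per line.

## Next Step
The single most important thing to do next.`;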

Confidence: Moderate - compaction is well-designed but may need tuning


What's Good vs. What's Bad

GOOD for Local Models

  1. Minimal system prompt (~1000 tokens)
  2. XML-formatted skills with clear delimiters
  3. Clear tool descriptions with truncation hints
  4. Type-based schema definitions (consistent structure)
  5. On-demand skill loading (prevents context bloat)
  6. Sophisticated compaction with structured summaries
  7. File operation tracking for context awareness
  8. Prompt template substitution with multiple syntax options

🟡 NEEDS IMPROVEMENT

  1. JSON parsing - requires retry mechanisms
  2. Tool calling reliability - needs validation and fallbacks
  3. Session stability - needs health checks and recovery
  4. Context reading - needs better continuation hints
  5. Compaction complexity - may need simplified mode

🟠 POTENTIAL ISSUES

  1. Prompt template syntax - may confuse smaller models
  2. Schema nesting - deeply nested schemas may be hard to parse
  3. Tool name hallucination - needs validation layer

Recommendations for Local Model Optimization

High Priority

  1. Add JSON extraction layer

    • Extract JSON block from model output before parsing
    • Fallback to regex-based extraction for simple cases
  2. Implement retry loops with prompt refinement

    • On JSON parse failure, retry with "Please output valid JSON"
    • Track retry count and escalate if needed
  3. Add session health monitoring

    • Periodic ping to detect hung sessions
    • Graceful shutdown with state preservation

Medium Priority

  1. Simplify compaction for 8B models

    • Reduce summary sections from 7 to 4
    • Use simpler language in prompts
  2. Add tool name validation

    • Validate tool names against registered tools
    • Fallback to bash for unknown tools
  3. Improve continuation hints

    • Make truncation messages more explicit
    • Add "Type /continue to read more" prompts

Low Priority

  1. Document prompt template syntax

    • Add examples in system prompt
    • Create template reference card
  2. Add schema relaxation mode

    • Allow unquoted keys for local models
    • Relax strict JSON requirements
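
For the schema relaxation item above, one option (a sketch, not current pi behavior) is to fall back to a lenient parser such as json5, which accepts unquoted keys, single quotes, and trailing commas:

import JSON5 from "json5";

// Try strict JSON first; fall back to lenient JSON5 parsing for local models.
function parseRelaxed(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return JSON5.parse(text); // still throws if the text is not valid JSON5 either
  }
}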

Benchmark Expectations for Local Models

Based on feedback and code analysis:

Metric               Expected for Local Models          Notes
Success Rate         60-70% on straightforward tasks    Confirmed by feedback
Retry Rate           1.5-2.0 retries per prompt         Confirmed by feedback
JSON Compliance      ~60% on first try                  Inferred from retry rate
Tool Calling         70% on first try                   Lower than frontier models
Context Retention    Good with compaction               Compaction is well-designed

Source References

  1. Community Feedback: pi/feedback/localllm/local-llm-feedback.md
  2. GitHub Issue #2422: Session hang bug
  3. Reddit r/LocalLLaMA: Model comparisons and experiences
  4. Codebase: repo/packages/coding-agent/src/core/ (system-prompt, skills, tools, compaction)

Conclusion

pi-mono is well-suited for local models with its minimal system prompt, clear tool definitions, and sophisticated compaction system. The main challenges are JSON compliance and tool calling reliability, which are common issues for smaller models.

Strongest features for local models:

  • Minimal system prompt design
  • XML-formatted skills
  • Clear tool descriptions
  • Sophisticated compaction

Areas needing attention:

  • JSON parsing with retry mechanisms
  • Session stability monitoring
  • Simplified compaction mode for 8B models

Overall assessment: pi-mono is a good choice for local models with minor adjustments needed for optimal performance.