- It can code directly for small, self-contained tasks.
- It can spawn SubTurtles for larger autonomous work.
- It supervises progress and reports only meaningful milestones.
Two-Layer Architecture
Layer 1: Meta Agent
Conversational control plane. Handles user intent, task framing, delegation decisions, and milestone messaging.
Layer 2: SubTurtles
Autonomous background workers that execute scoped coding tasks from state files in .subturtles/<name>/.
claude-meta appends META_SHARED.md as system prompt, while the Telegram bot loads the same file into META_PROMPT, so both interfaces follow one shared policy.
What the Meta Agent Owns
Core responsibilities:
- Interpret user requests in plain language.
- Choose whether to implement directly or delegate.
- Seed SubTurtle CLAUDE.md state files.
- Spawn workers with ctl spawn and monitor progress.
- Keep updates milestone-focused (not process noise).
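As a rough illustration of seeding a state file, the sketch below renders the Current Task, End Goal, and Backlog sections that a SubTurtle's CLAUDE.md is expected to carry. The `SubTurtleState` type, the helper names, the heading syntax, and the checkbox style are assumptions made for this example; only the path layout and the section names come from this page.

```typescript
// Hypothetical shape for the three sections a state file carries.
interface SubTurtleState {
  currentTask: string;
  endGoal: string;
  backlog: string[];
}

// State files live at .subturtles/<name>/CLAUDE.md.
function stateFilePath(name: string): string {
  return `.subturtles/${name}/CLAUDE.md`;
}

// Render the file body. The "# " headings and "- [ ]" checkboxes are
// assumptions; the real format may differ.
function renderStateFile(state: SubTurtleState): string {
  const backlog = state.backlog.map((item) => `- [ ] ${item}`).join("\n");
  return [
    "# Current Task",
    state.currentTask,
    "",
    "# End Goal",
    state.endGoal,
    "",
    "# Backlog",
    backlog,
    "",
  ].join("\n");
}
```

The rendered string would be written to `stateFilePath(name)` before the worker is spawned with ctl spawn; the arguments that ctl spawn accepts are not shown here because this page does not specify them.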
Direct Work vs Delegation
META_SHARED.md defines the decision model:
Do it directly for short tasks
Use direct edits for quick, contained changes where spawning overhead would be slower than implementation.
Delegate for multi-step work
Spawn SubTurtles when work is multi-file, iterative, or benefits from autonomous test/fix loops.
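The direct-versus-delegate rule can be sketched as a small predicate. This is only an illustration: META_SHARED.md expresses the policy in prose, and the `TaskProfile` fields and the "more than one file" threshold below are assumptions chosen to mirror the criteria above.

```typescript
type Mode = "direct" | "delegate";

// Hypothetical task profile; the fields mirror the criteria in the text.
interface TaskProfile {
  filesTouched: number;      // how many files the change is expected to touch
  iterative: boolean;        // needs several rounds of refinement
  needsTestFixLoop: boolean; // benefits from autonomous test/fix cycles
}

// Delegate when any multi-step signal is present; otherwise edit directly.
function chooseMode(task: TaskProfile): Mode {
  const multiFile = task.filesTouched > 1;
  return multiFile || task.iterative || task.needsTestFixLoop
    ? "delegate"
    : "direct";
}
```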
Keep state explicit
Before spawning a worker, write .subturtles/<name>/CLAUDE.md with Current Task, End Goal, and Backlog.
Interface Surfaces
The Meta Agent is exposed primarily through Telegram bot handlers:
- Direction changes (“build X”, “stop”, “continue”)
- Progress checks (“how’s it going?”)
- Voice-first usage when typing is inconvenient
Voice Mode and Transcription Handling
Voice support is built into the Telegram path (handlers/voice.ts and handlers/audio.ts):
- Authorize user and rate-limit request.
- Download voice/audio file from Telegram.
- Transcribe with transcribeVoice(...).
- Show transcript preview in chat.
- Route transcript to the active driver.
Configuration and behavior notes:
- OPENAI_API_KEY enables transcription (TRANSCRIPTION_AVAILABLE).
- TRANSCRIPTION_PROMPT is built from a base prompt plus an optional context file.
- A stop intent detected in the transcript can interrupt active work immediately.
Voice input is treated as first-class user intent. In-progress background runs can be preempted or queued so urgent spoken commands are not ignored.
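The voice flow described above might be wired together roughly as follows. This is a sketch under stated assumptions, not the code in handlers/voice.ts: the `VoiceDeps` interface, every helper name other than `transcribeVoice`, the two-argument `transcribeVoice` signature, and the keyword-based stop check are all hypothetical. Dependencies are injected so the flow can be exercised without Telegram or a transcription service.

```typescript
// Hypothetical dependencies; only transcribeVoice is named in the docs,
// and its real signature may differ from the one assumed here.
interface VoiceDeps {
  isAuthorized(userId: number): boolean;
  allowRequest(userId: number): boolean; // rate limiter
  downloadFile(fileId: string): Promise<Uint8Array>;
  transcribeVoice(audio: Uint8Array, prompt: string): Promise<string>;
  reply(text: string): Promise<void>;
  stopActiveWork(): Promise<void>;
  routeToDriver(text: string): Promise<void>;
}

type VoiceResult = "rejected" | "stopped" | "routed";

// Base prompt plus optional context, mirroring how TRANSCRIPTION_PROMPT
// is described. The blank-line separator is an assumption.
function buildTranscriptionPrompt(base: string, context?: string): string {
  return context ? `${base}\n\n${context}` : base;
}

// Assumed heuristic: a transcript that starts with a stop word is a stop intent.
function isStopIntent(transcript: string): boolean {
  return /^\s*(stop|cancel|halt)\b/i.test(transcript);
}

async function handleVoice(
  deps: VoiceDeps,
  userId: number,
  fileId: string,
  prompt: string,
): Promise<VoiceResult> {
  // 1. Authorize the user and rate-limit the request.
  if (!deps.isAuthorized(userId) || !deps.allowRequest(userId)) return "rejected";
  // 2. Download the voice/audio file.
  const audio = await deps.downloadFile(fileId);
  // 3. Transcribe it.
  const transcript = await deps.transcribeVoice(audio, prompt);
  // 4. Show a transcript preview in chat.
  await deps.reply(`Transcript: ${transcript}`);
  // A stop intent interrupts active work instead of being routed.
  if (isStopIntent(transcript)) {
    await deps.stopActiveWork();
    return "stopped";
  }
  // 5. Route the transcript to the active driver.
  await deps.routeToDriver(transcript);
  return "routed";
}
```

Checking for a stop intent before routing is one way to make spoken "stop" commands take effect immediately; whether the actual handlers preempt or queue in-progress work depends on the driver.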
