first commit
435  .taskmaster/CLAUDE.md  Normal file
@@ -0,0 +1,435 @@
# Task Master AI - Agent Integration Guide

## Essential Commands

### Core Workflow Commands

```bash
# Project Setup
task-master init                                          # Initialize Task Master in current project
task-master parse-prd .taskmaster/docs/prd.md             # Generate tasks from PRD document
task-master models --setup                                # Configure AI models interactively

# Daily Development Workflow
task-master list                                          # Show all tasks with status
task-master next                                          # Get next available task to work on
task-master show <id>                                     # View detailed task information (e.g., task-master show 1.2)
task-master set-status --id=<id> --status=done            # Mark task complete

# Task Management
task-master add-task --prompt="description" --research    # Add new task with AI assistance
task-master expand --id=<id> --research --force           # Break task into subtasks
task-master update-task --id=<id> --prompt="changes"      # Update specific task
task-master update --from=<id> --prompt="changes"         # Update multiple tasks from ID onwards
task-master update-subtask --id=<id> --prompt="notes"     # Add implementation notes to subtask

# Analysis & Planning
task-master analyze-complexity --research                 # Analyze task complexity
task-master complexity-report                             # View complexity analysis
task-master expand --all --research                       # Expand all eligible tasks

# Dependencies & Organization
task-master add-dependency --id=<id> --depends-on=<id>    # Add task dependency
task-master move --from=<id> --to=<id>                    # Reorganize task hierarchy
task-master validate-dependencies                         # Check for dependency issues
task-master generate                                      # Update task markdown files (usually auto-called)
```

## Key Files & Project Structure

### Core Files

- `.taskmaster/tasks/tasks.json` - Main task data file (auto-managed)
- `.taskmaster/config.json` - AI model configuration (use `task-master models` to modify)
- `.taskmaster/docs/prd.md` - Product Requirements Document for parsing (`.md` extension recommended for better editor support)
- `.taskmaster/tasks/*.txt` - Individual task files (auto-generated from tasks.json)
- `.env` - API keys for CLI usage

**PRD File Format:** While both `.txt` and `.md` extensions work, **`.md` is recommended** because:
- Markdown syntax highlighting in editors improves readability
- Proper rendering when previewing in VS Code, GitHub, or other tools
- Better collaboration through formatted documentation

### Claude Code Integration Files

- `CLAUDE.md` - Auto-loaded context for Claude Code (this file)
- `.claude/settings.json` - Claude Code tool allowlist and preferences
- `.claude/commands/` - Custom slash commands for repeated workflows
- `.mcp.json` - MCP server configuration (project-specific)

### Directory Structure

```
project/
├── .taskmaster/
│   ├── tasks/              # Task files directory
│   │   ├── tasks.json      # Main task database
│   │   ├── task-1.md       # Individual task files
│   │   └── task-2.md
│   ├── docs/               # Documentation directory
│   │   └── prd.md          # Product requirements (.md recommended)
│   ├── reports/            # Analysis reports directory
│   │   └── task-complexity-report.json
│   ├── templates/          # Template files
│   │   └── example_prd.md  # Example PRD template (.md recommended)
│   └── config.json         # AI models & settings
├── .claude/
│   ├── settings.json       # Claude Code configuration
│   └── commands/           # Custom slash commands
├── .env                    # API keys
├── .mcp.json               # MCP configuration
└── CLAUDE.md               # This file - auto-loaded by Claude Code
```

## MCP Integration

Task Master provides an MCP server that Claude Code can connect to. Configure in `.mcp.json`:

```json
{
  "mcpServers": {
    "task-master-ai": {
      "command": "npx",
      "args": ["-y", "task-master-ai"],
      "env": {
        "TASK_MASTER_TOOLS": "core",
        "ANTHROPIC_API_KEY": "your_key_here",
        "PERPLEXITY_API_KEY": "your_key_here",
        "OPENAI_API_KEY": "OPENAI_API_KEY_HERE",
        "GOOGLE_API_KEY": "GOOGLE_API_KEY_HERE",
        "XAI_API_KEY": "XAI_API_KEY_HERE",
        "OPENROUTER_API_KEY": "OPENROUTER_API_KEY_HERE",
        "MISTRAL_API_KEY": "MISTRAL_API_KEY_HERE",
        "AZURE_OPENAI_API_KEY": "AZURE_OPENAI_API_KEY_HERE",
        "OLLAMA_API_KEY": "OLLAMA_API_KEY_HERE"
      }
    }
  }
}
```

### MCP Tool Tiers

Default: `core` (7 tools). Set via the `TASK_MASTER_TOOLS` env var.

| Tier | Count | Tools |
|------|-------|-------|
| `core` | 7 | `get_tasks`, `next_task`, `get_task`, `set_task_status`, `update_subtask`, `parse_prd`, `expand_task` |
| `standard` | 14 | core + `initialize_project`, `analyze_project_complexity`, `expand_all`, `add_subtask`, `remove_task`, `add_task`, `complexity_report` |
| `all` | 44+ | standard + dependencies, tags, research, autopilot, scoping, models, rules |

**Upgrade when a tool is unavailable:** edit the MCP config, change `TASK_MASTER_TOOLS` from `"core"` to `"standard"` or `"all"`, and restart the MCP server.
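For example, to move to the `standard` tier, only the `env` entry in the `.mcp.json` shown above needs to change:

```json
"env": {
  "TASK_MASTER_TOOLS": "standard"
}
```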

### Essential MCP Tools

```javascript
help; // = shows available taskmaster commands
// Project setup
initialize_project; // = task-master init
parse_prd; // = task-master parse-prd

// Daily workflow
get_tasks; // = task-master list
next_task; // = task-master next
get_task; // = task-master show <id>
set_task_status; // = task-master set-status

// Task management
add_task; // = task-master add-task
expand_task; // = task-master expand
update_task; // = task-master update-task
update_subtask; // = task-master update-subtask
update; // = task-master update

// Analysis
analyze_project_complexity; // = task-master analyze-complexity
complexity_report; // = task-master complexity-report
```

## Claude Code Workflow Integration

### Standard Development Workflow

#### 1. Project Initialization

```bash
# Initialize Task Master
task-master init

# Create or obtain PRD, then parse it (use .md extension for better editor support)
task-master parse-prd .taskmaster/docs/prd.md

# Analyze complexity and expand tasks
task-master analyze-complexity --research
task-master expand --all --research
```

If tasks already exist, another PRD (containing only new information!) can be parsed with `task-master parse-prd --append`. This appends the generated tasks to the existing list.
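For example (the PRD path here is illustrative):

```bash
task-master parse-prd .taskmaster/docs/prd-v2.md --append
```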

#### 2. Daily Development Loop

```bash
# Start each session
task-master next                   # Find next available task
task-master show <id>              # Review task details

# During implementation, log code and implementation context into the tasks and subtasks
task-master update-subtask --id=<id> --prompt="implementation notes..."

# Complete tasks
task-master set-status --id=<id> --status=done
```

#### 3. Multi-Claude Workflows

For complex projects, use multiple Claude Code sessions:

```bash
# Terminal 1: Main implementation
cd project && claude

# Terminal 2: Testing and validation
cd project-test-worktree && claude

# Terminal 3: Documentation updates
cd project-docs-worktree && claude
```

### Custom Slash Commands

Create `.claude/commands/taskmaster-next.md`:

```markdown
Find the next available Task Master task and show its details.

Steps:

1. Run `task-master next` to get the next task
2. If a task is available, run `task-master show <id>` for full details
3. Provide a summary of what needs to be implemented
4. Suggest the first implementation step
```

Create `.claude/commands/taskmaster-complete.md`:

```markdown
Complete a Task Master task: $ARGUMENTS

Steps:

1. Review the current task with `task-master show $ARGUMENTS`
2. Verify all implementation is complete
3. Run any tests related to this task
4. Mark as complete: `task-master set-status --id=$ARGUMENTS --status=done`
5. Show the next available task with `task-master next`
```

## Tool Allowlist Recommendations

Add to `.claude/settings.json`:

```json
{
  "allowedTools": [
    "Edit",
    "Bash(task-master *)",
    "Bash(git commit:*)",
    "Bash(git add:*)",
    "Bash(npm run *)",
    "mcp__task_master_ai__*"
  ]
}
```

## Configuration & Setup

### API Keys Required

At least **one** of these API keys must be configured:

- `ANTHROPIC_API_KEY` (Claude models) - **Recommended**
- `PERPLEXITY_API_KEY` (Research features) - **Highly recommended**
- `OPENAI_API_KEY` (GPT models)
- `GOOGLE_API_KEY` (Gemini models)
- `MISTRAL_API_KEY` (Mistral models)
- `OPENROUTER_API_KEY` (Multiple models)
- `XAI_API_KEY` (Grok models)

An API key is required for any provider used across any of the 3 roles defined in the `models` command.

### Model Configuration

```bash
# Interactive setup (recommended)
task-master models --setup

# Set specific models
task-master models --set-main claude-3-5-sonnet-20241022
task-master models --set-research perplexity-llama-3.1-sonar-large-128k-online
task-master models --set-fallback gpt-4o-mini
```

## Task Structure & IDs

### Task ID Format

- Main tasks: `1`, `2`, `3`, etc.
- Subtasks: `1.1`, `1.2`, `2.1`, etc.
- Sub-subtasks: `1.1.1`, `1.1.2`, etc.
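The dotted format nests uniformly, so splitting on `.` recovers the hierarchy. A small sketch of parsing it (a hypothetical helper for illustration, not part of the Task Master CLI):

```python
import re

def parse_task_id(task_id: str) -> list[int]:
    """Split a dotted Task Master-style ID such as "1.1.2" into its levels.

    Hypothetical helper -- not part of the Task Master CLI itself.
    """
    if not re.fullmatch(r"\d+(\.\d+)*", task_id):
        raise ValueError(f"invalid task id: {task_id!r}")
    return [int(part) for part in task_id.split(".")]

print(parse_task_id("1"))      # main task
print(parse_task_id("1.2"))    # subtask 2 of task 1
print(parse_task_id("1.1.2"))  # sub-subtask
```

The length of the returned list tells you the nesting depth: 1 for a main task, 2 for a subtask, 3 for a sub-subtask.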

### Task Status Values

- `pending` - Ready to work on
- `in-progress` - Currently being worked on
- `done` - Completed and verified
- `deferred` - Postponed
- `cancelled` - No longer needed
- `blocked` - Waiting on external factors

### Task Fields

```json
{
  "id": "1.2",
  "title": "Implement user authentication",
  "description": "Set up JWT-based auth system",
  "status": "pending",
  "priority": "high",
  "dependencies": ["1.1"],
  "details": "Use bcrypt for hashing, JWT for tokens...",
  "testStrategy": "Unit tests for auth functions, integration tests for login flow",
  "subtasks": []
}
```

## Claude Code Best Practices with Task Master

### Context Management

- Use `/clear` between different tasks to maintain focus
- This CLAUDE.md file is automatically loaded for context
- Use `task-master show <id>` to pull specific task context when needed

### Iterative Implementation

1. `task-master show <subtask-id>` - Understand requirements
2. Explore codebase and plan implementation
3. `task-master update-subtask --id=<id> --prompt="detailed plan"` - Log plan
4. `task-master set-status --id=<id> --status=in-progress` - Start work
5. Implement code following logged plan
6. `task-master update-subtask --id=<id> --prompt="what worked/didn't work"` - Log progress
7. `task-master set-status --id=<id> --status=done` - Complete task

### Complex Workflows with Checklists

For large migrations or multi-step processes:

1. Create a markdown PRD file describing the new changes: `touch task-migration-checklist.md` (PRDs can be `.txt` or `.md`)
2. Use Taskmaster to parse the new PRD with `task-master parse-prd --append` (also available in MCP)
3. Use Taskmaster to expand the newly generated tasks into subtasks. Consider using `analyze-complexity` with the correct `--from` and `--to` IDs (the new IDs) to identify the ideal number of subtasks for each task, then expand them.
4. Work through items systematically, checking them off as completed
5. Use `task-master update-subtask` to log progress on each task/subtask, and update or research them before/during implementation if you get stuck
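Steps 1-3 above, sketched end to end (the task IDs are illustrative — substitute the IDs that `parse-prd --append` reports for the new tasks):

```bash
task-master parse-prd task-migration-checklist.md --append
task-master analyze-complexity --from=11 --to=15 --research
task-master expand --all --research
```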

### Git Integration

Task Master works well with `gh` CLI:

```bash
# Create PR for completed task
gh pr create --title "Complete task 1.2: User authentication" --body "Implements JWT auth system as specified in task 1.2"

# Reference task in commits
git commit -m "feat: implement JWT auth (task 1.2)"
```

### Parallel Development with Git Worktrees

```bash
# Create worktrees for parallel task development
git worktree add ../project-auth feature/auth-system
git worktree add ../project-api feature/api-refactor

# Run Claude Code in each worktree
cd ../project-auth && claude    # Terminal 1: Auth work
cd ../project-api && claude     # Terminal 2: API work
```

## Troubleshooting

### AI Commands Failing

```bash
# Check API keys are configured
cat .env    # For CLI usage

# Verify model configuration
task-master models

# Test with different model
task-master models --set-fallback gpt-4o-mini
```

### MCP Connection Issues

- Check `.mcp.json` configuration
- Verify Node.js installation
- Use `--mcp-debug` flag when starting Claude Code
- Use CLI as fallback if MCP unavailable

### Task File Sync Issues

```bash
# Regenerate task files from tasks.json
task-master generate

# Fix dependency issues
task-master fix-dependencies
```

DO NOT RE-INITIALIZE. That will not do anything beyond re-adding the same Taskmaster core files.

## Important Notes

### AI-Powered Operations

These commands make AI calls and may take up to a minute:

- `parse_prd` / `task-master parse-prd`
- `analyze_project_complexity` / `task-master analyze-complexity`
- `expand_task` / `task-master expand`
- `expand_all` / `task-master expand --all`
- `add_task` / `task-master add-task`
- `update` / `task-master update`
- `update_task` / `task-master update-task`
- `update_subtask` / `task-master update-subtask`

### File Management

- Never manually edit `tasks.json` - use commands instead
- Never manually edit `.taskmaster/config.json` - use `task-master models`
- Task markdown files in `tasks/` are auto-generated
- Run `task-master generate` after manual changes to tasks.json

### Claude Code Session Management

- Use `/clear` frequently to maintain focused context
- Create custom slash commands for repeated Task Master workflows
- Configure tool allowlist to streamline permissions
- Use headless mode for automation: `claude -p "task-master next"`

### Multi-Task Updates

- Use `update --from=<id>` to update multiple future tasks
- Use `update-task --id=<id>` for single task updates
- Use `update-subtask --id=<id>` for implementation logging

### Research Mode

- Add `--research` flag for research-based AI enhancement
- Requires a research model API key like Perplexity (`PERPLEXITY_API_KEY`) in environment
- Provides more informed task creation and updates
- Recommended for complex technical tasks

---

_This guide ensures Claude Code has immediate access to Task Master's essential functionality for agentic development workflows._
46  .taskmaster/config.json  Normal file
@@ -0,0 +1,46 @@
{
  "models": {
    "main": {
      "provider": "codex-cli",
      "modelId": "gpt-5.2-codex",
      "maxTokens": 128000,
      "temperature": 0.2
    },
    "research": {
      "provider": "claude-code",
      "modelId": "opus",
      "maxTokens": 32000,
      "temperature": 0.1
    },
    "fallback": {
      "provider": "claude-code",
      "modelId": "sonnet",
      "maxTokens": 64000,
      "temperature": 0.2
    }
  },
  "global": {
    "logLevel": "info",
    "debug": false,
    "defaultNumTasks": 10,
    "defaultSubtasks": 5,
    "defaultPriority": "medium",
    "projectName": "Taskmaster",
    "ollamaBaseURL": "http://localhost:11434/api",
    "bedrockBaseURL": "https://bedrock.us-east-1.amazonaws.com",
    "responseLanguage": "English",
    "enableCodebaseAnalysis": true,
    "enableProxy": false,
    "anonymousTelemetry": true,
    "defaultTag": "master",
    "azureOpenaiBaseURL": "https://your-endpoint.openai.azure.com/",
    "userId": "1234567890"
  },
  "claudeCode": {},
  "codexCli": {},
  "grokCli": {
    "timeout": 120000,
    "workingDirectory": null,
    "defaultModel": "grok-4-latest"
  }
}
93  .taskmaster/reports/task-complexity-report.json  Normal file
@@ -0,0 +1,93 @@
{
  "meta": {
    "generatedAt": "2026-02-26T02:10:35.637Z",
    "tasksAnalyzed": 10,
    "totalTasks": 10,
    "analysisCount": 10,
    "thresholdScore": 5,
    "projectName": "Taskmaster",
    "usedResearch": false
  },
  "complexityAnalysis": [
    {
      "taskId": 1,
      "taskTitle": "Project scaffolding and dependency setup",
      "complexityScore": 3,
      "recommendedSubtasks": 3,
      "expansionPrompt": "Break down scaffolding into directory/package creation, requirements/.env.example updates, and verification steps (imports + pip install).",
      "reasoning": "Repo currently contains only docs/config (`prd.md`, `.taskmaster/*`, `.env.example`) and no `app_factory` package, so this is greenfield setup with low technical risk. The main effort is creating directories, `__init__.py` files, and dependency lists; testing is straightforward import/install checks."
    },
    {
      "taskId": 2,
      "taskTitle": "Implement LangSmith observability and logging infrastructure",
      "complexityScore": 6,
      "recommendedSubtasks": 4,
      "expansionPrompt": "Split into LangSmith client wrapper, tracing decorator/context manager, structured logging configuration, and unit tests with mocks.",
      "reasoning": "No existing observability code, so this is a new module with moderate complexity due to LangSmith integration, async lifecycle handling, and structured logging. Testing requires mocking LangSmith and validating log formatting and error capture, adding moderate effort."
    },
    {
      "taskId": 3,
      "taskTitle": "Implement WorkspaceManager for Git worktree and Docker isolation",
      "complexityScore": 8,
      "recommendedSubtasks": 5,
      "expansionPrompt": "Divide into Git worktree lifecycle, Docker container lifecycle, error handling/cleanup, async API surface, and integration tests with mocks.",
      "reasoning": "Greenfield implementation but technically complex due to GitPython worktree management, Docker SDK integration, isolation configuration, and robust cleanup/error handling. Testing needs integration-style checks and mocks for Docker/Git, which increases effort."
    },
    {
      "taskId": 4,
      "taskTitle": "Implement TaskMasterAgent for claude-task-master integration",
      "complexityScore": 7,
      "recommendedSubtasks": 4,
      "expansionPrompt": "Break into MCP client wrapper/retry logic, PRD parsing file writes, task query/status APIs, and tests using mocked MCP tools.",
      "reasoning": "No agent code exists yet; integration with MCP tools and file writes to `.taskmaster` adds external dependencies and retry/backoff logic. Unit tests will rely on mocks for MCP responses and file system behavior."
    },
    {
      "taskId": 5,
      "taskTitle": "Implement PMAgent for PRD generation and clarification handling",
      "complexityScore": 7,
      "recommendedSubtasks": 4,
      "expansionPrompt": "Split into prompt template loading, Anthropic API integration with token tracking, clarification handling/escalation, and PRD update/versioning tests.",
      "reasoning": "Requires new prompt templates plus Anthropic API usage and token accounting. Clarification flow with optional human input adds branching logic and async handling; tests require mocked API calls and controlled input."
    },
    {
      "taskId": 6,
      "taskTitle": "Implement DevAgentManager for Claude Code/Codex subprocess automation",
      "complexityScore": 9,
      "recommendedSubtasks": 6,
      "expansionPrompt": "Break into prompt preparation, Docker exec/pexpect orchestration, output parsing, retry/timeout handling, Codex fallback, and tests with mocks.",
      "reasoning": "High complexity due to orchestration across Docker containers, interactive `pexpect` control, timeouts, retries, and fallback to another API. Parsing outputs and tracking changed files adds complexity, and testing needs heavy mocking of subprocesses and Docker."
    },
    {
      "taskId": 7,
      "taskTitle": "Implement QAAgent for code review, testing, and merge operations",
      "complexityScore": 8,
      "recommendedSubtasks": 5,
      "expansionPrompt": "Split into git rebase/merge workflow, lint/test execution and parsing, Claude review integration, conflict resolution/retry logic, and tests with mocked git/CLI.",
      "reasoning": "Complex workflow spanning git conflict handling, lint/test execution, Claude review, and merge operations. Error handling and retry logic are non-trivial, and tests require robust mocking of git and command outputs."
    },
    {
      "taskId": 8,
      "taskTitle": "Implement LangGraph state machine and orchestration graph",
      "complexityScore": 9,
      "recommendedSubtasks": 6,
      "expansionPrompt": "Break into state schema/persistence, node implementations, conditional routing, concurrency handling with asyncio, checkpointing, and graph tests with mocked agents.",
      "reasoning": "This is core orchestration logic with multiple agents, conditional routing, concurrency, and persistence. No existing scaffolding exists, so design is greenfield with substantial integration testing needs across mocked agents and state transitions."
    },
    {
      "taskId": 9,
      "taskTitle": "Implement GlobalArchitecture summary system and prompt templates",
      "complexityScore": 6,
      "recommendedSubtasks": 4,
      "expansionPrompt": "Split into prompt template creation, architecture tracker storage schema, Claude-based summarization updates, and summary generation tests.",
      "reasoning": "Prompt templates are straightforward but the ArchitectureTracker adds async Claude summarization and persistent JSON state. Moderate complexity due to parsing/summarization and ensuring token/length constraints in summaries, with mocked API tests."
    },
    {
      "taskId": 10,
      "taskTitle": "Implement main.py entry point, error handling, and end-to-end integration",
      "complexityScore": 8,
      "recommendedSubtasks": 5,
      "expansionPrompt": "Break into CLI parsing/env loading, component wiring, retry/escalation logic, graceful shutdown/signal handling, and dry-run/validation tests.",
      "reasoning": "Main integration point coordinates all components with error handling, retries, and shutdown behavior. This is high effort due to dependency wiring and control flow; testing requires broad mocking and simulation of exceptions and signals."
    }
  ]
}
6  .taskmaster/state.json  Normal file
@@ -0,0 +1,6 @@
{
  "currentTag": "master",
  "lastSwitched": "2026-02-26T02:01:01.984Z",
  "branchTagMapping": {},
  "migrationNoticeShown": true
}
761  .taskmaster/tasks/tasks.json  Normal file
@@ -0,0 +1,761 @@
{
|
||||
"master": {
|
||||
"tasks": [
|
||||
{
|
||||
"id": "1",
|
||||
"title": "Project scaffolding and dependency setup",
|
||||
"description": "Initialize Python project structure with all required directories, configuration files, and install core dependencies (LangGraph, LangSmith, GitPython, docker, pexpect)",
|
||||
"details": "Create the app_factory/ directory structure as specified in PRD:\n- app_factory/agents/ (pm_agent.py, task_agent.py, dev_agent.py, qa_agent.py)\n- app_factory/core/ (graph.py, workspace.py, observability.py)\n- app_factory/prompts/\n- app_factory/data/\n- main.py at root\n- requirements.txt with:\n * langgraph>=0.0.20\n * langsmith>=0.1.0\n * gitpython>=3.1.40\n * docker>=7.0.0\n * pexpect>=4.9.0\n * anthropic>=0.18.0\n * openai>=1.10.0\n * python-dotenv>=1.0.0\n * pydantic>=2.5.0\n * asyncio (built-in)\n\nCreate .env.example for API keys (ANTHROPIC_API_KEY, OPENAI_API_KEY, LANGSMITH_API_KEY). Initialize git if not already done. Create __init__.py files for proper package structure.",
|
||||
"testStrategy": "Verify directory structure matches PRD specification. Run 'pip install -r requirements.txt' successfully. Import all packages without errors. Verify __init__.py files exist in all package directories.",
|
||||
"priority": "high",
|
||||
"dependencies": [],
|
||||
"status": "done",
|
||||
"subtasks": [
|
||||
{
|
||||
"id": 1,
|
||||
"title": "Create project directories and package skeleton",
|
||||
"description": "Set up the app_factory package folders and initial module files per the PRD.",
|
||||
"dependencies": [],
|
||||
"details": "Create the app_factory/agents, app_factory/core, app_factory/prompts, and app_factory/data directories. Add empty module files for agents (pm_agent.py, task_agent.py, dev_agent.py, qa_agent.py) and core (graph.py, workspace.py, observability.py), plus main.py at repo root. Add __init__.py files in app_factory, app_factory/agents, and app_factory/core to ensure package imports work.",
|
||||
"status": "pending",
|
||||
"testStrategy": null,
|
||||
"parentId": "undefined"
|
||||
},
|
||||
{
|
||||
"id": 2,
|
||||
"title": "Add dependency and environment configuration files",
|
||||
"description": "Define required dependencies and environment variable templates.",
|
||||
"dependencies": [],
|
||||
"details": "Create or update requirements.txt to include the specified versions for langgraph, langsmith, gitpython, docker, pexpect, anthropic, openai, python-dotenv, pydantic, and note asyncio as built-in. Add or update .env.example with ANTHROPIC_API_KEY, OPENAI_API_KEY, and LANGSMITH_API_KEY placeholders.",
|
||||
"status": "pending",
|
||||
"testStrategy": null,
|
||||
"parentId": "undefined"
|
||||
},
|
||||
{
|
||||
"id": 3,
|
||||
"title": "Initialize git and verify scaffolding integrity",
|
||||
"description": "Ensure repository is initialized and structure is verifiable.",
|
||||
"dependencies": [
|
||||
1,
|
||||
2
|
||||
],
|
||||
"details": "Initialize git if .git does not exist, and verify that directory structure matches the PRD and that __init__.py files are present. Perform a basic import check for package modules after installing requirements with pip to confirm dependencies resolve correctly.",
|
||||
"status": "pending",
|
||||
"testStrategy": "Run `pip install -r requirements.txt`, then use a small Python import check for app_factory modules and core dependencies.",
|
||||
"parentId": "undefined"
|
||||
}
|
||||
],
|
||||
"complexity": 3,
|
||||
"recommendedSubtasks": 3,
|
||||
"expansionPrompt": "Break down scaffolding into directory/package creation, requirements/.env.example updates, and verification steps (imports + pip install).",
|
||||
"updatedAt": "2026-02-26T02:33:35.564Z"
|
||||
},
|
||||
{
|
||||
"id": "2",
|
||||
"title": "Implement LangSmith observability and logging infrastructure",
|
||||
"description": "Build the observability.py module with LangSmith tracing integration and structured Python logging for tracking agent decisions, token usage, and execution flow",
|
||||
"details": "In app_factory/core/observability.py:\n- Create ObservabilityManager class that wraps LangSmith client\n- Implement trace_agent_execution() decorator for tracking agent calls with context (agent_name, task_id, input, output, token_count)\n- Implement structured logging with log levels (DEBUG, INFO, WARNING, ERROR)\n- Create methods: start_trace(), end_trace(), log_state_transition(), log_token_usage(), log_error()\n- Configure LangSmith project name from env var (LANGSMITH_PROJECT)\n- Add async context manager support for automatic trace lifecycle\n- Integrate Python's logging module with custom formatters including timestamps, agent names, and task IDs\n\nKey methods:\nclass ObservabilityManager:\n def __init__(self, project_name: str)\n async def trace_agent(self, agent_name: str, task_id: str, func: Callable)\n def log_state(self, state: dict)\n def get_metrics(self) -> dict",
|
||||
"testStrategy": "Unit tests for ObservabilityManager initialization. Mock LangSmith client and verify trace creation/completion. Test logging output format. Verify async context manager properly starts/ends traces. Test error logging captures stack traces.",
"priority": "high",
"dependencies": [
"1"
],
"status": "done",
"subtasks": [
{
"id": 1,
"title": "Create ObservabilityManager and LangSmith client wrapper",
"description": "Define the core class and LangSmith client integration layer.",
"dependencies": [],
"details": "Implement `ObservabilityManager` initialization with project name lookup from `LANGSMITH_PROJECT`, store LangSmith client, and expose base methods like `start_trace`, `end_trace`, `log_state_transition`, `log_token_usage`, and `log_error` that call into the client as needed.",
"status": "pending",
"testStrategy": null,
"parentId": "undefined"
},
{
"id": 2,
"title": "Add tracing decorator and async context manager support",
"description": "Provide trace lifecycle utilities for agent execution.",
"dependencies": [
1
],
"details": "Implement `trace_agent_execution` decorator and `trace_agent` async helper to capture agent name, task id, inputs/outputs, token counts, and errors; add async context manager support to automatically start/end traces around agent runs.",
"status": "pending",
"testStrategy": null,
"parentId": "undefined"
},
{
"id": 3,
"title": "Configure structured Python logging for observability",
"description": "Set up standardized logging outputs for agent tracing.",
"dependencies": [
1
],
"details": "Integrate `logging` module with custom formatter including timestamps, agent name, and task id; ensure log levels DEBUG/INFO/WARNING/ERROR map to structured fields and are used by `ObservabilityManager` methods.",
"status": "pending",
"testStrategy": null,
"parentId": "undefined"
},
{
"id": 4,
"title": "Write unit tests with LangSmith mocks and log assertions",
"description": "Validate observability behavior through isolated tests.",
"dependencies": [
1,
2,
3
],
"details": "Add tests for ObservabilityManager initialization, trace start/end, decorator behavior, async context lifecycle, error logging with stack traces, token usage logging, and log format validation using mocked LangSmith client and log capture utilities.",
"status": "pending",
"testStrategy": "Run unit tests with mocks to validate trace creation/completion, logging output, and error capture.",
"parentId": "undefined"
}
],
"complexity": 6,
"recommendedSubtasks": 4,
"expansionPrompt": "Split into LangSmith client wrapper, tracing decorator/context manager, structured logging configuration, and unit tests with mocks.",
"updatedAt": "2026-02-26T02:38:01.502Z"
},
{
"id": "3",
"title": "Implement WorkspaceManager for Git worktree and Docker isolation",
"description": "Build workspace.py to manage git worktree creation and Docker container provisioning for isolated Dev Agent execution environments",
"details": "In app_factory/core/workspace.py:\n- Use GitPython for git operations and docker Python SDK for container management\n- Implement create_worktree(task_id: str, base_branch: str = 'main') -> str:\n * Validates base_branch exists\n * Creates worktree at ../worktrees/{task_id} using git.worktree_add()\n * Creates branch name: feature/task-{task_id}\n * Returns absolute worktree path\n- Implement spin_up_clean_room(worktree_path: str, task_id: str) -> docker.Container:\n * Pulls base image (python:3.11-slim or custom image with Claude Code installed)\n * Mounts worktree_path as /workspace in container (read/write)\n * Sets working directory to /workspace\n * Configures network isolation (no internet by default, allowlist only necessary domains)\n * Returns container object with metadata (container_id, task_id)\n- Implement cleanup_workspace(task_id: str, container: docker.Container):\n * Stops and removes Docker container\n * Removes git worktree using git.worktree_remove()\n- Add error handling for git conflicts, Docker daemon unavailable, disk space issues\n- Implement get_active_workspaces() -> List[dict] to track all active containers\n\nClass signature:\nclass WorkspaceManager:\n def __init__(self, repo_path: str, docker_image: str)\n async def create_worktree(self, task_id: str) -> str\n async def spin_up_clean_room(self, worktree_path: str, task_id: str) -> Container\n async def cleanup_workspace(self, task_id: str, container: Container)",
"testStrategy": "Integration tests with actual git repo (create test repo). Verify worktree creation in correct location. Mock Docker SDK and verify container creation with correct mount points. Test cleanup removes worktree and container. Test error handling when git or Docker fails. Verify concurrent worktree creation for multiple tasks.",
"priority": "high",
"dependencies": [
"1",
"2"
],
"status": "done",
"subtasks": [
{
"id": 1,
"title": "Implement Git worktree lifecycle management",
"description": "Build create_worktree to validate branches and create feature worktrees.",
"dependencies": [],
"details": "Use GitPython to open repo_path, verify base_branch exists, create branch feature/task-{task_id}, add worktree at ../worktrees/{task_id}, and return absolute path.",
"status": "pending",
"testStrategy": "Integration test with temp git repo verifying worktree path and branch creation.",
"parentId": "undefined"
},
{
"id": 2,
"title": "Implement Docker clean-room provisioning",
"description": "Add spin_up_clean_room to create isolated containers for worktrees.",
"dependencies": [
1
],
"details": "Use docker SDK to pull the configured image, mount worktree_path at /workspace, set working dir, configure network isolation/allowlist, and return container metadata including task_id.",
"status": "pending",
"testStrategy": "Mock docker SDK to assert image pull, mount bindings, and container settings.",
"parentId": "undefined"
},
{
"id": 3,
"title": "Add workspace cleanup and error handling",
"description": "Ensure containers and worktrees are removed safely on cleanup.",
"dependencies": [
2
],
"details": "Implement cleanup_workspace to stop/remove container, remove git worktree, and handle errors like conflicts, daemon unavailable, and disk space issues with clear exceptions.",
"status": "pending",
"testStrategy": "Unit tests with mocks to confirm cleanup calls and error propagation.",
"parentId": "undefined"
},
{
"id": 4,
"title": "Expose async API surface and workspace tracking",
"description": "Make class methods async and track active workspaces.",
"dependencies": [
1,
2,
3
],
"details": "Define WorkspaceManager __init__ and async methods, add get_active_workspaces returning container/task metadata, and maintain internal registry.",
"status": "pending",
"testStrategy": "Async unit tests verifying registry updates on create/spinup/cleanup.",
"parentId": "undefined"
},
{
"id": 5,
"title": "Create integration tests with Git/Docker mocks",
"description": "Add integration-style tests that cover full workflow.",
"dependencies": [
1,
2,
3,
4
],
"details": "Set up test repo, validate worktree creation location, mock Docker for container creation, and verify cleanup removes resources and handles failures.",
"status": "pending",
"testStrategy": "Pytest suite using temp dirs, GitPython real repo, and Docker SDK mocks.",
"parentId": "undefined"
}
],
"complexity": 8,
"recommendedSubtasks": 5,
"expansionPrompt": "Divide into Git worktree lifecycle, Docker container lifecycle, error handling/cleanup, async API surface, and integration tests with mocks.",
"updatedAt": "2026-02-26T02:40:47.441Z"
},
{
"id": "4",
"title": "Implement TaskMasterAgent for claude-task-master integration",
"description": "Build task_agent.py to interface with claude-task-master via MCP client for task graph management, dependency resolution, and status tracking",
"details": "In app_factory/agents/task_agent.py:\n- Create TaskMasterAgent class that interfaces with claude-task-master as MCP client\n- Implement parse_prd(prd_content: str, num_tasks: int = 10) -> dict:\n * Writes PRD to .taskmaster/docs/prd.md\n * Calls task-master parse-prd via MCP tool mcp__task_master_ai__parse_prd\n * Returns parsed task structure with IDs, dependencies, priorities\n- Implement get_unblocked_tasks() -> List[dict]:\n * Calls mcp__task_master_ai__get_tasks with status filter\n * Returns tasks where all dependencies are 'done' and status is 'pending'\n- Implement update_task_status(task_id: str, status: str, notes: str = ''):\n * Calls mcp__task_master_ai__set_task_status\n * Optionally calls mcp__task_master_ai__update_subtask for notes\n- Implement get_task_details(task_id: str) -> dict:\n * Calls mcp__task_master_ai__get_task\n * Returns full task object with title, description, details, testStrategy\n- Implement get_next_task() -> dict:\n * Calls mcp__task_master_ai__next_task\n * Returns highest priority unblocked task\n- Add retry logic with exponential backoff for MCP calls\n- Implement expand_task(task_id: str, num_subtasks: int = 5) for breaking down complex tasks\n\nClass signature:\nclass TaskMasterAgent:\n def __init__(self, project_root: str, mcp_client: MCPClient)\n async def parse_prd(self, prd_content: str) -> dict\n async def get_unblocked_tasks(self) -> List[dict]\n async def update_task_status(self, task_id: str, status: str)\n async def get_task_details(self, task_id: str) -> dict",
"testStrategy": "Mock MCP client and verify correct tool calls with expected parameters. Test parse_prd creates .taskmaster/docs/prd.md correctly. Unit test get_unblocked_tasks filters dependencies correctly. Test update_task_status handles both main tasks and subtasks. Verify retry logic triggers on MCP failures. Integration test with real claude-task-master installation.",
"priority": "high",
"dependencies": [
"1",
"2"
],
"status": "done",
"subtasks": [
{
"id": 1,
"title": "Design TaskMasterAgent MCP wrapper with retries",
"description": "Define the TaskMasterAgent class structure and MCP call wrapper with exponential backoff.",
"dependencies": [],
"details": "Implement an internal async MCP call helper that handles retries, backoff timing, and error surfacing, and wire it into the agent constructor with project root and client references.",
"status": "pending",
"testStrategy": "Mock MCP client failures to verify retries and backoff timing.",
"parentId": "undefined"
},
{
"id": 2,
"title": "Implement PRD parsing and file output flow",
"description": "Add parse_prd to write PRD content and call the MCP parse tool.",
"dependencies": [
1
],
"details": "Create .taskmaster/docs/prd.md from provided content, invoke mcp__task_master_ai__parse_prd with num_tasks, and return parsed task structure with IDs, dependencies, and priorities.",
"status": "pending",
"testStrategy": "Mock filesystem writes and MCP responses to validate file path and payloads.",
"parentId": "undefined"
},
{
"id": 3,
"title": "Implement task query and status APIs",
"description": "Add unblocked task retrieval, details lookup, and status updates.",
"dependencies": [
1
],
"details": "Implement get_unblocked_tasks using mcp__task_master_ai__get_tasks and dependency checks, get_task_details via mcp__task_master_ai__get_task, and update_task_status with optional subtask notes.",
"status": "pending",
"testStrategy": "Mock MCP tools to verify filters, dependency logic, and status/notes calls.",
"parentId": "undefined"
},
{
"id": 4,
"title": "Add expand_task and unit tests",
"description": "Implement task expansion API and write unit tests for core behaviors.",
"dependencies": [
2,
3
],
"details": "Add expand_task using mcp__task_master_ai__expand_task (or equivalent) and create tests covering parse_prd, unblocked filtering, status updates, and retry logic with mocked MCP tools.",
"status": "pending",
"testStrategy": "Use pytest with mocked MCP client to validate behaviors and error handling.",
"parentId": "undefined"
}
],
"complexity": 7,
"recommendedSubtasks": 4,
"expansionPrompt": "Break into MCP client wrapper/retry logic, PRD parsing file writes, task query/status APIs, and tests using mocked MCP tools.",
"updatedAt": "2026-02-26T02:40:48.102Z"
},
{
"id": "5",
"title": "Implement PMAgent for PRD generation and clarification handling",
"description": "Build pm_agent.py to expand user prompts into structured PRDs and handle clarification requests from downstream agents using Claude 3.7 Sonnet",
"details": "In app_factory/agents/pm_agent.py:\n- Create PMAgent class using Anthropic Claude 3.7 Sonnet (claude-3-7-sonnet-20250219 or latest)\n- Implement expand_prompt_to_prd(user_input: str) -> str:\n * Load prompt template from app_factory/prompts/pm_prd_expansion.txt\n * Template should instruct Claude to: analyze input, identify missing requirements, specify tech stack, define success criteria, outline architecture\n * Call Anthropic API with system prompt as PM expert\n * Return structured PRD in markdown format with sections: Objective, Requirements, Architecture, Tech Stack, Success Criteria\n- Implement handle_clarification_request(clarification: dict) -> str:\n * Takes ClarificationRequest object with fields: requesting_agent (dev/qa/task), task_id, question, context\n * Loads prompt template from app_factory/prompts/pm_clarification.txt\n * Either auto-resolves if question is answerable from existing PRD\n * Or escalates to human via input() prompt (blocking operation)\n * Returns clarification response as string\n- Implement update_prd(prd_path: str, updates: str):\n * Appends clarifications/updates to existing PRD file\n * Maintains version history in PRD document\n- Add token usage tracking for cost monitoring\n- Implement async methods for non-blocking execution\n\nClass signature:\nclass PMAgent:\n def __init__(self, api_key: str, model: str = 'claude-3-7-sonnet-20250219')\n async def expand_prompt_to_prd(self, user_input: str) -> str\n async def handle_clarification_request(self, clarification: dict) -> str\n def update_prd(self, prd_path: str, updates: str)",
"testStrategy": "Mock Anthropic API and verify system prompt structure. Test expand_prompt_to_prd returns valid markdown PRD. Test handle_clarification_request with auto-resolvable questions (mock no human input). Test escalation to human input. Verify token tracking accuracy. Test PRD update appends correctly without corrupting file.",
"priority": "high",
"dependencies": [
"1",
"2"
],
"status": "done",
"subtasks": [
{
"id": 1,
"title": "Scaffold PMAgent class and prompt loading",
"description": "Create the PMAgent class structure and load prompt templates from disk.",
"dependencies": [],
"details": "Add pm_agent.py with PMAgent __init__ signature, helper to read prompt files from app_factory/prompts, and placeholders for async methods to ensure template loading is centralized and reusable.",
"status": "pending",
"testStrategy": null,
"parentId": "undefined"
},
{
"id": 2,
"title": "Implement PRD expansion with Claude and token tracking",
"description": "Implement expand_prompt_to_prd using Anthropic API and capture token usage.",
"dependencies": [
1
],
"details": "Call Claude 3.7 Sonnet with a PM system prompt and the pm_prd_expansion template, return markdown PRD sections, and record token usage/cost metadata in the response or internal counters.",
"status": "pending",
"testStrategy": "Mock Anthropic client; verify system/user prompts, markdown sections, and token counters updated.",
"parentId": "undefined"
},
{
"id": 3,
"title": "Handle clarification requests with auto-resolve and escalation",
"description": "Implement handle_clarification_request logic for auto-resolution or human input.",
"dependencies": [
1,
2
],
"details": "Load pm_clarification template, answer from existing PRD context when possible, otherwise block on input() for human response, and return the clarification string while tracking token usage for model calls.",
"status": "pending",
"testStrategy": "Mock clarification inputs and Anthropic calls; cover auto-resolve and human escalation branches.",
"parentId": "undefined"
},
{
"id": 4,
"title": "Add PRD update/versioning and async integration tests",
"description": "Implement PRD update/version history and add tests for PMAgent flows.",
"dependencies": [
2,
3
],
"details": "Append updates to PRD with version headers/timestamps, ensure async methods are non-blocking, and create tests for PRD appends, clarification updates, and token tracking accuracy.",
"status": "pending",
"testStrategy": "Use temp files to verify append/version history and async test harness for method behavior.",
"parentId": "undefined"
}
],
"complexity": 7,
"recommendedSubtasks": 4,
"expansionPrompt": "Split into prompt template loading, Anthropic API integration with token tracking, clarification handling/escalation, and PRD update/versioning tests.",
"updatedAt": "2026-02-26T02:40:48.806Z"
},
{
"id": "6",
"title": "Implement DevAgentManager for Claude Code/Codex subprocess automation",
"description": "Build dev_agent.py to spawn Dev Agents in Docker containers, interface with Claude Code via pexpect, and execute task implementations with strict context isolation",
"details": "In app_factory/agents/dev_agent.py:\n- Create DevAgentManager class that spawns Dev Agents in Docker containers\n- Implement execute_task(task: dict, worktree_path: str, container: Container, global_arch: str) -> dict:\n * Constructs minimized prompt from task details + global_arch summary\n * Loads prompt template from app_factory/prompts/dev_task_execution.txt\n * Template structure:\n - Task ID, title, description, details, testStrategy\n - Global architecture summary (read-only context)\n - Strict instruction: implement only this task, no extraneous changes\n - Must create test file and pass tests before completion\n * Uses pexpect to spawn claude command inside Docker container:\n - pexpect.spawn('docker exec -it {container_id} claude --headless --prompt-file /tmp/task_prompt.txt')\n * Monitors Claude Code stdout/stderr with timeout (30 min default)\n * Parses exit code: 0 = success, non-zero = failure\n * If exit code != 0 or timeout, trigger ClarificationRequest\n * Returns result dict: {status: 'success'|'failed'|'needs_clarification', output: str, files_changed: List[str]}\n- Implement prepare_task_prompt(task: dict, global_arch: str) -> str:\n * Generates minimal context prompt file\n- Implement parse_claude_output(output: str) -> dict:\n * Extracts files changed, test results, error messages\n- Add fallback to OpenAI Codex API for algorithmic generation if Claude Code fails\n- Implement max retry counter (max_retries=3 per PRD requirement)\n- Handle timeout errors and container crashes gracefully\n\nClass signature:\nclass DevAgentManager:\n def __init__(self, docker_client: docker.DockerClient, max_retries: int = 3)\n async def execute_task(self, task: dict, container: Container, global_arch: str) -> dict\n def prepare_task_prompt(self, task: dict, global_arch: str) -> str",
"testStrategy": "Mock pexpect.spawn and Docker exec. Test prompt construction includes all required fields. Test timeout triggers after configured duration. Verify max_retries prevents infinite loops. Test exit code parsing (0 vs non-zero). Mock Claude Code failure and verify fallback to Codex. Integration test with real Docker container and mock Claude Code script.",
"priority": "high",
"dependencies": [
"1",
"2",
"3"
],
"status": "done",
"subtasks": [
{
"id": 1,
"title": "Draft DevAgentManager class skeleton and prompt utilities",
"description": "Create the base class and method signatures in dev_agent.py.",
"dependencies": [],
"details": "Add DevAgentManager __init__ with docker client and max_retries. Stub execute_task, prepare_task_prompt, and parse_claude_output with docstrings and type hints.",
"status": "pending",
"testStrategy": null,
"parentId": "undefined"
},
{
"id": 2,
"title": "Implement prompt template loading and prompt construction",
"description": "Build prompt generation from task fields and global architecture summary.",
"dependencies": [
1
],
"details": "Load app_factory/prompts/dev_task_execution.txt, inject task id/title/description/details/testStrategy and global_arch section, and enforce strict instructions in the generated prompt string.",
"status": "pending",
"testStrategy": null,
"parentId": "undefined"
},
{
"id": 3,
"title": "Add Docker exec + pexpect orchestration for Claude Code",
"description": "Run claude in container using pexpect with timeout and capture output.",
"dependencies": [
1,
2
],
"details": "Spawn docker exec with claude --headless --prompt-file, monitor stdout/stderr, enforce 30-minute timeout, collect exit code and output, and handle container crashes gracefully.",
"status": "pending",
"testStrategy": null,
"parentId": "undefined"
},
{
"id": 4,
"title": "Parse Claude output and format execute_task result",
"description": "Extract changed files, test results, and errors from Claude output.",
"dependencies": [
3
],
"details": "Implement parse_claude_output to derive files_changed/test status/error messages and return execute_task result dict with status, output, and files_changed.",
"status": "pending",
"testStrategy": null,
"parentId": "undefined"
},
{
"id": 5,
"title": "Add retry logic, timeout handling, and Codex fallback",
"description": "Implement max retry loop and fallback to OpenAI Codex when Claude fails.",
"dependencies": [
3,
4
],
"details": "Add max_retries loop in execute_task, map non-zero exit/timeout to needs_clarification or failed, and invoke Codex fallback for algorithmic generation when Claude fails.",
"status": "pending",
"testStrategy": null,
"parentId": "undefined"
},
{
"id": 6,
"title": "Create unit tests with mocks for orchestration and parsing",
"description": "Add tests for prompt generation, retries, timeout, and fallback behavior.",
"dependencies": [
2,
3,
4,
5
],
"details": "Mock docker client, container exec, and pexpect.spawn; assert prompt contents, timeout triggers, exit code handling, retry cap, and Codex fallback invocation.",
"status": "pending",
"testStrategy": "Use pytest with mocks for Docker and pexpect; verify parsing and retry behavior.",
"parentId": "undefined"
}
],
"complexity": 9,
"recommendedSubtasks": 6,
"expansionPrompt": "Break into prompt preparation, Docker exec/pexpect orchestration, output parsing, retry/timeout handling, Codex fallback, and tests with mocks.",
"updatedAt": "2026-02-26T02:45:27.367Z"
},
{
"id": "7",
"title": "Implement QAAgent for code review, testing, and merge operations",
"description": "Build qa_agent.py to perform static analysis, run tests, handle git rebase/merge conflicts, and merge completed worktrees back to main branch",
"details": "In app_factory/agents/qa_agent.py:\n- Create QAAgent class using Claude 3.7 Sonnet for code review reasoning\n- Implement review_and_merge(task_id: str, worktree_path: str) -> dict:\n * Step 1: git rebase main on Dev Agent's feature branch\n - If rebase has conflicts, parse git status output\n - If conflicts are simple (same-file, non-overlapping), auto-resolve\n - If conflicts are complex, return {status: 'rebase_failed', conflicts: [...], action: 'kick_back_to_dev'}\n * Step 2: Run static analysis with ruff or pylint\n - Parse linting output, fail if critical errors found\n * Step 3: Run tests using pytest in worktree\n - Parse test output, extract pass/fail counts\n - If tests fail, extract failure details\n * Step 4: Code review with Claude API\n - Read all changed files (git diff main)\n - Send to Claude with prompt from app_factory/prompts/qa_review.txt\n - Prompt instructs: check for security issues (SQL injection, XSS, command injection), code quality, adherence to task requirements\n - If Claude identifies issues, return {status: 'review_failed', issues: [...], action: 'kick_back_to_dev'}\n * Step 5: Merge to main\n - git checkout main && git merge --no-ff feature/task-{task_id}\n - git push origin main\n * Return {status: 'merged', commit_sha: str}\n- Implement parse_test_results(output: str) -> dict:\n * Extracts pytest output (passed, failed, errors)\n- Implement auto_resolve_conflicts(conflicts: List[str]) -> bool:\n * Attempts git rerere or simple conflict resolution\n- Add retry counter (max 3 attempts per PRD) before escalation\n\nClass signature:\nclass QAAgent:\n def __init__(self, repo_path: str, api_key: str, max_retries: int = 3)\n async def review_and_merge(self, task_id: str, worktree_path: str) -> dict\n def run_tests(self, worktree_path: str) -> dict\n async def code_review(self, diff: str, task: dict) -> dict",
"testStrategy": "Mock git operations and verify rebase called correctly. Test conflict detection parses git status. Test static analysis runs ruff/pylint. Mock pytest execution and verify output parsing. Mock Claude API for code review. Test merge operation creates proper commit. Verify max retry counter works. Test kick_back_to_dev returns correct status.",
"priority": "high",
"dependencies": [
"1",
"2",
"3"
],
"status": "done",
"subtasks": [
{
"id": 1,
"title": "Implement QAAgent skeleton and core workflows",
"description": "Create QAAgent class and wire core review_and_merge flow structure.",
"dependencies": [],
"details": "Add qa_agent.py class definition, init, and review_and_merge orchestration with placeholders for rebase, lint, tests, review, and merge steps.",
"status": "pending",
"testStrategy": null,
"parentId": "undefined"
},
{
"id": 2,
"title": "Build git rebase/merge and conflict handling",
"description": "Implement git rebase, conflict parsing, auto-resolve, and merge logic.",
"dependencies": [
1
],
"details": "Add git commands for rebase/merge, parse git status for conflicts, implement auto_resolve_conflicts and retry escalation behavior.",
"status": "pending",
"testStrategy": null,
"parentId": "undefined"
},
{
"id": 3,
"title": "Add linting and test execution with parsing",
"description": "Run static analysis and pytest with robust output parsing.",
"dependencies": [
1
],
"details": "Implement run_tests and parse_test_results, wire linting via ruff/pylint, and return structured results for failures.",
"status": "pending",
"testStrategy": null,
"parentId": "undefined"
},
{
"id": 4,
"title": "Integrate Claude review for code quality checks",
"description": "Send diffs to Claude and evaluate review feedback.",
"dependencies": [
1
],
"details": "Load prompt template, build diff payload, call Claude API, and return review_failed with issues when needed.",
"status": "pending",
"testStrategy": null,
"parentId": "undefined"
},
{
"id": 5,
"title": "Create tests with mocked git/CLI/Claude",
"description": "Add unit tests covering QAAgent behavior and retries.",
"dependencies": [
1,
2,
3,
4
],
"details": "Mock git commands, lint/test outputs, and Claude responses; verify conflict handling, retries, and merge success.",
"status": "pending",
"testStrategy": null,
"parentId": "undefined"
}
],
"complexity": 8,
"recommendedSubtasks": 5,
"expansionPrompt": "Split into git rebase/merge workflow, lint/test execution and parsing, Claude review integration, conflict resolution/retry logic, and tests with mocked git/CLI.",
"updatedAt": "2026-02-26T02:45:07.395Z"
},
{
|
||||
"id": "8",
|
||||
"title": "Implement LangGraph state machine and orchestration graph",
|
||||
"description": "Build graph.py with LangGraph StateGraph defining nodes for PM, Task, Dev, and QA agents, including bi-directional clarification flow and concurrent execution logic",
|
||||
"details": "In app_factory/core/graph.py:\n- Create AppFactoryOrchestrator class wrapping LangGraph StateGraph\n- Define state schema using TypedDict:\n * user_input: str\n * prd: str\n * tasks: List[dict]\n * active_tasks: dict (task_id -> DevAgent state)\n * completed_tasks: List[str]\n * blocked_tasks: dict (task_id -> reason)\n * clarification_requests: List[dict]\n * global_architecture: str\n- Define nodes:\n * pm_node: Calls PMAgent.expand_prompt_to_prd(), updates state.prd and state.global_architecture\n * task_node: Calls TaskMasterAgent.parse_prd() and get_unblocked_tasks(), updates state.tasks\n * dev_dispatch_node: Concurrently spawns DevAgentManager.execute_task() for all unblocked tasks using asyncio.gather()\n * qa_node: Calls QAAgent.review_and_merge() for completed dev tasks\n * clarification_node: Handles ClarificationRequest, routes to PM, updates state\n- Define edges with conditional routing:\n * pm_node -> task_node (always)\n * task_node -> dev_dispatch_node (if unblocked_tasks exist) OR END (if all done)\n * dev_dispatch_node -> qa_node (for completed tasks) OR clarification_node (if clarification needed)\n * qa_node -> task_node (loop back to check newly unblocked tasks) OR dev_dispatch_node (kick back to dev on failure)\n * clarification_node -> dev_dispatch_node (after clarification resolved)\n- Implement should_continue(state: dict) -> str routing function for conditional edges\n- Add concurrent execution support using asyncio for dev_dispatch_node\n- Implement state persistence to disk (app_factory/data/state.json) after each node execution\n- Compile graph with checkpointing enabled for recovery\n\nClass signature:\nclass AppFactoryOrchestrator:\n def __init__(self, pm_agent: PMAgent, task_agent: TaskMasterAgent, dev_manager: DevAgentManager, qa_agent: QAAgent)\n def build_graph(self) -> StateGraph\n async def run(self, user_input: str) -> dict",
"testStrategy": "Mock all agent classes. Test state transitions through each node. Verify pm_node -> task_node edge. Test conditional routing in should_continue(). Test concurrent dev_dispatch_node spawns multiple tasks. Verify clarification_node routes back correctly. Test state persistence saves/loads correctly. Integration test with full graph execution (mocked agents).",
"priority": "high",
"dependencies": [
"4",
"5",
"6",
"7"
],
"status": "done",
"subtasks": [
{
"id": 1,
"title": "Define orchestration state schema and persistence",
"description": "Create TypedDict state schema and load/save hooks.",
"dependencies": [],
"details": "Implement TypedDict fields for user_input, prd, tasks, active_tasks, completed_tasks, blocked_tasks, clarification_requests, global_architecture, and add JSON persistence to `app_factory/data/state.json` after each node execution.",
"status": "pending",
"testStrategy": "Unit test serialize/deserialize with sample state and ensure writes after node execution.",
"parentId": "undefined"
},
{
"id": 2,
"title": "Implement AppFactoryOrchestrator skeleton and node functions",
"description": "Add class wrapper and core node implementations.",
"dependencies": [
1
],
"details": "Create `AppFactoryOrchestrator` with `build_graph` and `run`, and implement `pm_node`, `task_node`, `dev_dispatch_node`, `qa_node`, `clarification_node` calling the respective agent methods and updating state fields.",
"status": "pending",
"testStrategy": "Mock agents and assert each node updates state correctly.",
"parentId": "undefined"
},
{
"id": 3,
"title": "Add conditional routing and should_continue logic",
"description": "Define edge conditions for graph routing.",
"dependencies": [
2
],
"details": "Implement `should_continue(state)` and conditional edges: pm->task, task->dev or END, dev->qa or clarification, qa->task or dev, clarification->dev; encode decisions based on blocked/clarification/completion flags.",
"status": "pending",
"testStrategy": "Table-driven tests for routing decisions across state scenarios.",
"parentId": "undefined"
},
{
"id": 4,
"title": "Add asyncio concurrency for dev dispatch",
"description": "Execute dev tasks concurrently with asyncio.",
"dependencies": [
2,
3
],
"details": "Use `asyncio.gather()` in `dev_dispatch_node` to spawn `DevAgentManager.execute_task()` for all unblocked tasks, capture results, and update active/completed/blocked state accordingly.",
"status": "pending",
"testStrategy": "Mock `execute_task` coroutines and assert parallel invocation and aggregation.",
"parentId": "undefined"
},
{
"id": 5,
"title": "Enable checkpointing and recovery in graph compilation",
"description": "Compile StateGraph with checkpointing enabled.",
"dependencies": [
2,
3
],
"details": "Configure LangGraph compilation with checkpointing so runs can resume from saved state, integrating with persistence layer and ensuring graph builds once per orchestrator instance.",
"status": "pending",
"testStrategy": "Mock checkpoint backend or use temp file and verify recovery from saved state.",
"parentId": "undefined"
},
{
"id": 6,
"title": "Write orchestration graph tests with mocked agents",
"description": "Add integration-style tests for state transitions.",
"dependencies": [
1,
2,
3,
4,
5
],
"details": "Create tests covering pm->task->dev->qa flow, clarification loop, failed QA returning to dev, and end condition when all tasks complete; verify state persistence file updates at each node.",
"status": "pending",
"testStrategy": "Pytest with mocked agents and temporary filesystem for state.json.",
"parentId": "undefined"
}
],
"complexity": 9,
"recommendedSubtasks": 6,
"expansionPrompt": "Break into state schema/persistence, node implementations, conditional routing, concurrency handling with asyncio, checkpointing, and graph tests with mocked agents.",
"updatedAt": "2026-02-26T02:49:37.964Z"
},
{
"id": "9",
"title": "Implement GlobalArchitecture summary system and prompt templates",
"description": "Create prompt templates for all agents and implement GlobalArchitecture tracking to prevent Dev Agent context starvation and duplicate code",
"details": "In app_factory/prompts/:\n- Create pm_prd_expansion.txt:\n * System prompt: \"You are an expert Product Manager. Analyze the user's input and expand it into a comprehensive PRD...\"\n * Includes sections: Objective, Core Requirements, Technical Architecture, Success Criteria, Non-Functional Requirements\n- Create pm_clarification.txt:\n * System prompt: \"You are resolving a clarification request from a downstream agent. Use the existing PRD context to answer if possible...\"\n- Create dev_task_execution.txt:\n * System prompt: \"You are a Dev Agent. Implement ONLY the specified task. Do not make extraneous changes. Read the Global Architecture to avoid duplicating existing code...\"\n * Includes: Task details, Global Architecture summary, Test requirements, Output format\n- Create qa_review.txt:\n * System prompt: \"You are a QA reviewer. Check for: security vulnerabilities (OWASP Top 10), code quality issues, adherence to task requirements, potential bugs...\"\n * Includes: Task requirements, Diff output, Architecture constraints\n\nImplement GlobalArchitecture tracking:\n- In app_factory/core/architecture_tracker.py:\n- Create ArchitectureTracker class:\n * Maintains app_factory/data/global_architecture.json\n * Fields: modules (list of module names/purposes), utilities (shared functions), design_patterns, naming_conventions, tech_stack\n- Implement update_architecture(completed_task: dict, files_changed: List[str]):\n * Called by QA Agent after successful merge\n * Analyzes merged code to extract new modules, utilities\n * Uses Claude API to summarize architectural additions\n * Appends to global_architecture.json\n- Implement get_architecture_summary() -> str:\n * Returns concise text summary (max 2000 tokens) for Dev Agent context\n * Includes: Project structure overview, Existing modules, Shared utilities, Coding conventions\n\nClass signature:\nclass ArchitectureTracker:\n def __init__(self, data_dir: str, api_key: str)\n async def 
update_architecture(self, completed_task: dict, files_changed: List[str])\n def get_architecture_summary(self) -> str",
"testStrategy": "Verify all prompt templates exist and have valid structure. Test ArchitectureTracker creates global_architecture.json. Test update_architecture appends new modules correctly. Test get_architecture_summary stays under token limit. Mock Claude API for architecture analysis. Test prompt templates render correctly with sample data.",
"priority": "medium",
"dependencies": [
"5",
"6",
"7"
],
"status": "done",
"subtasks": [
{
"id": 1,
"title": "Create agent prompt template files",
"description": "Add all required prompt templates with specified sections and system text.",
"dependencies": [],
"details": "Create the four files under `app_factory/prompts/` with the exact system prompts and required sections for PRD expansion, clarification, dev execution, and QA review.",
"status": "pending",
"testStrategy": "Verify files exist and include required sections.",
"parentId": "undefined"
},
{
"id": 2,
"title": "Define GlobalArchitecture storage schema",
"description": "Implement persistent JSON schema and loader/saver logic.",
"dependencies": [],
"details": "Add schema fields (modules, utilities, design_patterns, naming_conventions, tech_stack) and ensure `app_factory/data/global_architecture.json` is created/updated safely.",
"status": "pending",
"testStrategy": "Unit test load/save and default initialization.",
"parentId": "undefined"
},
{
"id": 3,
"title": "Implement ArchitectureTracker update workflow",
"description": "Add async update_architecture with Claude-based summarization.",
"dependencies": [
2
],
"details": "Build `ArchitectureTracker` in `app_factory/core/architecture_tracker.py`, extract new modules/utilities from changed files, call Claude API to summarize, and append updates to the JSON store.",
"status": "pending",
"testStrategy": "Mock Claude API and verify JSON updates on sample changes.",
"parentId": "undefined"
},
{
"id": 4,
"title": "Generate architecture summary for Dev context",
"description": "Implement concise summary generation and tests.",
"dependencies": [
2,
3
],
"details": "Implement `get_architecture_summary()` to return a <=2000-token summary including structure, modules, utilities, and conventions; add tests for token/length constraints and content coverage.",
"status": "pending",
"testStrategy": "Unit test summary length and required sections.",
"parentId": "undefined"
}
],
"complexity": 6,
"recommendedSubtasks": 4,
"expansionPrompt": "Split into prompt template creation, architecture tracker storage schema, Claude-based summarization updates, and summary generation tests.",
"updatedAt": "2026-02-26T02:48:59.828Z"
},
{
"id": "10",
"title": "Implement main.py entry point, error handling, and end-to-end integration",
"description": "Create main.py orchestration script with CLI interface, comprehensive error handling, escalation logic, and end-to-end workflow execution",
"details": "In main.py:\n- Create main() async function:\n * Parse CLI arguments: --prompt (required), --repo-path (default: cwd), --max-concurrent-tasks (default: 5), --debug\n * Load environment variables from .env (API keys)\n * Initialize all components:\n - ObservabilityManager (LangSmith tracing)\n - WorkspaceManager (git + Docker)\n - TaskMasterAgent (MCP client)\n - PMAgent (Claude API)\n - DevAgentManager (pexpect + Docker)\n - QAAgent (Claude API + git)\n - ArchitectureTracker\n * Build AppFactoryOrchestrator graph\n * Execute graph with user input: await orchestrator.run(args.prompt)\n * Handle exceptions at top level:\n - ClarificationTimeout: Escalate to human after 3 retries\n - DockerDaemonError: Exit with helpful message\n - GitError: Exit with git troubleshooting info\n - MCPConnectionError: Fallback to CLI mode or exit\n * Print final summary: Tasks completed, total time, token usage, link to LangSmith trace\n- Implement retry/escalation logic:\n * Track retry counters per task (max 3 per PRD)\n * After 3 Dev-QA bounces, escalate to PMAgent with detailed failure context\n * If PM clarification doesn't resolve, escalate to human with full context dump\n- Implement graceful shutdown:\n * Cleanup all active Docker containers\n * Remove all git worktrees\n * Save final state to disk\n- Add signal handlers (SIGINT, SIGTERM) for Ctrl+C cleanup\n- Implement --dry-run mode that validates all dependencies without execution\n- Add verbose logging controlled by --debug flag\n\nCLI usage:\npython main.py --prompt \"Build a video transcription service with Whisper and summarization\" --repo-path /path/to/project --max-concurrent-tasks 3\n\nError handling priorities:\n1. Retry with exponential backoff for transient failures (API rate limits, network)\n2. Escalate to PM for ambiguity/logic issues\n3. Escalate to human for repeated failures (3+ retries)\n4. Fail fast for configuration errors (missing API keys, Docker unavailable)",
"testStrategy": "Mock all components. Test CLI argument parsing. Test error handling for each exception type. Verify retry logic increments counters correctly. Test escalation after 3 retries. Test graceful shutdown cleans up containers and worktrees. Test signal handlers (send SIGINT). Test --dry-run validates dependencies. Integration test with full workflow (mocked external APIs). End-to-end test with real TaskMaster and simple prompt.",
"priority": "high",
"dependencies": [
"8",
"9"
],
"status": "done",
"subtasks": [
{
"id": 1,
"title": "Define CLI parsing and environment loading in main.py",
"description": "Set up argument parsing and .env loading for the entry point.",
"dependencies": [],
"details": "Implement argparse for --prompt, --repo-path, --max-concurrent-tasks, --debug, and --dry-run, then load environment variables from .env and validate required API keys early.",
"status": "pending",
"testStrategy": "Unit test argument parsing and env validation with mocked env vars.",
"parentId": "undefined"
},
{
"id": 2,
"title": "Wire core components and orchestrator graph",
"description": "Initialize all managers and build the orchestration graph.",
"dependencies": [
1
],
"details": "Instantiate ObservabilityManager, WorkspaceManager, TaskMasterAgent, PMAgent, DevAgentManager, QAAgent, and ArchitectureTracker, then construct the AppFactoryOrchestrator graph and invoke run with the prompt.",
"status": "pending",
"testStrategy": "Mock component constructors and verify orchestrator run is called with prompt.",
"parentId": "undefined"
},
{
"id": 3,
"title": "Implement retry and escalation control flow",
"description": "Add retry counters, backoff, and escalation logic.",
"dependencies": [
2
],
"details": "Track per-task retry counts, add exponential backoff for transient errors, escalate to PM after 3 Dev-QA bounces, then escalate to human with full context if unresolved.",
"status": "pending",
"testStrategy": "Unit test retry increments and escalation triggers after max attempts.",
"parentId": "undefined"
},
{
"id": 4,
"title": "Add top-level error handling and summaries",
"description": "Handle exceptions and print final summary output.",
"dependencies": [
2,
3
],
"details": "Catch ClarificationTimeout, DockerDaemonError, GitError, and MCPConnectionError with appropriate fallback or exit messaging, then output task completion stats, timing, token usage, and trace link.",
"status": "pending",
"testStrategy": "Simulate each exception type and assert correct handling and summary output.",
"parentId": "undefined"
},
{
"id": 5,
"title": "Implement graceful shutdown, signals, and dry-run checks",
"description": "Ensure cleanup and validation paths are covered.",
"dependencies": [
2
],
"details": "Add SIGINT/SIGTERM handlers, ensure cleanup of containers/worktrees and state persistence, and implement --dry-run validation that checks dependencies without executing the graph.",
"status": "pending",
"testStrategy": "Test signal handling with simulated SIGINT and verify cleanup methods called; test dry-run validation path.",
"parentId": "undefined"
}
],
"complexity": 8,
"recommendedSubtasks": 5,
"expansionPrompt": "Break into CLI parsing/env loading, component wiring, retry/escalation logic, graceful shutdown/signal handling, and dry-run/validation tests.",
"updatedAt": "2026-02-26T02:53:15.324Z"
}
],
"metadata": {
"version": "1.0.0",
"lastModified": "2026-02-26T02:53:15.329Z",
"taskCount": 10,
"completedCount": 10,
"tags": [
"master"
]
}
}
}
47
.taskmaster/templates/example_prd.txt
Normal file
@@ -0,0 +1,47 @@
<context>
# Overview
[Provide a high-level overview of your product here. Explain what problem it solves, who it's for, and why it's valuable.]

# Core Features
[List and describe the main features of your product. For each feature, include:
- What it does
- Why it's important
- How it works at a high level]

# User Experience
[Describe the user journey and experience. Include:
- User personas
- Key user flows
- UI/UX considerations]
</context>
<PRD>
# Technical Architecture
[Outline the technical implementation details:
- System components
- Data models
- APIs and integrations
- Infrastructure requirements]

# Development Roadmap
[Break down the development process into phases:
- MVP requirements
- Future enhancements
- Do not think about timelines whatsoever -- all that matters is scope and detailing exactly what needs to be built in each phase so it can later be cut up into tasks]

# Logical Dependency Chain
[Define the logical order of development:
- Which features need to be built first (foundation)
- Getting as quickly as possible to a usable/visible front end that works
- Properly pacing and scoping each feature so it is atomic but can also be built upon and improved as development approaches]

# Risks and Mitigations
[Identify potential risks and how they'll be addressed:
- Technical challenges
- Figuring out the MVP that we can build upon
- Resource constraints]

# Appendix
[Include any additional information:
- Research findings
- Technical specifications]
</PRD>
511
.taskmaster/templates/example_prd_rpg.txt
Normal file
@@ -0,0 +1,511 @@
<rpg-method>
# Repository Planning Graph (RPG) Method - PRD Template

This template teaches you (AI or human) how to create structured, dependency-aware PRDs using the RPG methodology from Microsoft Research. The key insight: separate WHAT (functional) from HOW (structural), then connect them with explicit dependencies.

## Core Principles

1. **Dual-Semantics**: Think functional (capabilities) AND structural (code organization) separately, then map them
2. **Explicit Dependencies**: Never assume - always state what depends on what
3. **Topological Order**: Build foundation first, then layers on top
4. **Progressive Refinement**: Start broad, refine iteratively

## How to Use This Template

- Follow the instructions in each `<instruction>` block
- Look at `<example>` blocks to see good vs bad patterns
- Fill in the content sections with your project details
- The AI reading this will learn the RPG method by following along
- Task Master will parse the resulting PRD into dependency-aware tasks

## Recommended Tools for Creating PRDs

When using this template to **create** a PRD (not parse it), use **code-context-aware AI assistants** for best results:

**Why?** The AI needs to understand your existing codebase to make good architectural decisions about modules, dependencies, and integration points.

**Recommended tools:**
- **Claude Code** (claude-code CLI) - Best for structured reasoning and large contexts
- **Cursor/Windsurf** - IDE integration with full codebase context
- **Gemini CLI** (gemini-cli) - Massive context window for large codebases
- **Codex/Grok CLI** - Strong code generation with context awareness

**Note:** Once your PRD is created, `task-master parse-prd` works with any configured AI model - it just needs to read the PRD text itself, not your codebase.
</rpg-method>

---

<overview>
<instruction>
Start with the problem, not the solution. Be specific about:
- What pain point exists?
- Who experiences it?
- Why existing solutions don't work?
- What success looks like (measurable outcomes)?

Keep this section focused - don't jump into implementation details yet.
</instruction>

## Problem Statement
[Describe the core problem. Be concrete about user pain points.]

## Target Users
[Define personas, their workflows, and what they're trying to achieve.]

## Success Metrics
[Quantifiable outcomes. Examples: "80% task completion via autopilot", "< 5% manual intervention rate"]

</overview>

---

<functional-decomposition>
<instruction>
Now think about CAPABILITIES (what the system DOES), not code structure yet.

Step 1: Identify high-level capability domains
- Think: "What major things does this system do?"
- Examples: Data Management, Core Processing, Presentation Layer

Step 2: For each capability, enumerate specific features
- Use explore-exploit strategy:
  * Exploit: What features are REQUIRED for core value?
  * Explore: What features make this domain COMPLETE?

Step 3: For each feature, define:
- Description: What it does in one sentence
- Inputs: What data/context it needs
- Outputs: What it produces/returns
- Behavior: Key logic or transformations

<example type="good">
Capability: Data Validation
Feature: Schema validation
- Description: Validate JSON payloads against defined schemas
- Inputs: JSON object, schema definition
- Outputs: Validation result (pass/fail) + error details
- Behavior: Iterate fields, check types, enforce constraints

Feature: Business rule validation
- Description: Apply domain-specific validation rules
- Inputs: Validated data object, rule set
- Outputs: Boolean + list of violated rules
- Behavior: Execute rules sequentially, short-circuit on failure
</example>

<example type="bad">
Capability: validation.js
(Problem: This is a FILE, not a CAPABILITY. Mixing structure into functional thinking.)

Capability: Validation
Feature: Make sure data is good
(Problem: Too vague. No inputs/outputs. Not actionable.)
</example>
</instruction>
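As a concrete illustration of the schema-validation feature in the good example above, here is a minimal sketch. Python is used for brevity even though the example's file names are JavaScript, and the field rules are invented for illustration:

```python
def validate_schema(data: dict, schema: dict) -> tuple[bool, list[str]]:
    """Return (passed, errors) after checking presence and type of each field."""
    errors = []
    for field, expected_type in schema.items():
        if field not in data:
            errors.append(f"missing required field: {field}")
        elif not isinstance(data[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    return (not errors, errors)

# "age" arrives as a string, so validation fails with one error.
ok, errors = validate_schema({"name": "Ada", "age": "41"}, {"name": str, "age": int})
```

Note how the feature's declared inputs (object + schema) and outputs (pass/fail + error details) map directly onto the function signature.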

## Capability Tree

### Capability: [Name]
[Brief description of what this capability domain covers]

#### Feature: [Name]
- **Description**: [One sentence]
- **Inputs**: [What it needs]
- **Outputs**: [What it produces]
- **Behavior**: [Key logic]

#### Feature: [Name]
- **Description**:
- **Inputs**:
- **Outputs**:
- **Behavior**:

### Capability: [Name]
...

</functional-decomposition>

---

<structural-decomposition>
<instruction>
NOW think about code organization. Map capabilities to actual file/folder structure.

Rules:
1. Each capability maps to a module (folder or file)
2. Features within a capability map to functions/classes
3. Use clear module boundaries - each module has ONE responsibility
4. Define what each module exports (public interface)

The goal: Create a clear mapping between "what it does" (functional) and "where it lives" (structural).

<example type="good">
Capability: Data Validation
→ Maps to: src/validation/
    ├── schema-validator.js (Schema validation feature)
    ├── rule-validator.js (Business rule validation feature)
    └── index.js (Public exports)

Exports:
- validateSchema(data, schema)
- validateRules(data, rules)
</example>

<example type="bad">
Capability: Data Validation
→ Maps to: src/utils.js
(Problem: "utils" is not a clear module boundary. Where do I find validation logic?)

Capability: Data Validation
→ Maps to: src/validation/everything.js
(Problem: One giant file. Features should map to separate files for maintainability.)
</example>
</instruction>

## Repository Structure

```
project-root/
├── src/
│   ├── [module-name]/    # Maps to: [Capability Name]
│   │   ├── [file].js     # Maps to: [Feature Name]
│   │   └── index.js      # Public exports
│   └── [module-name]/
├── tests/
└── docs/
```

## Module Definitions

### Module: [Name]
- **Maps to capability**: [Capability from functional decomposition]
- **Responsibility**: [Single clear purpose]
- **File structure**:
  ```
  module-name/
  ├── feature1.js
  ├── feature2.js
  └── index.js
  ```
- **Exports**:
  - `functionName()` - [what it does]
  - `ClassName` - [what it does]

</structural-decomposition>

---

<dependency-graph>
<instruction>
This is THE CRITICAL SECTION for Task Master parsing.

Define explicit dependencies between modules. This creates the topological order for task execution.

Rules:
1. List modules in dependency order (foundation first)
2. For each module, state what it depends on
3. Foundation modules should have NO dependencies
4. Every non-foundation module should depend on at least one other module
5. Think: "What must EXIST before I can build this module?"

<example type="good">
Foundation Layer (no dependencies):
- error-handling: No dependencies
- config-manager: No dependencies
- base-types: No dependencies

Data Layer:
- schema-validator: Depends on [base-types, error-handling]
- data-ingestion: Depends on [schema-validator, config-manager]

Core Layer:
- algorithm-engine: Depends on [base-types, error-handling]
- pipeline-orchestrator: Depends on [algorithm-engine, data-ingestion]
</example>

<example type="bad">
- validation: Depends on API
- API: Depends on validation
(Problem: Circular dependency. This will cause build/runtime issues.)

- user-auth: Depends on everything
(Problem: Too many dependencies. Should be more focused.)
</example>
</instruction>

## Dependency Chain

### Foundation Layer (Phase 0)
No dependencies - these are built first.

- **[Module Name]**: [What it provides]
- **[Module Name]**: [What it provides]

### [Layer Name] (Phase 1)
- **[Module Name]**: Depends on [[module-from-phase-0], [module-from-phase-0]]
- **[Module Name]**: Depends on [[module-from-phase-0]]

### [Layer Name] (Phase 2)
- **[Module Name]**: Depends on [[module-from-phase-1], [module-from-foundation]]

[Continue building up layers...]

</dependency-graph>
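The explicit dependency lists above are exactly what makes a topological build order computable. A minimal sketch using Python's standard-library `graphlib` and the module names from the good example in this section (illustrative only, not part of Task Master):

```python
from graphlib import TopologicalSorter

# module -> set of modules it depends on, taken from the good example above
deps = {
    "error-handling": set(),
    "config-manager": set(),
    "base-types": set(),
    "schema-validator": {"base-types", "error-handling"},
    "data-ingestion": {"schema-validator", "config-manager"},
    "algorithm-engine": {"base-types", "error-handling"},
    "pipeline-orchestrator": {"algorithm-engine", "data-ingestion"},
}

# static_order() raises graphlib.CycleError for graphs like the bad example
# (validation <-> API), surfacing circular dependencies before any work starts.
order = list(TopologicalSorter(deps).static_order())
```

Foundation modules always come out before the modules that depend on them, which is the same ordering the phased roadmap encodes.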

---

<implementation-roadmap>
<instruction>
Turn the dependency graph into concrete development phases.

Each phase should:
1. Have clear entry criteria (what must exist before starting)
2. Contain tasks that can be parallelized (no inter-dependencies within phase)
3. Have clear exit criteria (how do we know phase is complete?)
4. Build toward something USABLE (not just infrastructure)

Phase ordering follows topological sort of dependency graph.

<example type="good">
Phase 0: Foundation
Entry: Clean repository
Tasks:
- Implement error handling utilities
- Create base type definitions
- Setup configuration system
Exit: Other modules can import foundation without errors

Phase 1: Data Layer
Entry: Phase 0 complete
Tasks:
- Implement schema validator (uses: base types, error handling)
- Build data ingestion pipeline (uses: validator, config)
Exit: End-to-end data flow from input to validated output
</example>

<example type="bad">
Phase 1: Build Everything
Tasks:
- API
- Database
- UI
- Tests
(Problem: No clear focus. Too broad. Dependencies not considered.)
</example>
</instruction>

## Development Phases

### Phase 0: [Foundation Name]
**Goal**: [What foundational capability this establishes]

**Entry Criteria**: [What must be true before starting]

**Tasks**:
- [ ] [Task name] (depends on: [none or list])
  - Acceptance criteria: [How we know it's done]
  - Test strategy: [What tests prove it works]

- [ ] [Task name] (depends on: [none or list])

**Exit Criteria**: [Observable outcome that proves phase complete]

**Delivers**: [What can users/developers do after this phase?]

---

### Phase 1: [Layer Name]
**Goal**:

**Entry Criteria**: Phase 0 complete

**Tasks**:
- [ ] [Task name] (depends on: [[tasks-from-phase-0]])
- [ ] [Task name] (depends on: [[tasks-from-phase-0]])

**Exit Criteria**:

**Delivers**:

---

[Continue with more phases...]

</implementation-roadmap>

---

<test-strategy>
<instruction>
Define how testing will be integrated throughout development (TDD approach).

Specify:
1. Test pyramid ratios (unit vs integration vs e2e)
2. Coverage requirements
3. Critical test scenarios
4. Test generation guidelines for Surgical Test Generator

This section guides the AI when generating tests during the RED phase of TDD.

<example type="good">
Critical Test Scenarios for Data Validation module:
- Happy path: Valid data passes all checks
- Edge cases: Empty strings, null values, boundary numbers
- Error cases: Invalid types, missing required fields
- Integration: Validator works with ingestion pipeline
</example>
</instruction>

## Test Pyramid

```
         /\
        /E2E\           ← [X]% (End-to-end, slow, comprehensive)
       /------\
     /Integration\      ← [Y]% (Module interactions)
    /------------\
   / Unit Tests \       ← [Z]% (Fast, isolated, deterministic)
  /----------------\
```
|
||||
|
||||
## Coverage Requirements
|
||||
- Line coverage: [X]% minimum
|
||||
- Branch coverage: [X]% minimum
|
||||
- Function coverage: [X]% minimum
|
||||
- Statement coverage: [X]% minimum
|
||||
|
||||
## Critical Test Scenarios
|
||||
|
||||
### [Module/Feature Name]
|
||||
**Happy path**:
|
||||
- [Scenario description]
|
||||
- Expected: [What should happen]
|
||||
|
||||
**Edge cases**:
|
||||
- [Scenario description]
|
||||
- Expected: [What should happen]
|
||||
|
||||
**Error cases**:
|
||||
- [Scenario description]
|
||||
- Expected: [How system handles failure]
|
||||
|
||||
**Integration points**:
|
||||
- [What interactions to test]
|
||||
- Expected: [End-to-end behavior]
|
||||
|
||||
## Test Generation Guidelines
|
||||
[Specific instructions for Surgical Test Generator about what to focus on, what patterns to follow, project-specific test conventions]
|
||||
|
||||
</test-strategy>
|
||||
|
||||
---
|
||||
|
||||
<architecture>
|
||||
<instruction>
|
||||
Describe technical architecture, data models, and key design decisions.
|
||||
|
||||
Keep this section AFTER functional/structural decomposition - implementation details come after understanding structure.
|
||||
</instruction>
|
||||
|
||||
## System Components
|
||||
[Major architectural pieces and their responsibilities]
|
||||
|
||||
## Data Models
|
||||
[Core data structures, schemas, database design]
|
||||
|
||||
## Technology Stack
|
||||
[Languages, frameworks, key libraries]
|
||||
|
||||
**Decision: [Technology/Pattern]**
|
||||
- **Rationale**: [Why chosen]
|
||||
- **Trade-offs**: [What we're giving up]
|
||||
- **Alternatives considered**: [What else we looked at]
|
||||
|
||||
</architecture>
|
||||
|
||||
---
|
||||
|
||||
<risks>
<instruction>
Identify risks that could derail development and how to mitigate them.

Categories:
- Technical risks (complexity, unknowns)
- Dependency risks (blocking issues)
- Scope risks (creep, underestimation)
</instruction>

## Technical Risks

**Risk**: [Description]

- **Impact**: [High/Medium/Low - effect on project]
- **Likelihood**: [High/Medium/Low]
- **Mitigation**: [How to address]
- **Fallback**: [Plan B if mitigation fails]

## Dependency Risks

[External dependencies, blocking issues]

## Scope Risks

[Scope creep, underestimation, unclear requirements]

</risks>

---

<appendix>

## References

[Papers, documentation, similar systems]

## Glossary

[Domain-specific terms]

## Open Questions

[Things to resolve during development]

</appendix>

---

<task-master-integration>

# How Task Master Uses This PRD

When you run `task-master parse-prd <file>.txt`, the parser:

1. **Extracts capabilities** → Main tasks
   - Each `### Capability:` becomes a top-level task

2. **Extracts features** → Subtasks
   - Each `#### Feature:` becomes a subtask under its capability

3. **Parses dependencies** → Task dependencies
   - `Depends on: [X, Y]` sets `task.dependencies = ["X", "Y"]`

4. **Orders by phases** → Task priorities
   - Phase 0 tasks = highest priority
   - Phase N tasks = lower priority, properly sequenced

5. **Uses test strategy** → Test generation context
   - Feeds test scenarios to the Surgical Test Generator during implementation

**Result**: A dependency-aware task graph that can be executed in topological order.
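A topological execution order falls out of the parsed dependencies with a standard library sort; a minimal sketch (the task data below is invented for illustration, and the real parser output schema may differ):

```python
# Sketch: ordering parsed tasks so every task runs after its dependencies.
# The task data here is invented; Task Master's actual schema may differ.
from graphlib import TopologicalSorter

tasks = {
    "1": {"title": "Core data model", "dependencies": []},
    "2": {"title": "Storage layer", "dependencies": ["1"]},
    "3": {"title": "CLI commands", "dependencies": ["1", "2"]},
}

# Map each task id to its set of prerequisites, then sort.
graph = {tid: set(t["dependencies"]) for tid, t in tasks.items()}
order = list(TopologicalSorter(graph).static_order())

for tid in order:
    print(f"Task {tid}: {tasks[tid]['title']}")
# → Task 1: Core data model
#   Task 2: Storage layer
#   Task 3: CLI commands
```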

## Why RPG Structure Matters

Traditional flat PRDs lead to:

- ❌ Unclear task dependencies
- ❌ Arbitrary task ordering
- ❌ Circular dependencies discovered late
- ❌ Poorly scoped tasks

RPG-structured PRDs provide:

- ✅ Explicit dependency chains
- ✅ Topological execution order
- ✅ Clear module boundaries
- ✅ Validated task graph before implementation
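"Circular dependencies discovered late" is exactly the failure mode that up-front graph validation catches; a minimal sketch using the standard library (the dependency data is invented for illustration):

```python
# Sketch: validating a task graph up front so cycles surface before any
# implementation starts. The dependency data is invented for illustration.
from graphlib import CycleError, TopologicalSorter

deps = {
    "auth": {"db"},
    "db": {"config"},
    "config": {"auth"},  # oops: auth -> db -> config -> auth
}

try:
    # static_order() is lazy, so force iteration to trigger validation
    list(TopologicalSorter(deps).static_order())
    print("graph OK")
except CycleError as exc:
    # exc.args[1] holds the offending node sequence
    print("cycle detected:", exc.args[1])
```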

## Tips for Best Results

1. **Spend time on the dependency graph** - this is the most valuable section for Task Master
2. **Keep features atomic** - each feature should be independently testable
3. **Progressive refinement** - start broad, then use `task-master expand` to break down complex tasks
4. **Use research mode** - `task-master parse-prd --research` leverages AI for better task generation

</task-master-integration>