{ "master": { "tasks": [ { "id": "1", "title": "Project scaffolding and dependency setup", "description": "Initialize Python project structure with all required directories, configuration files, and install core dependencies (LangGraph, LangSmith, GitPython, docker, pexpect)", "details": "Create the app_factory/ directory structure as specified in PRD:\n- app_factory/agents/ (pm_agent.py, task_agent.py, dev_agent.py, qa_agent.py)\n- app_factory/core/ (graph.py, workspace.py, observability.py)\n- app_factory/prompts/\n- app_factory/data/\n- main.py at root\n- requirements.txt with:\n * langgraph>=0.0.20\n * langsmith>=0.1.0\n * gitpython>=3.1.40\n * docker>=7.0.0\n * pexpect>=4.9.0\n * anthropic>=0.18.0\n * openai>=1.10.0\n * python-dotenv>=1.0.0\n * pydantic>=2.5.0\n * asyncio (built-in)\n\nCreate .env.example for API keys (ANTHROPIC_API_KEY, OPENAI_API_KEY, LANGSMITH_API_KEY). Initialize git if not already done. Create __init__.py files for proper package structure.", "testStrategy": "Verify directory structure matches PRD specification. Run 'pip install -r requirements.txt' successfully. Import all packages without errors. Verify __init__.py files exist in all package directories.", "priority": "high", "dependencies": [], "status": "done", "subtasks": [ { "id": 1, "title": "Create project directories and package skeleton", "description": "Set up the app_factory package folders and initial module files per the PRD.", "dependencies": [], "details": "Create the app_factory/agents, app_factory/core, app_factory/prompts, and app_factory/data directories. Add empty module files for agents (pm_agent.py, task_agent.py, dev_agent.py, qa_agent.py) and core (graph.py, workspace.py, observability.py), plus main.py at repo root. 
Add __init__.py files in app_factory, app_factory/agents, and app_factory/core to ensure package imports work.", "status": "pending", "testStrategy": null, "parentId": "undefined" }, { "id": 2, "title": "Add dependency and environment configuration files", "description": "Define required dependencies and environment variable templates.", "dependencies": [], "details": "Create or update requirements.txt to include the specified versions for langgraph, langsmith, gitpython, docker, pexpect, anthropic, openai, python-dotenv, pydantic, and note asyncio as built-in. Add or update .env.example with ANTHROPIC_API_KEY, OPENAI_API_KEY, and LANGSMITH_API_KEY placeholders.", "status": "pending", "testStrategy": null, "parentId": "undefined" }, { "id": 3, "title": "Initialize git and verify scaffolding integrity", "description": "Ensure repository is initialized and structure is verifiable.", "dependencies": [ 1, 2 ], "details": "Initialize git if .git does not exist, and verify that directory structure matches the PRD and that __init__.py files are present. 
Perform a basic import check for package modules after installing requirements with pip to confirm dependencies resolve correctly.", "status": "pending", "testStrategy": "Run `pip install -r requirements.txt`, then use a small Python import check for app_factory modules and core dependencies.", "parentId": "undefined" } ], "complexity": 3, "recommendedSubtasks": 3, "expansionPrompt": "Break down scaffolding into directory/package creation, requirements/.env.example updates, and verification steps (imports + pip install).", "updatedAt": "2026-02-26T02:33:35.564Z" }, { "id": "2", "title": "Implement LangSmith observability and logging infrastructure", "description": "Build the observability.py module with LangSmith tracing integration and structured Python logging for tracking agent decisions, token usage, and execution flow", "details": "In app_factory/core/observability.py:\n- Create ObservabilityManager class that wraps LangSmith client\n- Implement trace_agent_execution() decorator for tracking agent calls with context (agent_name, task_id, input, output, token_count)\n- Implement structured logging with log levels (DEBUG, INFO, WARNING, ERROR)\n- Create methods: start_trace(), end_trace(), log_state_transition(), log_token_usage(), log_error()\n- Configure LangSmith project name from env var (LANGSMITH_PROJECT)\n- Add async context manager support for automatic trace lifecycle\n- Integrate Python's logging module with custom formatters including timestamps, agent names, and task IDs\n\nKey methods:\nclass ObservabilityManager:\n def __init__(self, project_name: str)\n async def trace_agent(self, agent_name: str, task_id: str, func: Callable)\n def log_state(self, state: dict)\n def get_metrics(self) -> dict", "testStrategy": "Unit tests for ObservabilityManager initialization. Mock LangSmith client and verify trace creation/completion. Test logging output format. Verify async context manager properly starts/ends traces. 
Test error logging captures stack traces.", "priority": "high", "dependencies": [ "1" ], "status": "done", "subtasks": [ { "id": 1, "title": "Create ObservabilityManager and LangSmith client wrapper", "description": "Define the core class and LangSmith client integration layer.", "dependencies": [], "details": "Implement `ObservabilityManager` initialization with project name lookup from `LANGSMITH_PROJECT`, store LangSmith client, and expose base methods like `start_trace`, `end_trace`, `log_state_transition`, `log_token_usage`, and `log_error` that call into the client as needed.", "status": "pending", "testStrategy": null, "parentId": "undefined" }, { "id": 2, "title": "Add tracing decorator and async context manager support", "description": "Provide trace lifecycle utilities for agent execution.", "dependencies": [ 1 ], "details": "Implement `trace_agent_execution` decorator and `trace_agent` async helper to capture agent name, task id, inputs/outputs, token counts, and errors; add async context manager support to automatically start/end traces around agent runs.", "status": "pending", "testStrategy": null, "parentId": "undefined" }, { "id": 3, "title": "Configure structured Python logging for observability", "description": "Set up standardized logging outputs for agent tracing.", "dependencies": [ 1 ], "details": "Integrate `logging` module with custom formatter including timestamps, agent name, and task id; ensure log levels DEBUG/INFO/WARNING/ERROR map to structured fields and are used by `ObservabilityManager` methods.", "status": "pending", "testStrategy": null, "parentId": "undefined" }, { "id": 4, "title": "Write unit tests with LangSmith mocks and log assertions", "description": "Validate observability behavior through isolated tests.", "dependencies": [ 1, 2, 3 ], "details": "Add tests for ObservabilityManager initialization, trace start/end, decorator behavior, async context lifecycle, error logging with stack traces, token usage logging, and log 
format validation using mocked LangSmith client and log capture utilities.", "status": "pending", "testStrategy": "Run unit tests with mocks to validate trace creation/completion, logging output, and error capture.", "parentId": "undefined" } ], "complexity": 6, "recommendedSubtasks": 4, "expansionPrompt": "Split into LangSmith client wrapper, tracing decorator/context manager, structured logging configuration, and unit tests with mocks.", "updatedAt": "2026-02-26T02:38:01.502Z" }, { "id": "3", "title": "Implement WorkspaceManager for Git worktree and Docker isolation", "description": "Build workspace.py to manage git worktree creation and Docker container provisioning for isolated Dev Agent execution environments", "details": "In app_factory/core/workspace.py:\n- Use GitPython for git operations and docker Python SDK for container management\n- Implement create_worktree(task_id: str, base_branch: str = 'main') -> str:\n * Validates base_branch exists\n * Creates worktree at ../worktrees/{task_id} using git.worktree_add()\n * Creates branch name: feature/task-{task_id}\n * Returns absolute worktree path\n- Implement spin_up_clean_room(worktree_path: str, task_id: str) -> docker.Container:\n * Pulls base image (python:3.11-slim or custom image with Claude Code installed)\n * Mounts worktree_path as /workspace in container (read/write)\n * Sets working directory to /workspace\n * Configures network isolation (no internet by default, allowlist only necessary domains)\n * Returns container object with metadata (container_id, task_id)\n- Implement cleanup_workspace(task_id: str, container: docker.Container):\n * Stops and removes Docker container\n * Removes git worktree using git.worktree_remove()\n- Add error handling for git conflicts, Docker daemon unavailable, disk space issues\n- Implement get_active_workspaces() -> List[dict] to track all active containers\n\nClass signature:\nclass WorkspaceManager:\n def __init__(self, repo_path: str, docker_image: str)\n async 
def create_worktree(self, task_id: str) -> str\n async def spin_up_clean_room(self, worktree_path: str, task_id: str) -> Container\n async def cleanup_workspace(self, task_id: str, container: Container)", "testStrategy": "Integration tests with actual git repo (create test repo). Verify worktree creation in correct location. Mock Docker SDK and verify container creation with correct mount points. Test cleanup removes worktree and container. Test error handling when git or Docker fails. Verify concurrent worktree creation for multiple tasks.", "priority": "high", "dependencies": [ "1", "2" ], "status": "done", "subtasks": [ { "id": 1, "title": "Implement Git worktree lifecycle management", "description": "Build create_worktree to validate branches and create feature worktrees.", "dependencies": [], "details": "Use GitPython to open repo_path, verify base_branch exists, create branch feature/task-{task_id}, add worktree at ../worktrees/{task_id}, and return absolute path.", "status": "pending", "testStrategy": "Integration test with temp git repo verifying worktree path and branch creation.", "parentId": "undefined" }, { "id": 2, "title": "Implement Docker clean-room provisioning", "description": "Add spin_up_clean_room to create isolated containers for worktrees.", "dependencies": [ 1 ], "details": "Use docker SDK to pull the configured image, mount worktree_path at /workspace, set working dir, configure network isolation/allowlist, and return container metadata including task_id.", "status": "pending", "testStrategy": "Mock docker SDK to assert image pull, mount bindings, and container settings.", "parentId": "undefined" }, { "id": 3, "title": "Add workspace cleanup and error handling", "description": "Ensure containers and worktrees are removed safely on cleanup.", "dependencies": [ 2 ], "details": "Implement cleanup_workspace to stop/remove container, remove git worktree, and handle errors like conflicts, daemon unavailable, and disk space issues with clear 
exceptions.", "status": "pending", "testStrategy": "Unit tests with mocks to confirm cleanup calls and error propagation.", "parentId": "undefined" }, { "id": 4, "title": "Expose async API surface and workspace tracking", "description": "Make class methods async and track active workspaces.", "dependencies": [ 1, 2, 3 ], "details": "Define WorkspaceManager __init__ and async methods, add get_active_workspaces returning container/task metadata, and maintain internal registry.", "status": "pending", "testStrategy": "Async unit tests verifying registry updates on create/spinup/cleanup.", "parentId": "undefined" }, { "id": 5, "title": "Create integration tests with Git/Docker mocks", "description": "Add integration-style tests that cover full workflow.", "dependencies": [ 1, 2, 3, 4 ], "details": "Set up test repo, validate worktree creation location, mock Docker for container creation, and verify cleanup removes resources and handles failures.", "status": "pending", "testStrategy": "Pytest suite using temp dirs, GitPython real repo, and Docker SDK mocks.", "parentId": "undefined" } ], "complexity": 8, "recommendedSubtasks": 5, "expansionPrompt": "Divide into Git worktree lifecycle, Docker container lifecycle, error handling/cleanup, async API surface, and integration tests with mocks.", "updatedAt": "2026-02-26T02:40:47.441Z" }, { "id": "4", "title": "Implement TaskMasterAgent for claude-task-master integration", "description": "Build task_agent.py to interface with claude-task-master via MCP client for task graph management, dependency resolution, and status tracking", "details": "In app_factory/agents/task_agent.py:\n- Create TaskMasterAgent class that interfaces with claude-task-master as MCP client\n- Implement parse_prd(prd_content: str, num_tasks: int = 10) -> dict:\n * Writes PRD to .taskmaster/docs/prd.md\n * Calls task-master parse-prd via MCP tool mcp__task_master_ai__parse_prd\n * Returns parsed task structure with IDs, dependencies, priorities\n- Implement 
get_unblocked_tasks() -> List[dict]:\n * Calls mcp__task_master_ai__get_tasks with status filter\n * Returns tasks where all dependencies are 'done' and status is 'pending'\n- Implement update_task_status(task_id: str, status: str, notes: str = ''):\n * Calls mcp__task_master_ai__set_task_status\n * Optionally calls mcp__task_master_ai__update_subtask for notes\n- Implement get_task_details(task_id: str) -> dict:\n * Calls mcp__task_master_ai__get_task\n * Returns full task object with title, description, details, testStrategy\n- Implement get_next_task() -> dict:\n * Calls mcp__task_master_ai__next_task\n * Returns highest priority unblocked task\n- Add retry logic with exponential backoff for MCP calls\n- Implement expand_task(task_id: str, num_subtasks: int = 5) for breaking down complex tasks\n\nClass signature:\nclass TaskMasterAgent:\n def __init__(self, project_root: str, mcp_client: MCPClient)\n async def parse_prd(self, prd_content: str) -> dict\n async def get_unblocked_tasks(self) -> List[dict]\n async def update_task_status(self, task_id: str, status: str)\n async def get_task_details(self, task_id: str) -> dict", "testStrategy": "Mock MCP client and verify correct tool calls with expected parameters. Test parse_prd creates .taskmaster/docs/prd.md correctly. Unit test get_unblocked_tasks filters dependencies correctly. Test update_task_status handles both main tasks and subtasks. Verify retry logic triggers on MCP failures. 
Integration test with real claude-task-master installation.", "priority": "high", "dependencies": [ "1", "2" ], "status": "done", "subtasks": [ { "id": 1, "title": "Design TaskMasterAgent MCP wrapper with retries", "description": "Define the TaskMasterAgent class structure and MCP call wrapper with exponential backoff.", "dependencies": [], "details": "Implement an internal async MCP call helper that handles retries, backoff timing, and error surfacing, and wire it into the agent constructor with project root and client references.", "status": "pending", "testStrategy": "Mock MCP client failures to verify retries and backoff timing.", "parentId": "undefined" }, { "id": 2, "title": "Implement PRD parsing and file output flow", "description": "Add parse_prd to write PRD content and call the MCP parse tool.", "dependencies": [ 1 ], "details": "Create .taskmaster/docs/prd.md from provided content, invoke mcp__task_master_ai__parse_prd with num_tasks, and return parsed task structure with IDs, dependencies, and priorities.", "status": "pending", "testStrategy": "Mock filesystem writes and MCP responses to validate file path and payloads.", "parentId": "undefined" }, { "id": 3, "title": "Implement task query and status APIs", "description": "Add unblocked task retrieval, details lookup, and status updates.", "dependencies": [ 1 ], "details": "Implement get_unblocked_tasks using mcp__task_master_ai__get_tasks and dependency checks, get_task_details via mcp__task_master_ai__get_task, and update_task_status with optional subtask notes.", "status": "pending", "testStrategy": "Mock MCP tools to verify filters, dependency logic, and status/notes calls.", "parentId": "undefined" }, { "id": 4, "title": "Add expand_task and unit tests", "description": "Implement task expansion API and write unit tests for core behaviors.", "dependencies": [ 2, 3 ], "details": "Add expand_task using mcp__task_master_ai__expand_task (or equivalent) and create tests covering parse_prd, unblocked 
filtering, status updates, and retry logic with mocked MCP tools.", "status": "pending", "testStrategy": "Use pytest with mocked MCP client to validate behaviors and error handling.", "parentId": "undefined" } ], "complexity": 7, "recommendedSubtasks": 4, "expansionPrompt": "Break into MCP client wrapper/retry logic, PRD parsing file writes, task query/status APIs, and tests using mocked MCP tools.", "updatedAt": "2026-02-26T02:40:48.102Z" }, { "id": "5", "title": "Implement PMAgent for PRD generation and clarification handling", "description": "Build pm_agent.py to expand user prompts into structured PRDs and handle clarification requests from downstream agents using Claude 3.7 Sonnet", "details": "In app_factory/agents/pm_agent.py:\n- Create PMAgent class using Anthropic Claude 3.7 Sonnet (claude-3-7-sonnet-20250219 or latest)\n- Implement expand_prompt_to_prd(user_input: str) -> str:\n * Load prompt template from app_factory/prompts/pm_prd_expansion.txt\n * Template should instruct Claude to: analyze input, identify missing requirements, specify tech stack, define success criteria, outline architecture\n * Call Anthropic API with system prompt as PM expert\n * Return structured PRD in markdown format with sections: Objective, Requirements, Architecture, Tech Stack, Success Criteria\n- Implement handle_clarification_request(clarification: dict) -> str:\n * Takes ClarificationRequest object with fields: requesting_agent (dev/qa/task), task_id, question, context\n * Loads prompt template from app_factory/prompts/pm_clarification.txt\n * Either auto-resolves if question is answerable from existing PRD\n * Or escalates to human via input() prompt (blocking operation)\n * Returns clarification response as string\n- Implement update_prd(prd_path: str, updates: str):\n * Appends clarifications/updates to existing PRD file\n * Maintains version history in PRD document\n- Add token usage tracking for cost monitoring\n- Implement async methods for non-blocking 
execution\n\nClass signature:\nclass PMAgent:\n def __init__(self, api_key: str, model: str = 'claude-3-7-sonnet-20250219')\n async def expand_prompt_to_prd(self, user_input: str) -> str\n async def handle_clarification_request(self, clarification: dict) -> str\n def update_prd(self, prd_path: str, updates: str)", "testStrategy": "Mock Anthropic API and verify system prompt structure. Test expand_prompt_to_prd returns valid markdown PRD. Test handle_clarification_request with auto-resolvable questions (mock no human input). Test escalation to human input. Verify token tracking accuracy. Test PRD update appends correctly without corrupting file.", "priority": "high", "dependencies": [ "1", "2" ], "status": "done", "subtasks": [ { "id": 1, "title": "Scaffold PMAgent class and prompt loading", "description": "Create the PMAgent class structure and load prompt templates from disk.", "dependencies": [], "details": "Add pm_agent.py with PMAgent __init__ signature, helper to read prompt files from app_factory/prompts, and placeholders for async methods to ensure template loading is centralized and reusable.", "status": "pending", "testStrategy": null, "parentId": "undefined" }, { "id": 2, "title": "Implement PRD expansion with Claude and token tracking", "description": "Implement expand_prompt_to_prd using Anthropic API and capture token usage.", "dependencies": [ 1 ], "details": "Call Claude 3.7 Sonnet with a PM system prompt and the pm_prd_expansion template, return markdown PRD sections, and record token usage/cost metadata in the response or internal counters.", "status": "pending", "testStrategy": "Mock Anthropic client; verify system/user prompts, markdown sections, and token counters updated.", "parentId": "undefined" }, { "id": 3, "title": "Handle clarification requests with auto-resolve and escalation", "description": "Implement handle_clarification_request logic for auto-resolution or human input.", "dependencies": [ 1, 2 ], "details": "Load pm_clarification 
template, answer from existing PRD context when possible, otherwise block on input() for human response, and return the clarification string while tracking token usage for model calls.", "status": "pending", "testStrategy": "Mock clarification inputs and Anthropic calls; cover auto-resolve and human escalation branches.", "parentId": "undefined" }, { "id": 4, "title": "Add PRD update/versioning and async integration tests", "description": "Implement PRD update/version history and add tests for PMAgent flows.", "dependencies": [ 2, 3 ], "details": "Append updates to PRD with version headers/timestamps, ensure async methods are non-blocking, and create tests for PRD appends, clarification updates, and token tracking accuracy.", "status": "pending", "testStrategy": "Use temp files to verify append/version history and async test harness for method behavior.", "parentId": "undefined" } ], "complexity": 7, "recommendedSubtasks": 4, "expansionPrompt": "Split into prompt template loading, Anthropic API integration with token tracking, clarification handling/escalation, and PRD update/versioning tests.", "updatedAt": "2026-02-26T02:40:48.806Z" }, { "id": "6", "title": "Implement DevAgentManager for Claude Code/Codex subprocess automation", "description": "Build dev_agent.py to spawn Dev Agents in Docker containers, interface with Claude Code via pexpect, and execute task implementations with strict context isolation", "details": "In app_factory/agents/dev_agent.py:\n- Create DevAgentManager class that spawns Dev Agents in Docker containers\n- Implement execute_task(task: dict, worktree_path: str, container: Container, global_arch: str) -> dict:\n * Constructs minimized prompt from task details + global_arch summary\n * Loads prompt template from app_factory/prompts/dev_task_execution.txt\n * Template structure:\n - Task ID, title, description, details, testStrategy\n - Global architecture summary (read-only context)\n - Strict instruction: implement only this task, no 
extraneous changes\n - Must create test file and pass tests before completion\n * Uses pexpect to spawn claude command inside Docker container:\n - pexpect.spawn('docker exec -it {container_id} claude --headless --prompt-file /tmp/task_prompt.txt')\n * Monitors Claude Code stdout/stderr with timeout (30 min default)\n * Parses exit code: 0 = success, non-zero = failure\n * If exit code != 0 or timeout, trigger ClarificationRequest\n * Returns result dict: {status: 'success'|'failed'|'needs_clarification', output: str, files_changed: List[str]}\n- Implement prepare_task_prompt(task: dict, global_arch: str) -> str:\n * Generates minimal context prompt file\n- Implement parse_claude_output(output: str) -> dict:\n * Extracts files changed, test results, error messages\n- Add fallback to OpenAI Codex API for algorithmic generation if Claude Code fails\n- Implement max retry counter (max_retries=3 per PRD requirement)\n- Handle timeout errors and container crashes gracefully\n\nClass signature:\nclass DevAgentManager:\n def __init__(self, docker_client: docker.DockerClient, max_retries: int = 3)\n async def execute_task(self, task: dict, container: Container, global_arch: str) -> dict\n def prepare_task_prompt(self, task: dict, global_arch: str) -> str", "testStrategy": "Mock pexpect.spawn and Docker exec. Test prompt construction includes all required fields. Test timeout triggers after configured duration. Verify max_retries prevents infinite loops. Test exit code parsing (0 vs non-zero). Mock Claude Code failure and verify fallback to Codex. Integration test with real Docker container and mock Claude Code script.", "priority": "high", "dependencies": [ "1", "2", "3" ], "status": "done", "subtasks": [ { "id": 1, "title": "Draft DevAgentManager class skeleton and prompt utilities", "description": "Create the base class and method signatures in dev_agent.py.", "dependencies": [], "details": "Add DevAgentManager __init__ with docker client and max_retries. 
Stub execute_task, prepare_task_prompt, and parse_claude_output with docstrings and type hints.", "status": "pending", "testStrategy": null, "parentId": "undefined" }, { "id": 2, "title": "Implement prompt template loading and prompt construction", "description": "Build prompt generation from task fields and global architecture summary.", "dependencies": [ 1 ], "details": "Load app_factory/prompts/dev_task_execution.txt, inject task id/title/description/details/testStrategy and global_arch section, and enforce strict instructions in the generated prompt string.", "status": "pending", "testStrategy": null, "parentId": "undefined" }, { "id": 3, "title": "Add Docker exec + pexpect orchestration for Claude Code", "description": "Run claude in container using pexpect with timeout and capture output.", "dependencies": [ 1, 2 ], "details": "Spawn docker exec with claude --headless --prompt-file, monitor stdout/stderr, enforce 30-minute timeout, collect exit code and output, and handle container crashes gracefully.", "status": "pending", "testStrategy": null, "parentId": "undefined" }, { "id": 4, "title": "Parse Claude output and format execute_task result", "description": "Extract changed files, test results, and errors from Claude output.", "dependencies": [ 3 ], "details": "Implement parse_claude_output to derive files_changed/test status/error messages and return execute_task result dict with status, output, and files_changed.", "status": "pending", "testStrategy": null, "parentId": "undefined" }, { "id": 5, "title": "Add retry logic, timeout handling, and Codex fallback", "description": "Implement max retry loop and fallback to OpenAI Codex when Claude fails.", "dependencies": [ 3, 4 ], "details": "Add max_retries loop in execute_task, map non-zero exit/timeout to needs_clarification or failed, and invoke Codex fallback for algorithmic generation when Claude fails.", "status": "pending", "testStrategy": null, "parentId": "undefined" }, { "id": 6, "title": "Create unit 
tests with mocks for orchestration and parsing", "description": "Add tests for prompt generation, retries, timeout, and fallback behavior.", "dependencies": [ 2, 3, 4, 5 ], "details": "Mock docker client, container exec, and pexpect.spawn; assert prompt contents, timeout triggers, exit code handling, retry cap, and Codex fallback invocation.", "status": "pending", "testStrategy": "Use pytest with mocks for Docker and pexpect; verify parsing and retry behavior.", "parentId": "undefined" } ], "complexity": 9, "recommendedSubtasks": 6, "expansionPrompt": "Break into prompt preparation, Docker exec/pexpect orchestration, output parsing, retry/timeout handling, Codex fallback, and tests with mocks.", "updatedAt": "2026-02-26T02:45:27.367Z" }, { "id": "7", "title": "Implement QAAgent for code review, testing, and merge operations", "description": "Build qa_agent.py to perform static analysis, run tests, handle git rebase/merge conflicts, and merge completed worktrees back to main branch", "details": "In app_factory/agents/qa_agent.py:\n- Create QAAgent class using Claude 3.7 Sonnet for code review reasoning\n- Implement review_and_merge(task_id: str, worktree_path: str) -> dict:\n * Step 1: git rebase main on Dev Agent's feature branch\n - If rebase has conflicts, parse git status output\n - If conflicts are simple (same-file, non-overlapping), auto-resolve\n - If conflicts are complex, return {status: 'rebase_failed', conflicts: [...], action: 'kick_back_to_dev'}\n * Step 2: Run static analysis with ruff or pylint\n - Parse linting output, fail if critical errors found\n * Step 3: Run tests using pytest in worktree\n - Parse test output, extract pass/fail counts\n - If tests fail, extract failure details\n * Step 4: Code review with Claude API\n - Read all changed files (git diff main)\n - Send to Claude with prompt from app_factory/prompts/qa_review.txt\n - Prompt instructs: check for security issues (SQL injection, XSS, command injection), code quality, adherence to 
task requirements\n - If Claude identifies issues, return {status: 'review_failed', issues: [...], action: 'kick_back_to_dev'}\n * Step 5: Merge to main\n - git checkout main && git merge --no-ff feature/task-{task_id}\n - git push origin main\n * Return {status: 'merged', commit_sha: str}\n- Implement parse_test_results(output: str) -> dict:\n * Extracts pytest output (passed, failed, errors)\n- Implement auto_resolve_conflicts(conflicts: List[str]) -> bool:\n * Attempts git rerere or simple conflict resolution\n- Add retry counter (max 3 attempts per PRD) before escalation\n\nClass signature:\nclass QAAgent:\n def __init__(self, repo_path: str, api_key: str, max_retries: int = 3)\n async def review_and_merge(self, task_id: str, worktree_path: str) -> dict\n def run_tests(self, worktree_path: str) -> dict\n async def code_review(self, diff: str, task: dict) -> dict", "testStrategy": "Mock git operations and verify rebase called correctly. Test conflict detection parses git status. Test static analysis runs ruff/pylint. Mock pytest execution and verify output parsing. Mock Claude API for code review. Test merge operation creates proper commit. Verify max retry counter works. 
Test kick_back_to_dev returns correct status.", "priority": "high", "dependencies": [ "1", "2", "3" ], "status": "done", "subtasks": [ { "id": 1, "title": "Implement QAAgent skeleton and core workflows", "description": "Create QAAgent class and wire core review_and_merge flow structure.", "dependencies": [], "details": "Add qa_agent.py class definition, init, and review_and_merge orchestration with placeholders for rebase, lint, tests, review, and merge steps.", "status": "pending", "testStrategy": null, "parentId": "undefined" }, { "id": 2, "title": "Build git rebase/merge and conflict handling", "description": "Implement git rebase, conflict parsing, auto-resolve, and merge logic.", "dependencies": [ 1 ], "details": "Add git commands for rebase/merge, parse git status for conflicts, implement auto_resolve_conflicts and retry escalation behavior.", "status": "pending", "testStrategy": null, "parentId": "undefined" }, { "id": 3, "title": "Add linting and test execution with parsing", "description": "Run static analysis and pytest with robust output parsing.", "dependencies": [ 1 ], "details": "Implement run_tests and parse_test_results, wire linting via ruff/pylint, and return structured results for failures.", "status": "pending", "testStrategy": null, "parentId": "undefined" }, { "id": 4, "title": "Integrate Claude review for code quality checks", "description": "Send diffs to Claude and evaluate review feedback.", "dependencies": [ 1 ], "details": "Load prompt template, build diff payload, call Claude API, and return review_failed with issues when needed.", "status": "pending", "testStrategy": null, "parentId": "undefined" }, { "id": 5, "title": "Create tests with mocked git/CLI/Claude", "description": "Add unit tests covering QAAgent behavior and retries.", "dependencies": [ 1, 2, 3, 4 ], "details": "Mock git commands, lint/test outputs, and Claude responses; verify conflict handling, retries, and merge success.", "status": "pending", "testStrategy": null, 
"parentId": "undefined" } ], "complexity": 8, "recommendedSubtasks": 5, "expansionPrompt": "Split into git rebase/merge workflow, lint/test execution and parsing, Claude review integration, conflict resolution/retry logic, and tests with mocked git/CLI.", "updatedAt": "2026-02-26T02:45:07.395Z" }, { "id": "8", "title": "Implement LangGraph state machine and orchestration graph", "description": "Build graph.py with LangGraph StateGraph defining nodes for PM, Task, Dev, and QA agents, including bi-directional clarification flow and concurrent execution logic", "details": "In app_factory/core/graph.py:\n- Create AppFactoryOrchestrator class wrapping LangGraph StateGraph\n- Define state schema using TypedDict:\n * user_input: str\n * prd: str\n * tasks: List[dict]\n * active_tasks: dict (task_id -> DevAgent state)\n * completed_tasks: List[str]\n * blocked_tasks: dict (task_id -> reason)\n * clarification_requests: List[dict]\n * global_architecture: str\n- Define nodes:\n * pm_node: Calls PMAgent.expand_prompt_to_prd(), updates state.prd and state.global_architecture\n * task_node: Calls TaskMasterAgent.parse_prd() and get_unblocked_tasks(), updates state.tasks\n * dev_dispatch_node: Concurrently spawns DevAgentManager.execute_task() for all unblocked tasks using asyncio.gather()\n * qa_node: Calls QAAgent.review_and_merge() for completed dev tasks\n * clarification_node: Handles ClarificationRequest, routes to PM, updates state\n- Define edges with conditional routing:\n * pm_node -> task_node (always)\n * task_node -> dev_dispatch_node (if unblocked_tasks exist) OR END (if all done)\n * dev_dispatch_node -> qa_node (for completed tasks) OR clarification_node (if clarification needed)\n * qa_node -> task_node (loop back to check newly unblocked tasks) OR dev_dispatch_node (kick back to dev on failure)\n * clarification_node -> dev_dispatch_node (after clarification resolved)\n- Implement should_continue(state: dict) -> str routing function for conditional edges\n- 
Add concurrent execution support using asyncio for dev_dispatch_node\n- Implement state persistence to disk (app_factory/data/state.json) after each node execution\n- Compile graph with checkpointing enabled for recovery\n\nClass signature:\nclass AppFactoryOrchestrator:\n def __init__(self, pm_agent: PMAgent, task_agent: TaskMasterAgent, dev_manager: DevAgentManager, qa_agent: QAAgent)\n def build_graph(self) -> StateGraph\n async def run(self, user_input: str) -> dict", "testStrategy": "Mock all agent classes. Test state transitions through each node. Verify pm_node -> task_node edge. Test conditional routing in should_continue(). Test concurrent dev_dispatch_node spawns multiple tasks. Verify clarification_node routes back correctly. Test state persistence saves/loads correctly. Integration test with full graph execution (mocked agents).", "priority": "high", "dependencies": [ "4", "5", "6", "7" ], "status": "done", "subtasks": [ { "id": 1, "title": "Define orchestration state schema and persistence", "description": "Create TypedDict state schema and load/save hooks.", "dependencies": [], "details": "Implement TypedDict fields for user_input, prd, tasks, active_tasks, completed_tasks, blocked_tasks, clarification_requests, global_architecture, and add JSON persistence to `app_factory/data/state.json` after each node execution.", "status": "pending", "testStrategy": "Unit test serialize/deserialize with sample state and ensure writes after node execution.", "parentId": "undefined" }, { "id": 2, "title": "Implement AppFactoryOrchestrator skeleton and node functions", "description": "Add class wrapper and core node implementations.", "dependencies": [ 1 ], "details": "Create `AppFactoryOrchestrator` with `build_graph` and `run`, and implement `pm_node`, `task_node`, `dev_dispatch_node`, `qa_node`, `clarification_node` calling the respective agent methods and updating state fields.", "status": "pending", "testStrategy": "Mock agents and assert each node updates 
state correctly.", "parentId": "undefined" }, { "id": 3, "title": "Add conditional routing and should_continue logic", "description": "Define edge conditions for graph routing.", "dependencies": [ 2 ], "details": "Implement `should_continue(state)` and conditional edges: pm->task, task->dev or END, dev->qa or clarification, qa->task or dev, clarification->dev; encode decisions based on blocked/clarification/completion flags.", "status": "pending", "testStrategy": "Table-driven tests for routing decisions across state scenarios.", "parentId": "undefined" }, { "id": 4, "title": "Add asyncio concurrency for dev dispatch", "description": "Execute dev tasks concurrently with asyncio.", "dependencies": [ 2, 3 ], "details": "Use `asyncio.gather()` in `dev_dispatch_node` to spawn `DevAgentManager.execute_task()` for all unblocked tasks, capture results, and update active/completed/blocked state accordingly.", "status": "pending", "testStrategy": "Mock `execute_task` coroutines and assert parallel invocation and aggregation.", "parentId": "undefined" }, { "id": 5, "title": "Enable checkpointing and recovery in graph compilation", "description": "Compile StateGraph with checkpointing enabled.", "dependencies": [ 2, 3 ], "details": "Configure LangGraph compilation with checkpointing so runs can resume from saved state, integrating with persistence layer and ensuring graph builds once per orchestrator instance.", "status": "pending", "testStrategy": "Mock checkpoint backend or use temp file and verify recovery from saved state.", "parentId": "undefined" }, { "id": 6, "title": "Write orchestration graph tests with mocked agents", "description": "Add integration-style tests for state transitions.", "dependencies": [ 1, 2, 3, 4, 5 ], "details": "Create tests covering pm->task->dev->qa flow, clarification loop, failed QA returning to dev, and end condition when all tasks complete; verify state persistence file updates at each node.", "status": "pending", "testStrategy": "Pytest 
with mocked agents and temporary filesystem for state.json.", "parentId": "undefined" } ], "complexity": 9, "recommendedSubtasks": 6, "expansionPrompt": "Break into state schema/persistence, node implementations, conditional routing, concurrency handling with asyncio, checkpointing, and graph tests with mocked agents.", "updatedAt": "2026-02-26T02:49:37.964Z" }, { "id": "9", "title": "Implement GlobalArchitecture summary system and prompt templates", "description": "Create prompt templates for all agents and implement GlobalArchitecture tracking to prevent Dev Agent context starvation and duplicate code", "details": "In app_factory/prompts/:\n- Create pm_prd_expansion.txt:\n * System prompt: \"You are an expert Product Manager. Analyze the user's input and expand it into a comprehensive PRD...\"\n * Includes sections: Objective, Core Requirements, Technical Architecture, Success Criteria, Non-Functional Requirements\n- Create pm_clarification.txt:\n * System prompt: \"You are resolving a clarification request from a downstream agent. Use the existing PRD context to answer if possible...\"\n- Create dev_task_execution.txt:\n * System prompt: \"You are a Dev Agent. Implement ONLY the specified task. Do not make extraneous changes. Read the Global Architecture to avoid duplicating existing code...\"\n * Includes: Task details, Global Architecture summary, Test requirements, Output format\n- Create qa_review.txt:\n * System prompt: \"You are a QA reviewer. 
Check for: security vulnerabilities (OWASP Top 10), code quality issues, adherence to task requirements, potential bugs...\"\n * Includes: Task requirements, Diff output, Architecture constraints\n\nImplement GlobalArchitecture tracking:\n- In app_factory/core/architecture_tracker.py:\n- Create ArchitectureTracker class:\n * Maintains app_factory/data/global_architecture.json\n * Fields: modules (list of module names/purposes), utilities (shared functions), design_patterns, naming_conventions, tech_stack\n- Implement update_architecture(completed_task: dict, files_changed: List[str]):\n * Called by QA Agent after successful merge\n * Analyzes merged code to extract new modules, utilities\n * Uses Claude API to summarize architectural additions\n * Appends to global_architecture.json\n- Implement get_architecture_summary() -> str:\n * Returns concise text summary (max 2000 tokens) for Dev Agent context\n * Includes: Project structure overview, Existing modules, Shared utilities, Coding conventions\n\nClass signature:\nclass ArchitectureTracker:\n def __init__(self, data_dir: str, api_key: str)\n async def update_architecture(self, completed_task: dict, files_changed: List[str])\n def get_architecture_summary(self) -> str", "testStrategy": "Verify all prompt templates exist and have valid structure. Test ArchitectureTracker creates global_architecture.json. Test update_architecture appends new modules correctly. Test get_architecture_summary stays under token limit. Mock Claude API for architecture analysis. 
Test prompt templates render correctly with sample data.", "priority": "medium", "dependencies": [ "5", "6", "7" ], "status": "done", "subtasks": [ { "id": 1, "title": "Create agent prompt template files", "description": "Add all required prompt templates with specified sections and system text.", "dependencies": [], "details": "Create the four files under `app_factory/prompts/` with the exact system prompts and required sections for PRD expansion, clarification, dev execution, and QA review.", "status": "pending", "testStrategy": "Verify files exist and include required sections.", "parentId": "undefined" }, { "id": 2, "title": "Define GlobalArchitecture storage schema", "description": "Implement persistent JSON schema and loader/saver logic.", "dependencies": [], "details": "Add schema fields (modules, utilities, design_patterns, naming_conventions, tech_stack) and ensure `app_factory/data/global_architecture.json` is created/updated safely.", "status": "pending", "testStrategy": "Unit test load/save and default initialization.", "parentId": "undefined" }, { "id": 3, "title": "Implement ArchitectureTracker update workflow", "description": "Add async update_architecture with Claude-based summarization.", "dependencies": [ 2 ], "details": "Build `ArchitectureTracker` in `app_factory/core/architecture_tracker.py`, extract new modules/utilities from changed files, call Claude API to summarize, and append updates to the JSON store.", "status": "pending", "testStrategy": "Mock Claude API and verify JSON updates on sample changes.", "parentId": "undefined" }, { "id": 4, "title": "Generate architecture summary for Dev context", "description": "Implement concise summary generation and tests.", "dependencies": [ 2, 3 ], "details": "Implement `get_architecture_summary()` to return a <=2000-token summary including structure, modules, utilities, and conventions; add tests for token/length constraints and content coverage.", "status": "pending", "testStrategy": "Unit test 
summary length and required sections.", "parentId": "undefined" } ], "complexity": 6, "recommendedSubtasks": 4, "expansionPrompt": "Split into prompt template creation, architecture tracker storage schema, Claude-based summarization updates, and summary generation tests.", "updatedAt": "2026-02-26T02:48:59.828Z" }, { "id": "10", "title": "Implement main.py entry point, error handling, and end-to-end integration", "description": "Create main.py orchestration script with CLI interface, comprehensive error handling, escalation logic, and end-to-end workflow execution", "details": "In main.py:\n- Create main() async function:\n * Parse CLI arguments: --prompt (required), --repo-path (default: cwd), --max-concurrent-tasks (default: 5), --debug\n * Load environment variables from .env (API keys)\n * Initialize all components:\n - ObservabilityManager (LangSmith tracing)\n - WorkspaceManager (git + Docker)\n - TaskMasterAgent (MCP client)\n - PMAgent (Claude API)\n - DevAgentManager (pexpect + Docker)\n - QAAgent (Claude API + git)\n - ArchitectureTracker\n * Build AppFactoryOrchestrator graph\n * Execute graph with user input: await orchestrator.run(args.prompt)\n * Handle exceptions at top level:\n - ClarificationTimeout: Escalate to human after 3 retries\n - DockerDaemonError: Exit with helpful message\n - GitError: Exit with git troubleshooting info\n - MCPConnectionError: Fallback to CLI mode or exit\n * Print final summary: Tasks completed, total time, token usage, link to LangSmith trace\n- Implement retry/escalation logic:\n * Track retry counters per task (max 3 per PRD)\n * After 3 Dev-QA bounces, escalate to PMAgent with detailed failure context\n * If PM clarification doesn't resolve, escalate to human with full context dump\n- Implement graceful shutdown:\n * Cleanup all active Docker containers\n * Remove all git worktrees\n * Save final state to disk\n- Add signal handlers (SIGINT, SIGTERM) for Ctrl+C cleanup\n- Implement --dry-run mode that validates all 
dependencies without execution\n- Add verbose logging controlled by --debug flag\n\nCLI usage:\npython main.py --prompt \"Build a video transcription service with Whisper and summarization\" --repo-path /path/to/project --max-concurrent-tasks 3\n\nError handling priorities:\n1. Retry with exponential backoff for transient failures (API rate limits, network)\n2. Escalate to PM for ambiguity/logic issues\n3. Escalate to human for repeated failures (3+ retries)\n4. Fail fast for configuration errors (missing API keys, Docker unavailable)", "testStrategy": "Mock all components. Test CLI argument parsing. Test error handling for each exception type. Verify retry logic increments counters correctly. Test escalation after 3 retries. Test graceful shutdown cleans up containers and worktrees. Test signal handlers (send SIGINT). Test --dry-run validates dependencies. Integration test with full workflow (mocked external APIs). End-to-end test with real TaskMaster and simple prompt.", "priority": "high", "dependencies": [ "8", "9" ], "status": "done", "subtasks": [ { "id": 1, "title": "Define CLI parsing and environment loading in main.py", "description": "Set up argument parsing and .env loading for the entry point.", "dependencies": [], "details": "Implement argparse for --prompt, --repo-path, --max-concurrent-tasks, --debug, and --dry-run, then load environment variables from .env and validate required API keys early.", "status": "pending", "testStrategy": "Unit test argument parsing and env validation with mocked env vars.", "parentId": "undefined" }, { "id": 2, "title": "Wire core components and orchestrator graph", "description": "Initialize all managers and build the orchestration graph.", "dependencies": [ 1 ], "details": "Instantiate ObservabilityManager, WorkspaceManager, TaskMasterAgent, PMAgent, DevAgentManager, QAAgent, and ArchitectureTracker, then construct the AppFactoryOrchestrator graph and invoke run with the prompt.", "status": "pending", "testStrategy": 
"Mock component constructors and verify orchestrator run is called with prompt.", "parentId": "undefined" }, { "id": 3, "title": "Implement retry and escalation control flow", "description": "Add retry counters, backoff, and escalation logic.", "dependencies": [ 2 ], "details": "Track per-task retry counts, add exponential backoff for transient errors, escalate to PM after 3 Dev-QA bounces, then escalate to human with full context if unresolved.", "status": "pending", "testStrategy": "Unit test retry increments and escalation triggers after max attempts.", "parentId": "undefined" }, { "id": 4, "title": "Add top-level error handling and summaries", "description": "Handle exceptions and print final summary output.", "dependencies": [ 2, 3 ], "details": "Catch ClarificationTimeout, DockerDaemonError, GitError, and MCPConnectionError with appropriate fallback or exit messaging, then output task completion stats, timing, token usage, and trace link.", "status": "pending", "testStrategy": "Simulate each exception type and assert correct handling and summary output.", "parentId": "undefined" }, { "id": 5, "title": "Implement graceful shutdown, signals, and dry-run checks", "description": "Ensure cleanup and validation paths are covered.", "dependencies": [ 2 ], "details": "Add SIGINT/SIGTERM handlers, ensure cleanup of containers/worktrees and state persistence, and implement --dry-run validation that checks dependencies without executing the graph.", "status": "pending", "testStrategy": "Test signal handling with simulated SIGINT and verify cleanup methods called; test dry-run validation path.", "parentId": "undefined" } ], "complexity": 8, "recommendedSubtasks": 5, "expansionPrompt": "Break into CLI parsing/env loading, component wiring, retry/escalation logic, graceful shutdown/signal handling, and dry-run/validation tests.", "updatedAt": "2026-02-26T02:53:15.324Z" } ], "metadata": { "version": "1.0.0", "lastModified": "2026-02-26T02:53:15.329Z", "taskCount": 10, 
"completedCount": 10, "tags": [ "master" ] } } }