Product Requirements Document (PRD): Autonomous App Factory

Objective:
Develop a multi-agent orchestration framework where a single natural language prompt results in a fully developed, QA-verified, and merged codebase, operating with minimal human intervention.

Core Workflow & Bi-Directional Flow:

    Input: The user submits a natural language prompt describing a project, such as a video transcription and summarization service.

    PM Agent: Expands and clarifies the prompt into a detailed, structured PRD.

    Task Agent: Uses claude-task-master to parse the PRD and generate a prioritized dependency graph of tasks and subtasks.

    Dev Agents (Dynamic): Provisioned automatically for each unblocked task. Each works in an isolated clean room (a dedicated git worktree mounted into an ephemeral Docker container) using Claude Code and Codex, with context strictly limited to the current project state and the assigned task.

    QA Agent: Reviews code, runs tests, resolves conflicts, and merges the worktree's branch into main.

    Backward Flow (Clarification Loop): If a Dev Agent is blocked, the flow reverses: Dev Agent queries Task Agent → Task Agent queries PM Agent → PM Agent queries the Human.
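
    A minimal sketch of the backward flow, written as plain Python rather than LangGraph so it runs on its own: each level either answers or passes the request upward, and the human is the last resort. The ClarificationRequest fields, the responder signature, and the ask_human callback are all hypothetical; in the real system each responder would be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ClarificationRequest:
    # Hypothetical fields; the PRD names the type but not its shape.
    task_id: str
    question: str
    trail: List[str] = field(default_factory=list)  # levels that have seen the request


# A responder returns an answer, or None to escalate one level up.
Responder = Callable[[ClarificationRequest], Optional[str]]


def escalate(req: ClarificationRequest,
             chain: List[Tuple[str, Responder]],
             ask_human: Callable[[ClarificationRequest], str]) -> Tuple[str, str]:
    """Walk Task Agent -> PM Agent, falling back to the human.

    Returns (answered_by, answer) so the reply can be routed back down to the Dev Agent.
    """
    for name, responder in chain:
        req.trail.append(name)
        answer = responder(req)
        if answer is not None:
            return name, answer
    req.trail.append("human")
    return "human", ask_human(req)


if __name__ == "__main__":
    task_agent = lambda r: "Reuse core/media.py" if "ffmpeg" in r.question else None
    pm_agent = lambda r: "English only for v1" if "language" in r.question else None
    human = lambda r: "Cap summaries at 200 words"
    chain = [("task_agent", task_agent), ("pm_agent", pm_agent)]
    print(escalate(ClarificationRequest("T3", "Which language should we support?"), chain, human))
    # ('pm_agent', 'English only for v1')
```

    Returning who answered, along with the answer, lets the orchestrator route the reply back down the same chain and record which level resolved it.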

Architecture & Tech Stack

Core Orchestration:

    Framework: LangGraph (Python). It acts as the central state machine, tracking the global project status, managing the graph of tasks, and routing the bi-directional communication.

    Task Management: claude-task-master. The Python backend interfaces with it either through its CLI commands or as an MCP (Model Context Protocol) client, and relies on it to maintain the .taskmaster state and dependency logic.

    Observability: LangSmith (native to LangGraph) for tracing agent decision paths and token usage, combined with standard Python logging.
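
    LangSmith covers the tracing half. For the standard logging half, one option (an assumption, not part of the stack above) is a logging.LoggerAdapter that stamps every line with the agent role and task id, so interleaved output from concurrent Dev Agents can be filtered per task. The logger name app_factory and the field names are hypothetical.

```python
import logging


class TaskContextAdapter(logging.LoggerAdapter):
    """Tags every record with the agent role and task id (hypothetical field names)."""

    def process(self, msg, kwargs):
        # Expose the fields to formatters as %(agent)s and %(task_id)s ...
        kwargs["extra"] = {**self.extra, **kwargs.get("extra", {})}
        # ... and prefix the message so plain handlers show them too.
        return f"[{self.extra['agent']}:{self.extra['task_id']}] {msg}", kwargs


def get_agent_logger(agent: str, task_id: str) -> logging.LoggerAdapter:
    return TaskContextAdapter(logging.getLogger("app_factory"),
                              {"agent": agent, "task_id": task_id})


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
    get_agent_logger("dev", "T3").info("clean room started")
    # <timestamp> INFO [dev:T3] clean room started
```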

Environment & Isolation:

    Clean Rooms: Docker (managed via the docker Python SDK). Every Dev Agent gets an ephemeral, throwaway container.

    Version Control: GitPython. Uses git worktree add to create isolated directories for each Dev Agent without cloning the entire repo multiple times.
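
    A sketch of the worktree step using subprocess, with the runner injectable so it can be tested without a repository. With GitPython the same command would be repo.git.worktree("add", "-b", branch, path, base), since GitPython forwards attribute calls on repo.git to git subcommands. The task/<id> branch naming and the sibling directory layout are assumptions.

```python
import subprocess
from pathlib import Path
from typing import Callable


def create_worktree(repo: Path, task_id: str, base: str = "main",
                    run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> Path:
    """Create branch task/<task_id> from `base` and check it out in its own directory.

    The branch name and the sibling "<repo>.worktrees/" layout are hypothetical conventions.
    Keeping worktrees outside the main checkout stops them appearing as untracked files.
    """
    repo = repo.absolute()
    branch = f"task/{task_id}"
    path = repo.parent / f"{repo.name}.worktrees" / task_id
    # git worktree add -b <new-branch> <path> <commit-ish>
    run(["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base],
        check=True, capture_output=True, text=True)
    return path
```

    Teardown would be the reverse: git worktree remove on the path once QA has merged (or abandoned) the branch.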

Agent Tooling:

    Dev Engine: Claude Code for multi-file reasoning, deep codebase edits, and terminal execution, supplemented by Codex/OpenAI APIs for highly specific algorithmic generation.

    Reasoning Engine: Claude 3.7 Sonnet or GPT-4o for the PM, Task, and QA agents.

Key Services, Systems, and Classes

    AppFactoryOrchestrator: The main LangGraph state machine. It holds the global state (e.g., active tasks, blocked tasks) and routes execution between agent nodes.
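
    LangGraph is built around a shared state schema, and each node returns only the keys it changes, which the framework merges into that state. A plain-Python sketch of what AppFactoryOrchestrator's state might contain, with hypothetical key names, plus a node-style function for the ClarificationRequest transition:

```python
from typing import Dict, List, TypedDict


class FactoryState(TypedDict):
    # Hypothetical keys for the global state described above.
    prd: str
    active_tasks: List[str]        # tasks currently inside a Dev Agent clean room
    blocked_tasks: Dict[str, str]  # task_id -> open clarification question
    done_tasks: List[str]


def on_clarification_request(state: FactoryState, task_id: str, question: str) -> dict:
    """Return only the keys that change, the way a LangGraph node returns a partial update.

    Other active tasks are left untouched, so their branches keep running.
    """
    return {
        "active_tasks": [t for t in state["active_tasks"] if t != task_id],
        "blocked_tasks": {**state["blocked_tasks"], task_id: question},
    }
```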

    PMAgent: Takes user input, structures the PRD, and acts as the bridge back to the user when downstream agents trigger a ClarificationRequest.

    TaskMasterAgent: The bridge to claude-task-master. Identifies unblocked_tasks and updates task statuses as Dev Agents succeed or fail.
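
    Identifying unblocked tasks is a dependency check over the task list: a task is ready when it is still pending and every task it depends on is done. The sketch assumes records shaped roughly like claude-task-master's (id, status, dependencies, priority); those field names and values are assumptions to verify against the installed version.

```python
from typing import Dict, List

PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def unblocked_tasks(tasks: List[Dict]) -> List[Dict]:
    """Pending tasks whose dependencies are all done, highest priority first, then by id."""
    done = {t["id"] for t in tasks if t["status"] == "done"}
    ready = [t for t in tasks
             if t["status"] == "pending"
             and all(dep in done for dep in t.get("dependencies", []))]
    return sorted(ready, key=lambda t: (PRIORITY_RANK.get(t.get("priority", "medium"), 1), t["id"]))
```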

    WorkspaceManager: The sandbox service.

        create_worktree(task_id): Creates a new branch from main and checks it out in a fresh, isolated git worktree.

        spin_up_clean_room(worktree_path): Mounts the worktree into a secure, ephemeral Docker container.
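
        One way to express the clean room, assuming the docker SDK's client.containers.run() is used as described: build the keyword arguments separately so they can be reviewed and tested, then pass them through. The image, memory limit, mount point, and label name are illustrative.

```python
from pathlib import Path


def clean_room_spec(worktree_path: Path, image: str = "python:3.12-slim") -> dict:
    """Keyword arguments for the docker SDK's client.containers.run() (illustrative values)."""
    return {
        "image": image,
        "command": ["sleep", "infinity"],  # keep the container alive for exec calls
        "volumes": {str(worktree_path): {"bind": "/workspace", "mode": "rw"}},
        "working_dir": "/workspace",
        "detach": True,
        "auto_remove": True,  # the container is deleted as soon as it stops
        "mem_limit": "2g",
        "labels": {"app_factory.worktree": str(worktree_path)},
    }


def spin_up_clean_room(worktree_path: Path):
    import docker  # docker SDK for Python, imported lazily so the spec is testable without it
    return docker.from_env().containers.run(**clean_room_spec(worktree_path))
```

        One caveat: a worktree's .git entry is a file that points back into the main repository's .git directory, so git commands run inside the container will fail unless that directory is mounted too, or unless commits are made from the host.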

    DevAgentManager: Spawns Dev Agents inside the Docker containers. It crafts the strict, minimized prompt containing only the specific task details.
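
    A hypothetical prompt builder illustrating strict context minimization: the template is filled from an allow-list of task fields, so anything else attached to the task record (such as the full PRD) never reaches the Dev Agent. The template text and field names are assumptions.

```python
from string import Template
from typing import Dict

# Hypothetical template; in the layout below it would live under prompts/.
DEV_PROMPT = Template(
    "You are implementing task $id: $title\n\n"
    "Details:\n$details\n\n"
    "Done when:\n$test_strategy\n\n"
    "Work only inside /workspace. If the task is ambiguous, stop and ask a "
    "clarification question instead of guessing.\n"
)
ALLOWED_FIELDS = ("id", "title", "details", "test_strategy")


def build_dev_prompt(task: Dict) -> str:
    """Render the prompt from an allow-list of task fields; every other key is dropped."""
    fields = {k: str(task.get(k, "")).strip() or "(none)" for k in ALLOWED_FIELDS}
    return DEV_PROMPT.substitute(fields)
```

    An allow-list fails closed: a new field added to task records stays out of the prompt until someone deliberately adds it.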

    QAAgent: Receives the completed worktree, performs static analysis, runs tests, and handles the git merge back to main. If validation fails, it routes the state back to the Dev Agent.
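
    A sketch of the QA gate, assuming each check is a shell command run inside the worktree; ruff and pytest are stand-ins for whatever the generated project actually uses. The first failing step and its output are returned so they can be routed back to the Dev Agent.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class QAResult:
    passed: bool
    failed_step: Optional[str] = None
    output: str = ""  # fed back to the Dev Agent on failure


# Illustrative checks; substitute the project's own linters and test runner.
DEFAULT_CHECKS: List[Tuple[str, List[str]]] = [
    ("static_analysis", ["ruff", "check", "."]),
    ("tests", ["pytest", "-q"]),
]


def run_qa(worktree: str, checks: Sequence[Tuple[str, List[str]]] = DEFAULT_CHECKS,
           run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> QAResult:
    """Run each check in order inside the worktree and stop at the first failure."""
    for name, cmd in checks:
        proc = run(cmd, cwd=worktree, capture_output=True, text=True)
        if proc.returncode != 0:
            return QAResult(False, name, (proc.stdout or "") + (proc.stderr or ""))
    return QAResult(True)
```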

Workflow & Concurrency Highlights

    Phase 1: Linear Planning

        Execution is sequential: User → PM Agent → Task Agent. The output is a populated DAG of tasks.

    Phase 2: Dynamic Concurrency (The Execution Loop)

        The orchestrator queries the Task Agent for all currently unblocked tasks.

        Concurrency: Using Python's asyncio, the orchestrator provisions a workspace (via WorkspaceManager) and a Dev Agent concurrently for every unblocked task. If five tasks are unblocked, five Docker containers spin up simultaneously.
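
        A sketch of one execution round with asyncio. Two details are filled in here as assumptions: the docker SDK and GitPython are blocking libraries, so the hypothetical provision_and_run step runs in a worker thread via asyncio.to_thread, and an optional semaphore caps how many containers run at once (with the default of 8, a five-task round still starts all five together).

```python
import asyncio
import time
from typing import List


def provision_and_run(task_id: str) -> str:
    """Hypothetical blocking step: create the worktree, start the clean room, run the Dev Agent.

    The docker SDK and GitPython are synchronous, so this runs in a worker thread.
    """
    time.sleep(0.05)  # stand-in for real work
    return f"{task_id}: ready for QA"


async def run_task(task_id: str, sem: asyncio.Semaphore) -> str:
    async with sem:  # optional cap on simultaneous containers
        return await asyncio.to_thread(provision_and_run, task_id)


async def execution_round(unblocked: List[str], max_parallel: int = 8) -> list:
    sem = asyncio.Semaphore(max_parallel)
    # return_exceptions=True: a failed task comes back as an exception object in the
    # results list instead of gather() raising on the first failure.
    return await asyncio.gather(*(run_task(t, sem) for t in unblocked), return_exceptions=True)


if __name__ == "__main__":
    print(asyncio.run(execution_round(["T1", "T2", "T3", "T4", "T5"])))
```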

    Phase 3: The Clarification Loop

        If a Dev Agent hits an ambiguity, it triggers a ClarificationRequest state.

        LangGraph routes this specific branch backward up the chain. Crucially, other concurrent Dev Agents continue working uninterrupted.
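
        The property that matters here is that only the ambiguous task waits. A framework-free asyncio sketch: the blocked Dev Agent parks on a future that the clarification chain later resolves, while other tasks run to completion. In LangGraph itself, its human-in-the-loop interrupt support is the more likely mechanism; this sketch only illustrates the behavior, and its names are hypothetical.

```python
import asyncio
from typing import Dict, List


async def dev_agent(task_id: str, ambiguous: bool,
                    pending: Dict[str, asyncio.Future], log: List[str]) -> None:
    if ambiguous:
        # Park only this task on a future that the clarification chain will resolve.
        fut = asyncio.get_running_loop().create_future()
        pending[task_id] = fut
        log.append(f"{task_id}: blocked")
        answer = await fut
        log.append(f"{task_id}: resumed ({answer})")
    else:
        await asyncio.sleep(0)  # stand-in for real work
        log.append(f"{task_id}: finished")


async def demo() -> List[str]:
    pending: Dict[str, asyncio.Future] = {}
    log: List[str] = []
    t1 = asyncio.create_task(dev_agent("T1", True, pending, log))
    t2 = asyncio.create_task(dev_agent("T2", False, pending, log))
    await t2  # T2 completes while T1 is still waiting
    pending["T1"].set_result("English only")  # answer arriving from the PM Agent or human
    await t1
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
    # ['T1: blocked', 'T2: finished', 'T1: resumed (English only)']
```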

    Phase 4: QA & Merge Pipeline

        When a Dev Agent finishes, it pauses. The QA Agent picks up the worktree, validates it, and merges it into main. The Task Agent then marks the task done, unblocking any tasks that depended on it.
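
        Serializing merges through a single QA consumer is one way (an assumption, not stated in the PRD) to keep this step orderly: each branch is rebased onto a main that already contains every earlier merge. A sketch with asyncio.Queue, where the validate callback stands in for rebase, checks, and merge:

```python
import asyncio
from typing import Callable, List, Tuple


async def qa_worker(queue: asyncio.Queue, validate: Callable[[str], bool],
                    merged: List[str], rejected: List[str]) -> None:
    """Single consumer, so merges into main happen one at a time in completion order."""
    while True:
        task_id = await queue.get()
        try:
            # Stand-in for: rebase onto main, run checks, merge.
            (merged if validate(task_id) else rejected).append(task_id)
        finally:
            queue.task_done()


async def demo() -> Tuple[List[str], List[str]]:
    queue: asyncio.Queue = asyncio.Queue()
    merged: List[str] = []
    rejected: List[str] = []
    worker = asyncio.create_task(qa_worker(queue, lambda t: t != "T2", merged, rejected))
    for task_id in ["T1", "T2", "T3"]:  # order in which Dev Agents finished
        await queue.put(task_id)
    await queue.join()  # wait until every handed-off worktree has been processed
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return merged, rejected
```

        Rejected task ids would be routed back to their Dev Agents; merged ones are what the Task Agent marks done.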

Important Callouts & Design Considerations

    The Context Starvation Problem: Minimizing context windows is the right priority. However, a pure "clean room" can lead Dev Agents to create duplicate utility functions or conflicting architectural patterns, because no agent knows what the others are doing.

        Fix: Implement a lightweight, read-only GlobalArchitecture summary (updated by the PM/Task agent) that is injected alongside the isolated task description.
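
        A hypothetical shape for the GlobalArchitecture summary: an immutable record that renders to a short text block with a hard character budget, so it adds shared context without undoing the context minimization. The field names and the budget are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GlobalArchitecture:
    """Read-only summary injected into every Dev Agent prompt (field names are hypothetical)."""
    stack: str
    conventions: Tuple[str, ...] = ()
    shared_modules: Tuple[Tuple[str, str], ...] = ()  # (path, purpose) pairs to reuse

    def render(self, max_chars: int = 1500) -> str:
        lines = [f"Stack: {self.stack}", "Conventions:"]
        lines += [f"- {c}" for c in self.conventions]
        lines.append("Shared modules (reuse these, do not re-implement):")
        lines += [f"- {path}: {purpose}" for path, purpose in sorted(self.shared_modules)]
        text = "\n".join(lines)
        # Hard budget so the summary cannot crowd out the task itself.
        return text if len(text) <= max_chars else text[: max_chars - 3] + "..."
```

        Because the dataclass is frozen, Dev Agents cannot modify it; the PM or Task Agent would publish a new version with dataclasses.replace() when the architecture changes.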

    Merge Conflict Hell: With highly concurrent Dev Agents working on an interconnected app, merge conflicts at the QA stage are inevitable.

        Fix: The QA Agent must have specific tools to perform a git rebase main on the Dev Agent's branch before testing. If the rebase causes complex conflicts, the QA Agent must be able to automatically kick the branch back to the Dev Agent to resolve them.
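
        A sketch of the rebase step with an automatic kick-back. The threshold for "complex" conflicts is not specified, so this version treats any conflict as grounds to return the branch, aborting the rebase first so the Dev Agent receives its branch unchanged along with the list of conflicting files.

```python
import subprocess
from typing import Callable, Tuple


def rebase_onto_main(worktree: str, base: str = "main",
                     run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> Tuple[bool, str]:
    """Rebase the task branch onto `base`. On failure, abort and report why."""
    git = ["git", "-C", worktree]
    proc = run(git + ["rebase", base], capture_output=True, text=True)
    if proc.returncode == 0:
        return True, "rebased"
    # --diff-filter=U lists paths that are still unmerged (in conflict).
    conflicts = run(git + ["diff", "--name-only", "--diff-filter=U"],
                    capture_output=True, text=True).stdout.splitlines()
    # Restore the pre-rebase branch; this just errors (unchecked) if no rebase was started.
    run(git + ["rebase", "--abort"], capture_output=True, text=True)
    if not conflicts:
        return False, (proc.stderr or "").strip()  # e.g. a dirty working tree
    return False, "conflicts in: " + ", ".join(conflicts)
```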

    Infinite Loops & Token Burn: Autonomous bi-directional flows can easily get stuck in a loop (e.g., QA rejects → Dev "fixes" → QA rejects → Dev undoes fix).

        Fix: Implement strict max-retry counters at the LangGraph node level. If a task bounces between Dev and QA three times, automatically escalate it to the PM Agent for human intervention.
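
        The retry counter can live in the graph state and be checked when routing out of the QA node. A sketch with hypothetical node names; the threshold of three comes from the fix above.

```python
from typing import Dict, Tuple

MAX_QA_ROUNDS = 3  # three Dev/QA bounces trigger escalation


def route_after_qa(state: Dict, task_id: str, passed: bool) -> Tuple[str, Dict]:
    """Return (next_node, state_update); node names are hypothetical.

    In LangGraph, the update would be returned from the QA node and the node name
    from a conditional-edge routing function.
    """
    if passed:
        return "merge", {}
    retries = dict(state.get("qa_retries", {}))  # copy; never mutate shared state in place
    retries[task_id] = retries.get(task_id, 0) + 1
    next_node = "pm_agent" if retries[task_id] >= MAX_QA_ROUNDS else "dev_agent"
    return next_node, {"qa_retries": retries}
```

        LangGraph also accepts a recursion_limit in the run config, which can act as a global backstop if a per-task counter is ever missed.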

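    Claude Code Automation: Claude Code is primarily an interactive CLI tool, so the Dev Agent's Python wrapper must drive it programmatically, either via pexpect or as a subprocess pipeline, feeding it the isolated task and parsing its exit codes. Claude Code also has a non-interactive print mode (claude -p), which may make a plain subprocess call sufficient.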
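
    A subprocess-based wrapper sketch. It assumes Claude Code's headless -p flag and --output-format json (verify both with claude --help for the installed version); the timeout value is arbitrary. Inside a clean room, the same argument list could instead be passed to the docker SDK's container.exec_run().

```python
import subprocess
from typing import Callable, Tuple


def run_claude_code(prompt: str, workdir: str, timeout_s: int = 1800,
                    run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> Tuple[bool, str]:
    """Run Claude Code headlessly in the worktree; success is judged by the exit code.

    Assumed flags: -p (non-interactive print mode) and --output-format json.
    """
    cmd = ["claude", "-p", prompt, "--output-format", "json"]
    try:
        proc = run(cmd, cwd=workdir, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    # Hand stdout to the QA step on success, stderr back to the orchestrator on failure.
    return proc.returncode == 0, proc.stdout if proc.returncode == 0 else proc.stderr
```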

Project Structure

app_factory/
├── agents/
│   ├── pm_agent.py
│   ├── task_agent.py      # Interfaces with claude-task-master
│   ├── dev_agent.py       # Wraps Claude Code / Codex
│   └── qa_agent.py
├── core/
│   ├── graph.py           # LangGraph state machine & routing
│   ├── workspace.py       # Docker & Git worktree provisioning
│   └── observability.py   # LangSmith tracing & logging setup
├── prompts/               # Strict templates for context minimization
├── data/                  # Global state, PRDs
├── main.py                # Entry point
└── requirements.txt