# App Factory

Autonomous multi-agent orchestration framework. Give it a natural language prompt, get back a fully developed, QA-verified, and merged codebase.

## How It Works

```
User prompt
  → PM Agent (expands into structured PRD)
  → Task Agent (generates prioritized dependency graph via claude-task-master)
  → Dev Agents (concurrent, isolated Docker containers with Claude Code)
  → QA Agent (code review, tests, rebase, merge to main)
  → Done
```

If any agent gets blocked, the flow reverses through a **clarification loop** (Dev asks Task, Task asks PM, PM asks the human) while other agents keep working.

## Quick Start

### Prerequisites

- Python 3.11+
- Docker Desktop (running)
- Git
- Claude Code with OAuth **or** an [Anthropic API key](https://console.anthropic.com/)

### Setup

```bash
# Clone and enter project
git clone && cd ai_ops2

# Create venv and install dependencies
uv venv
uv pip install -r requirements.txt

# Configure environment (optional; not needed with Claude Code OAuth)
cp .env.example .env
# Edit .env to add API keys if not using OAuth
# Optionally add LANGSMITH_API_KEY for tracing
```

### Run

```bash
# Build a project from a prompt
python main.py --prompt "Build a video transcription service with Whisper and summarization"

# Limit concurrent dev agents
python main.py --prompt "Build a REST API" --max-concurrent-tasks 3

# Target a specific repo
python main.py --prompt "Add user authentication" --repo-path /path/to/project

# Validate config without executing
python main.py --dry-run --prompt "test"

# Verbose logging
python main.py --prompt "Build a CLI tool" --debug
```

## Architecture

### Agents

| Agent | File | Role |
|-------|------|------|
| **PMAgent** | `agents/pm_agent.py` | Expands prompts into PRDs, handles clarification requests |
| **TaskMasterAgent** | `agents/task_agent.py` | Bridges to claude-task-master for task graph management |
| **DevAgentManager** | `agents/dev_agent.py` | Spawns Claude Code in Docker containers via pexpect |
| **QAAgent** | `agents/qa_agent.py` | Code review, linting, testing, rebase, and merge |

### Core

| Component | File | Role |
|-----------|------|------|
| **AppFactoryOrchestrator** | `core/graph.py` | LangGraph state machine with conditional routing |
| **WorkspaceManager** | `core/workspace.py` | Git worktree + Docker container lifecycle |
| **ObservabilityManager** | `core/observability.py` | LangSmith tracing + structured logging |
| **ArchitectureTracker** | `core/architecture_tracker.py` | Prevents context starvation across dev agents |

### Project Structure

```
app_factory/
├── agents/
│   ├── pm_agent.py              # PRD generation + clarification
│   ├── task_agent.py            # claude-task-master interface
│   ├── dev_agent.py             # Claude Code + Docker orchestration
│   └── qa_agent.py              # Review, test, merge pipeline
├── core/
│   ├── graph.py                 # LangGraph state machine
│   ├── workspace.py             # Git worktree + Docker isolation
│   ├── observability.py         # LangSmith tracing + logging
│   └── architecture_tracker.py  # Global architecture summary
├── prompts/                     # Agent prompt templates
│   ├── pm_prd_expansion.txt
│   ├── pm_clarification.txt
│   ├── dev_task_execution.txt
│   └── qa_review.txt
└── data/                        # Runtime state + architecture tracking
```

## Execution Phases

1. **Linear Planning**: User → PM Agent → Task Agent. Produces a prioritized DAG of tasks.
2. **Dynamic Concurrency**: Orchestrator spins up a WorkspaceManager + DevAgent for every unblocked task concurrently via `asyncio.gather()`.
3. **Clarification Loop**: Blocked agents route requests backward up the chain. Other agents continue uninterrupted.
4. **QA & Merge**: QA Agent rebases, lints, tests, reviews, and merges each completed task. Task Agent then unlocks downstream dependencies.

## Design Decisions

- **Context Starvation Prevention**: A read-only `ArchitectureTracker` summary is injected into every Dev Agent prompt so they know what other agents have built.
- **Merge Conflict Handling**: QA Agent rebases onto main before testing. Complex conflicts are kicked back to the Dev Agent automatically.
- **Infinite Loop Protection**: Max retry counter (3) per task at the LangGraph node level. Exceeded retries escalate to PM → human.
- **Claude Code Automation**: Dev agents interact with Claude Code via `pexpect` subprocess in headless mode inside Docker containers.

## Testing

```bash
# Run full test suite
python -m pytest tests/ -v

# Run specific test file
python -m pytest tests/test_graph.py -v

# Run with coverage
python -m pytest tests/ --cov=app_factory --cov-report=term-missing
```

**229 tests** across 9 test files covering all agents, core components, and integration.

## Configuration

### Authentication

App Factory supports two auth modes:

- **Claude Code OAuth** (default): If you use Claude Code with OAuth, no API key is needed. The Claude Agent SDK (`claude-agent-sdk`) picks up your auth automatically.
- **API key**: Set `ANTHROPIC_API_KEY` in `.env` for direct API access.

### Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `ANTHROPIC_API_KEY` | No* | Claude API key. Not needed with Claude Code OAuth. |
| `OPENAI_API_KEY` | No | Codex fallback for algorithmic generation |
| `LANGSMITH_API_KEY` | No | LangSmith tracing and observability |
| `LANGSMITH_PROJECT` | No | LangSmith project name (default: `app-factory`) |

*Required only if not using Claude Code OAuth.*

## License

MIT
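
## Appendix: Scheduling Sketch

The Dynamic Concurrency and QA & Merge phases described under Execution Phases can be illustrated with a small scheduler. This is a minimal sketch under stated assumptions, not the code in `core/graph.py`: `run_dev_agent`, `run_graph`, and the `deps` mapping are hypothetical names, the agent work is replaced by a no-op, and an `asyncio.Semaphore` stands in for whatever mechanism backs `--max-concurrent-tasks`.

```python
import asyncio


async def run_dev_agent(task_id: str) -> str:
    """Hypothetical stand-in for one Dev Agent + QA pass; always succeeds."""
    await asyncio.sleep(0)  # yield control, as real I/O-bound work would
    return task_id


async def run_graph(deps: dict[str, set[str]], max_concurrent: int = 3) -> list[list[str]]:
    """Run a task DAG wave by wave and return the task IDs finished in each wave.

    `deps` maps each task ID to the set of task IDs it depends on.
    """
    limit = asyncio.Semaphore(max_concurrent)  # cap on simultaneously running agents
    done: set[str] = set()
    waves: list[list[str]] = []

    async def guarded(task_id: str) -> str:
        async with limit:
            return await run_dev_agent(task_id)

    while len(done) < len(deps):
        # A task is unblocked once all of its dependencies have completed.
        ready = sorted(t for t, d in deps.items() if t not in done and d <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        # Launch every unblocked task concurrently; results keep input order.
        finished = await asyncio.gather(*(guarded(t) for t in ready))
        done.update(finished)  # completing a wave unlocks downstream tasks
        waves.append(list(finished))
    return waves
```

With a four-task graph in which `api` and `ui` both depend on `schema`, and `e2e` depends on both, the sketch yields three waves:

```python
deps = {"schema": set(), "api": {"schema"}, "ui": {"schema"}, "e2e": {"api", "ui"}}
print(asyncio.run(run_graph(deps, max_concurrent=2)))
# [['schema'], ['api', 'ui'], ['e2e']]
```

Waiting for a whole wave before scheduling the next keeps the example short; a real orchestrator could start each downstream task as soon as its own dependencies are merged.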