App Factory

Autonomous multi-agent orchestration framework. Give it a natural language prompt, get back a fully developed, QA-verified, and merged codebase.

How It Works

User prompt
  → PM Agent (expands into structured PRD)
    → Task Agent (generates prioritized dependency graph via claude-task-master)
      → Dev Agents (concurrent, isolated Docker containers with Claude Code)
        → QA Agent (code review, tests, rebase, merge to main)
          → Done

If any agent gets blocked, the flow reverses through a clarification loop — Dev asks Task, Task asks PM, PM asks the human — while other agents keep working.
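The escalation order described above (Dev → Task → PM → human) can be sketched as a walk up a fixed chain. This is a minimal illustration, not App Factory's implementation: `Clarification`, `escalate`, and the `handlers` mapping are hypothetical names introduced here.

```python
from dataclasses import dataclass, field

# Escalation order taken from the description above: Dev -> Task -> PM -> human.
ESCALATION_CHAIN = ["dev", "task", "pm", "human"]


@dataclass
class Clarification:
    """Hypothetical record of a question raised by a blocked agent."""
    question: str
    raised_by: str
    path: list = field(default_factory=list)  # levels the request was routed to


def escalate(request, handlers):
    """Route the request upward until some level returns an answer.

    `handlers` maps a level name to a callable that returns an answer
    string, or None when that level cannot resolve the question.
    """
    start = ESCALATION_CHAIN.index(request.raised_by) + 1
    for level in ESCALATION_CHAIN[start:]:
        request.path.append(level)
        handler = handlers.get(level)
        answer = handler(request.question) if handler else None
        if answer is not None:
            return level, answer
    raise RuntimeError("no level in the chain could answer")


handlers = {
    "task": lambda q: None,  # Task Agent cannot resolve it
    "pm": lambda q: "Use PostgreSQL" if "database" in q else None,
}
request = Clarification("Which database should the service use?", raised_by="dev")
print(escalate(request, handlers))  # ('pm', 'Use PostgreSQL')
```

Because each request carries its own `path`, other tasks are not affected while one request is waiting on an answer.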

Quick Start

Prerequisites

Python 3.11+
Docker Desktop (running)
Git
Claude Code with OAuth or an Anthropic API key

Setup

# Clone and enter project
git clone <repo-url> && cd ai_ops2

# Create venv and install dependencies (requires uv)
uv venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
uv pip install -r requirements.txt

# Configure environment (optional — not needed with Claude Code OAuth)
cp .env.example .env
# Edit .env to add API keys if not using OAuth
# Optionally add LANGSMITH_API_KEY for tracing

Run

# Build a project from a prompt
python main.py --prompt "Build a video transcription service with Whisper and summarization"

# Limit concurrent dev agents
python main.py --prompt "Build a REST API" --max-concurrent-tasks 3

# Target a specific repo
python main.py --prompt "Add user authentication" --repo-path /path/to/project

# Validate config without executing
python main.py --dry-run --prompt "test"

# Verbose logging
python main.py --prompt "Build a CLI tool" --debug
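For reference, the flags used in the commands above could be declared with `argparse` roughly as follows. This is a sketch only: the flag names come from the examples, but the defaults, types, and whether `--prompt` is required are assumptions, and the real `main.py` may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in the Run examples."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Assumed required, since every example above passes it.
    parser.add_argument("--prompt", required=True,
                        help="Natural language description of what to build")
    # Assumed to be an integer with no default cap.
    parser.add_argument("--max-concurrent-tasks", type=int, default=None,
                        help="Upper bound on concurrent dev agents")
    parser.add_argument("--repo-path", default=None,
                        help="Existing repository to work in")
    parser.add_argument("--dry-run", action="store_true",
                        help="Validate configuration without executing")
    parser.add_argument("--debug", action="store_true",
                        help="Enable verbose logging")
    return parser


args = build_parser().parse_args(
    ["--prompt", "Build a REST API", "--max-concurrent-tasks", "3"]
)
print(args.max_concurrent_tasks, args.dry_run)  # 3 False
```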

Architecture

Agents

Agent	File	Role
PMAgent	`agents/pm_agent.py`	Expands prompts into PRDs, handles clarification requests
TaskMasterAgent	`agents/task_agent.py`	Bridges to claude-task-master for task graph management
DevAgentManager	`agents/dev_agent.py`	Spawns Claude Code in Docker containers via pexpect
QAAgent	`agents/qa_agent.py`	Code review, linting, testing, rebase, and merge

Core

Component	File	Role
AppFactoryOrchestrator	`core/graph.py`	LangGraph state machine with conditional routing
WorkspaceManager	`core/workspace.py`	Git worktree + Docker container lifecycle
ObservabilityManager	`core/observability.py`	LangSmith tracing + structured logging
ArchitectureTracker	`core/architecture_tracker.py`	Prevents context starvation across dev agents

Project Structure

app_factory/
├── agents/
│   ├── pm_agent.py          # PRD generation + clarification
│   ├── task_agent.py        # claude-task-master interface
│   ├── dev_agent.py         # Claude Code + Docker orchestration
│   └── qa_agent.py          # Review, test, merge pipeline
├── core/
│   ├── graph.py             # LangGraph state machine
│   ├── workspace.py         # Git worktree + Docker isolation
│   ├── observability.py     # LangSmith tracing + logging
│   └── architecture_tracker.py  # Global architecture summary
├── prompts/                 # Agent prompt templates
│   ├── pm_prd_expansion.txt
│   ├── pm_clarification.txt
│   ├── dev_task_execution.txt
│   └── qa_review.txt
└── data/                    # Runtime state + architecture tracking

Execution Phases

Linear Planning — User → PM Agent → Task Agent. Produces a prioritized DAG of tasks.
Dynamic Concurrency — Orchestrator spins up a WorkspaceManager + DevAgent for every unblocked task concurrently via asyncio.gather().
Clarification Loop — Blocked agents route requests backward up the chain. Other agents continue uninterrupted.
QA & Merge — QA Agent rebases, lints, tests, reviews, and merges each completed task. Task Agent then unlocks downstream dependencies.
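The interplay between the task DAG and `asyncio.gather()` can be sketched as follows. This is a simplified, wave-based illustration under stated assumptions: `unblocked`, `run_dev_agent`, and `run_all` are hypothetical names, the agent work is a placeholder, and the real orchestrator may start a task as soon as its dependencies merge rather than waiting for a whole wave.

```python
import asyncio


def unblocked(tasks, done):
    """Return ids of tasks that are not done and whose dependencies are all done."""
    return [
        tid for tid, deps in tasks.items()
        if tid not in done and all(d in done for d in deps)
    ]


async def run_dev_agent(task_id, limit):
    """Stand-in for one Dev Agent run followed by QA and merge."""
    async with limit:           # respects the concurrency cap
        await asyncio.sleep(0)  # placeholder for the real work
        return task_id


async def run_all(tasks, max_concurrent=3):
    """Run every task, one wave of unblocked tasks at a time."""
    limit = asyncio.Semaphore(max_concurrent)
    done, waves = set(), []
    while len(done) < len(tasks):
        ready = unblocked(tasks, done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        finished = await asyncio.gather(*(run_dev_agent(t, limit) for t in ready))
        done.update(finished)   # finishing a wave unlocks downstream tasks
        waves.append(sorted(finished))
    return waves


if __name__ == "__main__":
    graph = {"schema": [], "api": ["schema"], "ui": ["api"], "docs": []}
    print(asyncio.run(run_all(graph)))  # [['docs', 'schema'], ['api'], ['ui']]
```

The semaphore plays the role of `--max-concurrent-tasks`: all unblocked tasks are submitted together, but only `max_concurrent` of them hold a slot at any moment.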

Design Decisions

Context Starvation Prevention: A read-only ArchitectureTracker summary is injected into every Dev Agent prompt so they know what other agents have built.
Merge Conflict Handling: QA Agent rebases onto main before testing. Complex conflicts are kicked back to the Dev Agent automatically.
Infinite Loop Protection: Each task has a maximum retry count of 3, enforced at the LangGraph node level. A task that exceeds it is escalated to the PM Agent and then to the human.
Claude Code Automation: Dev agents drive Claude Code in headless mode through a pexpect subprocess inside Docker containers.
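The retry limit and escalation described above can be sketched as a small wrapper around a single task attempt. This is an illustration only: `run_with_retries`, `attempt_fn`, and `escalate_fn` are hypothetical names, and it assumes the limit of 3 means three attempts in total, which the text does not state explicitly.

```python
MAX_RETRIES = 3  # limit stated in the design decisions above


def run_with_retries(task_id, attempt_fn, escalate_fn, max_retries=MAX_RETRIES):
    """Call attempt_fn until it succeeds or the retry budget is spent.

    attempt_fn(task_id, attempt) returns (ok, detail).
    escalate_fn(task_id, failures) hands the task to the PM Agent / human.
    """
    failures = []
    for attempt in range(1, max_retries + 1):
        ok, detail = attempt_fn(task_id, attempt)
        if ok:
            return {"task": task_id, "status": "merged", "attempts": attempt}
        failures.append(detail)
    # Budget exhausted: stop looping and escalate with the collected failures.
    escalate_fn(task_id, failures)
    return {"task": task_id, "status": "escalated", "attempts": max_retries}


def flaky(task_id, attempt):
    """Example attempt that fails once and then passes."""
    return (attempt >= 2, f"attempt {attempt} failed tests")


print(run_with_retries("task-7", flaky, lambda t, f: None))
# {'task': 'task-7', 'status': 'merged', 'attempts': 2}
```

Keeping the counter per task means one stubborn task cannot consume the retry budget of the others.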

Testing

# Run full test suite
python -m pytest tests/ -v

# Run specific test file
python -m pytest tests/test_graph.py -v

# Run with coverage
python -m pytest tests/ --cov=app_factory --cov-report=term-missing

229 tests across 9 test files covering all agents, core components, and integration.

Configuration

Authentication

App Factory supports two auth modes:

Claude Code OAuth (default) — If you use Claude Code with OAuth, no API key is needed. The Claude Agent SDK (claude-agent-sdk) picks up your auth automatically.
API key — Set ANTHROPIC_API_KEY in .env for direct API access.

Environment Variables

Variable	Required	Description
`ANTHROPIC_API_KEY`	No*	Claude API key. Not needed with Claude Code OAuth.
`OPENAI_API_KEY`	No	OpenAI API key for the Codex fallback used in algorithmic generation
`LANGSMITH_API_KEY`	No	LangSmith tracing and observability
`LANGSMITH_PROJECT`	No	LangSmith project name (default: `app-factory`)

* Required only if not using Claude Code OAuth.
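The table above could be read into a settings object along these lines. This is a sketch, not the project's loader: `Settings` and `load_settings` are hypothetical names, and choosing the auth mode from the presence of `ANTHROPIC_API_KEY` is an assumption based on the two modes described under Authentication.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    auth_mode: str                    # "api_key" or "oauth"
    anthropic_api_key: Optional[str]
    openai_api_key: Optional[str]
    langsmith_api_key: Optional[str]
    langsmith_project: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables; every variable is optional."""
    api_key = env.get("ANTHROPIC_API_KEY") or None
    return Settings(
        # Assumption: fall back to Claude Code OAuth when no API key is set.
        auth_mode="api_key" if api_key else "oauth",
        anthropic_api_key=api_key,
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        langsmith_api_key=env.get("LANGSMITH_API_KEY") or None,
        # Default project name taken from the table above.
        langsmith_project=env.get("LANGSMITH_PROJECT") or "app-factory",
    )


print(load_settings({}).auth_mode)          # oauth
print(load_settings({}).langsmith_project)  # app-factory
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.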

License

MIT
