ai_ops2/README.md
2026-02-25 23:49:54 -05:00


App Factory

Autonomous multi-agent orchestration framework. Give it a natural-language prompt and get back a fully developed, QA-verified, and merged codebase.

How It Works

User prompt
  → PM Agent (expands into structured PRD)
    → Task Agent (generates prioritized dependency graph via claude-task-master)
      → Dev Agents (concurrent, isolated Docker containers with Claude Code)
        → QA Agent (code review, tests, rebase, merge to main)
          → Done

If any agent gets blocked, the flow reverses through a clarification loop — Dev asks Task, Task asks PM, PM asks the human — while other agents keep working.
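
The escalation order described above can be pictured as a simple chain lookup. This is a minimal sketch, not the project's routing code: the role labels and the function names `next_responder` and `escalation_path` are hypothetical, and only the Dev → Task → PM → human order comes from this README.

```python
from typing import Optional

# Escalation order taken from the README: Dev -> Task -> PM -> human.
# The string labels themselves are illustrative, not real identifiers.
ESCALATION_CHAIN = ["dev", "task", "pm", "human"]


def next_responder(blocked_role: str) -> Optional[str]:
    """Return the role that should answer a clarification request, if any."""
    idx = ESCALATION_CHAIN.index(blocked_role)
    if idx + 1 >= len(ESCALATION_CHAIN):
        return None  # the human is the end of the chain
    return ESCALATION_CHAIN[idx + 1]


def escalation_path(blocked_role: str) -> list[str]:
    """List every role a request may pass through if nobody can answer it."""
    path = []
    role = next_responder(blocked_role)
    while role is not None:
        path.append(role)
        role = next_responder(role)
    return path
```

A blocked Dev agent would first ask the Task agent; the request only travels further up the chain if each responder is itself unable to answer.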

Quick Start

Prerequisites

  • Python 3.11+
  • Docker Desktop (running)
  • Git
  • Claude Code with OAuth or an Anthropic API key

Setup

# Clone and enter project
git clone <repo-url> && cd ai_ops2

# Create venv and install dependencies
uv venv
uv pip install -r requirements.txt

# Configure environment (optional — not needed with Claude Code OAuth)
cp .env.example .env
# Edit .env to add API keys if not using OAuth
# Optionally add LANGSMITH_API_KEY for tracing

Run

# Build a project from a prompt
python main.py --prompt "Build a video transcription service with Whisper and summarization"

# Limit concurrent dev agents
python main.py --prompt "Build a REST API" --max-concurrent-tasks 3

# Target a specific repo
python main.py --prompt "Add user authentication" --repo-path /path/to/project

# Validate config without executing
python main.py --dry-run --prompt "test"

# Verbose logging
python main.py --prompt "Build a CLI tool" --debug

Architecture

Agents

| Agent | File | Role |
| --- | --- | --- |
| PMAgent | agents/pm_agent.py | Expands prompts into PRDs, handles clarification requests |
| TaskMasterAgent | agents/task_agent.py | Bridges to claude-task-master for task graph management |
| DevAgentManager | agents/dev_agent.py | Spawns Claude Code in Docker containers via pexpect |
| QAAgent | agents/qa_agent.py | Code review, linting, testing, rebase, and merge |

Core

| Component | File | Role |
| --- | --- | --- |
| AppFactoryOrchestrator | core/graph.py | LangGraph state machine with conditional routing |
| WorkspaceManager | core/workspace.py | Git worktree + Docker container lifecycle |
| ObservabilityManager | core/observability.py | LangSmith tracing + structured logging |
| ArchitectureTracker | core/architecture_tracker.py | Prevents context starvation across dev agents |

Project Structure

app_factory/
├── agents/
│   ├── pm_agent.py          # PRD generation + clarification
│   ├── task_agent.py        # claude-task-master interface
│   ├── dev_agent.py         # Claude Code + Docker orchestration
│   └── qa_agent.py          # Review, test, merge pipeline
├── core/
│   ├── graph.py             # LangGraph state machine
│   ├── workspace.py         # Git worktree + Docker isolation
│   ├── observability.py     # LangSmith tracing + logging
│   └── architecture_tracker.py  # Global architecture summary
├── prompts/                 # Agent prompt templates
│   ├── pm_prd_expansion.txt
│   ├── pm_clarification.txt
│   ├── dev_task_execution.txt
│   └── qa_review.txt
└── data/                    # Runtime state + architecture tracking

Execution Phases

  1. Linear Planning — User → PM Agent → Task Agent. Produces a prioritized DAG of tasks.
  2. Dynamic Concurrency — Orchestrator spins up a WorkspaceManager + DevAgent for every unblocked task concurrently via asyncio.gather().
  3. Clarification Loop — Blocked agents route requests backward up the chain. Other agents continue uninterrupted.
  4. QA & Merge — QA Agent rebases, lints, tests, reviews, and merges each completed task. Task Agent then unlocks downstream dependencies.
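
Phases 1 and 2 can be illustrated with a small scheduling sketch: find every task whose dependencies are complete, run that batch with `asyncio.gather()`, and repeat. This is an assumption-laden simplification rather than the orchestrator's actual logic. The names `run_dev_task`, `unblocked`, and `run_all` are hypothetical, the semaphore stands in for whatever `--max-concurrent-tasks` does internally, and the real flow unlocks dependencies only after QA merges a task.

```python
import asyncio


async def run_dev_task(task_id: str, limit: asyncio.Semaphore) -> str:
    """Stand-in for a dev agent run; the real one would drive a container."""
    async with limit:  # cap how many tasks run at the same time
        await asyncio.sleep(0)  # placeholder for the actual work
        return f"{task_id}:done"


def unblocked(deps: dict[str, set[str]], done: set[str]) -> list[str]:
    """Tasks not yet finished whose dependencies are all finished."""
    return sorted(t for t, d in deps.items() if t not in done and d <= done)


async def run_all(deps: dict[str, set[str]], max_concurrent: int = 3) -> list[str]:
    """Run the task graph batch by batch until every task is complete."""
    limit = asyncio.Semaphore(max_concurrent)
    done: set[str] = set()
    results: list[str] = []
    while len(done) < len(deps):
        ready = unblocked(deps, done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        # gather() runs the whole batch concurrently and keeps input order.
        results.extend(await asyncio.gather(*(run_dev_task(t, limit) for t in ready)))
        done.update(ready)
    return results


if __name__ == "__main__":
    graph = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    print(asyncio.run(run_all(graph)))
```

Running in batches is simpler than the described behavior, where a downstream task can start as soon as its own dependencies merge, but it shows the same idea of only scheduling unblocked work.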

Design Decisions

  • Context Starvation Prevention: A read-only ArchitectureTracker summary is injected into every Dev Agent prompt so they know what other agents have built.
  • Merge Conflict Handling: QA Agent rebases onto main before testing. Complex conflicts are kicked back to the Dev Agent automatically.
  • Infinite Loop Protection: Each task has a retry cap of 3, enforced at the LangGraph node level. A task that exceeds the cap escalates to the PM Agent and then to the human.
  • Claude Code Automation: Dev agents interact with Claude Code via pexpect subprocess in headless mode inside Docker containers.
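
The infinite loop protection above amounts to a per-task counter checked by a routing function. The following is a rough sketch under stated assumptions: the route labels `retry_dev` and `escalate_to_pm` and the function `route_after_failure` are hypothetical, and it assumes the cap counts retries, so the fourth failure is the one that escalates.

```python
MAX_RETRIES = 3  # per-task cap stated in the README


def route_after_failure(retries: dict[str, int], task_id: str) -> str:
    """Record a failure for task_id and decide where the task goes next."""
    retries[task_id] = retries.get(task_id, 0) + 1
    if retries[task_id] > MAX_RETRIES:
        return "escalate_to_pm"  # hypothetical route label
    return "retry_dev"  # hypothetical route label


if __name__ == "__main__":
    counts: dict[str, int] = {}
    for _ in range(4):
        print(route_after_failure(counts, "task-1"))
```

Keeping the counter keyed by task means one stubborn task cannot exhaust the retry budget of the others.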

Testing

# Run full test suite
python -m pytest tests/ -v

# Run specific test file
python -m pytest tests/test_graph.py -v

# Run with coverage
python -m pytest tests/ --cov=app_factory --cov-report=term-missing

229 tests across 9 test files covering all agents, core components, and integration.

Configuration

Authentication

App Factory supports two auth modes:

  • Claude Code OAuth (default) — If you use Claude Code with OAuth, no API key is needed. The Claude Agent SDK (claude-agent-sdk) picks up your auth automatically.
  • API key — Set ANTHROPIC_API_KEY in .env for direct API access.

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| ANTHROPIC_API_KEY | No* | Claude API key. Not needed with Claude Code OAuth. |
| OPENAI_API_KEY | No | Codex fallback for algorithmic generation |
| LANGSMITH_API_KEY | No | LangSmith tracing and observability |
| LANGSMITH_PROJECT | No | LangSmith project name (default: app-factory) |

* Required only if not using Claude Code OAuth.
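
As an illustration of how these variables might be read, the sketch below picks an auth mode and a LangSmith project name from the environment. The function names are hypothetical, and the rule that a non-empty `ANTHROPIC_API_KEY` takes precedence over OAuth is an assumption; the README only says the key is unnecessary when OAuth is in use.

```python
import os
from typing import Mapping


def resolve_auth_mode(env: Mapping[str, str] = os.environ) -> str:
    """Return "api_key" if a key is set, otherwise fall back to "oauth".

    Assumption: an explicit key wins over OAuth. The project may decide
    this differently.
    """
    if env.get("ANTHROPIC_API_KEY", "").strip():
        return "api_key"
    return "oauth"


def langsmith_project(env: Mapping[str, str] = os.environ) -> str:
    """Return the LangSmith project name, using the documented default."""
    return env.get("LANGSMITH_PROJECT") or "app-factory"


if __name__ == "__main__":
    print(resolve_auth_mode(), langsmith_project())
```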

License

MIT