
# App Factory

Autonomous multi-agent orchestration framework. Give it a natural-language prompt and get back a fully developed, QA-verified, and merged codebase.

## How It Works

```
User prompt
  → PM Agent (expands into structured PRD)
    → Task Agent (generates prioritized dependency graph via claude-task-master)
      → Dev Agents (concurrent, isolated Docker containers with Claude Code)
        → QA Agent (code review, tests, rebase, merge to main)
          → Done
```

If any agent gets blocked, the flow reverses through a **clarification loop**: Dev asks Task, Task asks PM, and PM asks the human. Other agents keep working in the meantime.
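
The escalation order can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: `escalate`, the `Handler` type, and the level names are hypothetical, and the real agents presumably exchange richer request objects through the orchestrator.

```python
from typing import Callable, Optional

# Hypothetical handler: returns an answer, or None when this level
# cannot resolve the question and it has to move further up the chain.
Handler = Callable[[str], Optional[str]]

# Order taken from the description above: a blocked Dev Agent asks the
# Task Agent, which asks the PM Agent, which asks the human.
ESCALATION_ORDER = ("task", "pm", "human")


def escalate(question: str, handlers: dict[str, Handler]) -> tuple[str, str]:
    """Return (level, answer) from the first level able to answer."""
    for level in ESCALATION_ORDER:
        answer = handlers[level](question)
        if answer is not None:
            return level, answer
    raise RuntimeError(f"no level could answer: {question!r}")
```

Each handler only needs to decide whether it can answer. Anything it returns as `None` moves one level up, so the human is asked only when both the Task and PM levels pass.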

## Quick Start

### Prerequisites

- Python 3.11+
- Docker Desktop (running)
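- Git
- [uv](https://docs.astral.sh/uv/) (used in Setup to create the virtual environment and install dependencies)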
- Claude Code with OAuth **or** an [Anthropic API key](https://console.anthropic.com/)

### Setup

```bash
# Clone and enter project
git clone <repo-url> && cd ai_ops2

# Create a virtual environment (.venv), activate it, and install dependencies
uv venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
uv pip install -r requirements.txt

# Configure environment (optional — not needed with Claude Code OAuth)
cp .env.example .env
# Edit .env to add API keys if not using OAuth
# Optionally add LANGSMITH_API_KEY for tracing
```

### Run

```bash
# Build a project from a prompt
python main.py --prompt "Build a video transcription service with Whisper and summarization"

# Limit concurrent dev agents
python main.py --prompt "Build a REST API" --max-concurrent-tasks 3

# Target a specific repo
python main.py --prompt "Add user authentication" --repo-path /path/to/project

# Validate config without executing
python main.py --dry-run --prompt "test"

# Verbose logging
python main.py --prompt "Build a CLI tool" --debug
```
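
For orientation, the flags used above could be declared with `argparse` roughly as follows. This is a hypothetical reconstruction from the examples only; the real `main.py` may define them differently, and the defaults and the `required=True` on `--prompt` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags shown in the Run examples (defaults are assumed)."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Assumption: every example passes --prompt, so it is treated as required.
    parser.add_argument("--prompt", required=True,
                        help="natural-language description of what to build")
    # Assumption: None stands for "use whatever limit the orchestrator picks".
    parser.add_argument("--max-concurrent-tasks", type=int, default=None,
                        help="upper bound on concurrent dev agents")
    # Assumption: None stands for "no existing repo was specified".
    parser.add_argument("--repo-path", default=None,
                        help="existing repository to work in")
    parser.add_argument("--dry-run", action="store_true",
                        help="validate config without executing")
    parser.add_argument("--debug", action="store_true",
                        help="enable verbose logging")
    return parser
```

`argparse` maps `--max-concurrent-tasks` to the attribute `max_concurrent_tasks`, so `build_parser().parse_args([...]).max_concurrent_tasks` holds the parsed integer.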

## Architecture

### Agents

| Agent | File | Role |
|-------|------|------|
| **PMAgent** | `agents/pm_agent.py` | Expands prompts into PRDs, handles clarification requests |
| **TaskMasterAgent** | `agents/task_agent.py` | Bridges to claude-task-master for task graph management |
| **DevAgentManager** | `agents/dev_agent.py` | Spawns Claude Code in Docker containers via pexpect |
| **QAAgent** | `agents/qa_agent.py` | Code review, linting, testing, rebase, and merge |

### Core

| Component | File | Role |
|-----------|------|------|
| **AppFactoryOrchestrator** | `core/graph.py` | LangGraph state machine with conditional routing |
| **WorkspaceManager** | `core/workspace.py` | Git worktree + Docker container lifecycle |
| **ObservabilityManager** | `core/observability.py` | LangSmith tracing + structured logging |
| **ArchitectureTracker** | `core/architecture_tracker.py` | Prevents context starvation across dev agents |

### Project Structure

```
app_factory/
├── agents/
│   ├── pm_agent.py          # PRD generation + clarification
│   ├── task_agent.py        # claude-task-master interface
│   ├── dev_agent.py         # Claude Code + Docker orchestration
│   └── qa_agent.py          # Review, test, merge pipeline
├── core/
│   ├── graph.py             # LangGraph state machine
│   ├── workspace.py         # Git worktree + Docker isolation
│   ├── observability.py     # LangSmith tracing + logging
│   └── architecture_tracker.py  # Global architecture summary
├── prompts/                 # Agent prompt templates
│   ├── pm_prd_expansion.txt
│   ├── pm_clarification.txt
│   ├── dev_task_execution.txt
│   └── qa_review.txt
└── data/                    # Runtime state + architecture tracking
```
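
As a rough sketch of how a prompt template and the global architecture summary might be combined for a Dev Agent, consider the following. The template text, the placeholder names, and the functions `summarize` and `render_dev_prompt` are all hypothetical; the actual contents of `prompts/dev_task_execution.txt` and the `ArchitectureTracker` API are not shown in this README.

```python
from string import Template

# Hypothetical template text; the real dev_task_execution.txt may differ.
DEV_TASK_TEMPLATE = Template(
    "You are implementing task $task_id: $task_title\n\n"
    "Architecture so far (read-only):\n"
    "$architecture_summary\n"
)


def summarize(components: dict[str, str]) -> str:
    """Render one bullet per component, sorted for stable output."""
    if not components:
        return "- (nothing built yet)"
    return "\n".join(f"- {name}: {desc}" for name, desc in sorted(components.items()))


def render_dev_prompt(task_id: str, task_title: str, components: dict[str, str]) -> str:
    """Fill the template so the agent sees what other agents have built."""
    return DEV_TASK_TEMPLATE.substitute(
        task_id=task_id,
        task_title=task_title,
        architecture_summary=summarize(components),
    )
```

Passing the summary as plain text inside the prompt keeps it read-only from the agent's point of view: the agent can consult it but has no handle for modifying the tracker.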

## Execution Phases

1. **Linear Planning** — User → PM Agent → Task Agent. Produces a prioritized DAG of tasks.
2. **Dynamic Concurrency** — Orchestrator spins up a WorkspaceManager + DevAgent for every unblocked task concurrently via `asyncio.gather()`.
3. **Clarification Loop** — Blocked agents route requests backward up the chain. Other agents continue uninterrupted.
4. **QA & Merge** — QA Agent rebases, lints, tests, reviews, and merges each completed task. Task Agent then unlocks downstream dependencies.
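
Phases 2 and 4 can be illustrated with a small scheduler. It is a simplified sketch under stated assumptions: `unblocked`, `run_dev_agent`, and `run_all` are hypothetical names, the Dev Agent is replaced by a placeholder coroutine, and tasks run in waves, whereas the real orchestrator may unlock dependents as soon as each individual task merges.

```python
import asyncio


def unblocked(deps: dict[str, set[str]], done: set[str]) -> list[str]:
    """Tasks that are not done yet and whose dependencies are all done."""
    return sorted(t for t, needs in deps.items() if t not in done and needs <= done)


async def run_dev_agent(task_id: str, limit: asyncio.Semaphore) -> str:
    """Stand-in for a Dev Agent; the semaphore caps concurrent work."""
    async with limit:
        await asyncio.sleep(0)  # placeholder for container work and QA
        return task_id


async def run_all(deps: dict[str, set[str]], max_concurrent: int = 3) -> list[list[str]]:
    """Run the task DAG wave by wave and return the task ids of each wave."""
    limit = asyncio.Semaphore(max_concurrent)
    done: set[str] = set()
    waves: list[list[str]] = []
    while len(done) < len(deps):
        ready = unblocked(deps, done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        # gather() returns results in the order the awaitables were passed.
        finished = await asyncio.gather(*(run_dev_agent(t, limit) for t in ready))
        waves.append(list(finished))
        done.update(finished)  # completing a wave unlocks downstream tasks
    return waves
```

With `deps = {"schema": set(), "docs": set(), "api": {"schema"}, "ui": {"api"}}`, `asyncio.run(run_all(deps))` runs `docs` and `schema` together, then `api`, then `ui`.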

## Design Decisions

- **Context Starvation Prevention**: A read-only `ArchitectureTracker` summary is injected into every Dev Agent prompt so they know what other agents have built.
- **Merge Conflict Handling**: QA Agent rebases onto main before testing. Complex conflicts are kicked back to the Dev Agent automatically.
- **Infinite Loop Protection**: Each task has a retry limit of 3, enforced at the LangGraph node level. Tasks that exceed it escalate to the PM Agent and then to the human.
- **Claude Code Automation**: Dev agents drive Claude Code in headless mode through a `pexpect` subprocess inside each Docker container.
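
The retry limit can be pictured as a per-task counter that picks the next route. The sketch below is an assumption-laden illustration: `RetryTracker` and the route names `retry_dev` and `escalate_to_pm` are hypothetical, and it reads the limit as "three retries are allowed, the next failure escalates", which may differ from how the LangGraph nodes count.

```python
from collections import Counter

MAX_RETRIES = 3  # limit stated in the design decisions above


class RetryTracker:
    """Count failed attempts per task and choose the next route."""

    def __init__(self, max_retries: int = MAX_RETRIES) -> None:
        self.max_retries = max_retries
        self._failures: Counter[str] = Counter()

    def record_failure(self, task_id: str) -> str:
        """Register one failure; return a hypothetical route name."""
        self._failures[task_id] += 1
        if self._failures[task_id] > self.max_retries:
            return "escalate_to_pm"  # retries exhausted, hand off upward
        return "retry_dev"  # still within budget, send back to the Dev Agent
```

Counters are kept per task, so one task exhausting its retries does not affect the budget of any other task.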

## Testing

```bash
# Run full test suite
python -m pytest tests/ -v

# Run specific test file
python -m pytest tests/test_graph.py -v

# Run with coverage
python -m pytest tests/ --cov=app_factory --cov-report=term-missing
```

**229 tests** across 9 test files covering all agents, core components, and integration.

## Configuration

### Authentication

App Factory supports two auth modes:

- **Claude Code OAuth** (default) — If you use Claude Code with OAuth, no API key is needed. The Claude Agent SDK (`claude-agent-sdk`) picks up your auth automatically.
- **API key** — Set `ANTHROPIC_API_KEY` in `.env` for direct API access.

### Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `ANTHROPIC_API_KEY` | No* | Claude API key. Not needed with Claude Code OAuth. |
| `OPENAI_API_KEY` | No | Codex fallback for algorithmic generation |
| `LANGSMITH_API_KEY` | No | LangSmith tracing and observability |
| `LANGSMITH_PROJECT` | No | LangSmith project name (default: `app-factory`) |

*Required only if not using Claude Code OAuth.*
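
A minimal sketch of how these variables might be read is shown below. The `Settings` class, `load_settings`, and the `auth_mode` values are hypothetical, and the sketch assumes the contents of `.env` have already been loaded into the process environment by some other means.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    anthropic_api_key: Optional[str]
    openai_api_key: Optional[str]
    langsmith_api_key: Optional[str]
    langsmith_project: str

    @property
    def auth_mode(self) -> str:
        """'api_key' when a key is set, otherwise fall back to OAuth."""
        return "api_key" if self.anthropic_api_key else "oauth"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables from the table above; empty strings count as unset."""

    def optional(name: str) -> Optional[str]:
        return env.get(name) or None

    return Settings(
        anthropic_api_key=optional("ANTHROPIC_API_KEY"),
        openai_api_key=optional("OPENAI_API_KEY"),
        langsmith_api_key=optional("LANGSMITH_API_KEY"),
        # Default project name taken from the table above.
        langsmith_project=env.get("LANGSMITH_PROJECT") or "app-factory",
    )
```

Taking the environment as a parameter makes the function easy to exercise with a plain dictionary, for example `load_settings({"ANTHROPIC_API_KEY": "sk-test"})`.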

## License

MIT