# AI Ops: Schema-Driven Multi-Agent Orchestration Runtime

TypeScript runtime for deterministic multi-agent execution with:

- OpenAI Codex SDK integration (`@openai/codex-sdk`)
- Anthropic Claude Agent SDK integration (`@anthropic-ai/claude-agent-sdk`)
- Schema-validated orchestration (`AgentManifest`)
- Stateless node handoffs via persisted state/context payloads
- Resource provisioning (git worktrees + deterministic port ranges)
- MCP configuration layer with handler-based policy hooks

## Current Status

- Provider entrypoints (`codex`, `claude`) run with session limits and resource provisioning.
- Schema-driven orchestration is implemented as reusable modules under `src/agents`.
- Recursive `AgentManager.runRecursiveAgent(...)` supports fanout/fan-in orchestration with abort propagation.
- Tool clearance allowlist/banlist is modeled, but hard security enforcement is still a TODO at tool execution boundaries.

## Repository Layout

- `src/agents`:
  - `manager.ts`: queue-based concurrency limits + recursive fanout/fan-in orchestration.
  - `runtime.ts`: env-driven runtime singletons and defaults.
  - `manifest.ts`: `AgentManifest` schema parsing + validation (strict DAG).
  - `persona-registry.ts`: prompt templating + persona behavior events.
  - `pipeline.ts`: actor-oriented DAG runner with retries and state-dependent routing.
  - `state-context.ts`: persisted state + stateless handoff reconstruction.
  - `provisioning.ts`: extensible resource orchestration + child suballocation support.
  - `orchestration.ts`: `SchemaDrivenExecutionEngine` facade.
- `src/mcp`: MCP config types, conversions, and handler resolution.
- `src/examples`: provider entrypoints (`codex.ts`, `claude.ts`).
- `tests`: unit coverage for manager, manifest, pipeline/orchestration, state context, MCP, and provisioning behavior.
- `docs/orchestration-engine.md`: design notes for the orchestration architecture.
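The `manifest.ts` entry above validates the pipeline as a strict DAG. As a rough illustration of what such a check involves, the following sketch applies Kahn's topological sort to a simplified pipeline shape. The `PipelineSpec` type, the `from`/`to` edge field names, and `assertStrictDag` are hypothetical and not taken from the repository; the real validation in `src/agents/manifest.ts` may check more, or check differently.

```ts
// Hypothetical, simplified pipeline shape; the real manifest types may differ.
interface PipelineNode {
  id: string;
}

interface PipelineEdge {
  from: string; // assumed field name
  to: string; // assumed field name
}

interface PipelineSpec {
  entryNodeId: string;
  nodes: PipelineNode[];
  edges: PipelineEdge[];
}

// Returns node IDs in a topological order, or throws if the pipeline is not a strict DAG.
function assertStrictDag(pipeline: PipelineSpec): string[] {
  const idList = pipeline.nodes.map((node) => node.id);
  const ids = new Set(idList);
  if (ids.size !== idList.length) throw new Error("duplicate node id");
  if (!ids.has(pipeline.entryNodeId)) {
    throw new Error(`unknown entry node: ${pipeline.entryNodeId}`);
  }

  // In-degree counts and adjacency lists for Kahn's algorithm.
  const inDegree = new Map<string, number>();
  const adjacency = new Map<string, string[]>();
  for (const id of idList) {
    inDegree.set(id, 0);
    adjacency.set(id, []);
  }
  for (const { from, to } of pipeline.edges) {
    if (!ids.has(from) || !ids.has(to)) {
      throw new Error(`edge references unknown node: ${from} -> ${to}`);
    }
    adjacency.get(from)!.push(to);
    inDegree.set(to, inDegree.get(to)! + 1);
  }

  // Repeatedly take nodes that have no remaining incoming edges.
  const queue = idList.filter((id) => inDegree.get(id) === 0);
  const order: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift()!;
    order.push(id);
    for (const next of adjacency.get(id)!) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }

  // Nodes that were never dequeued sit on a cycle or downstream of one.
  if (order.length !== idList.length) throw new Error("pipeline contains a cycle");
  return order;
}
```

Kahn's algorithm is a convenient choice for this kind of check because one pass both detects cycles and yields a valid execution order for the nodes.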
## Prerequisites

- Node.js 18+
- npm

## Setup

```bash
npm install
cp .env.example .env
cp mcp.config.example.json mcp.config.json
```

Fill in any values you need in `.env`.

## Run

Run the Codex example:

```bash
npm run codex -- "Summarize what this repository does."
```

Run the Claude example:

```bash
npm run claude -- "Summarize what this repository does."
```

Run via the unified entrypoint:

```bash
npm run dev -- codex "List potential improvements."
npm run dev -- claude "List potential improvements."
```

## Schema-Driven Orchestration

The orchestration engine is exposed as library modules (not yet wired into `src/index.ts` by default).

Core pieces:

- `parseAgentManifest(...)` validates the full orchestration schema.
- `PersonaRegistry` injects runtime context into templated system prompts.
- `PipelineExecutor` executes a strict DAG of actor nodes.
- `FileSystemStateContextManager` enforces stateless handoffs.
- `SchemaDrivenExecutionEngine` composes all of the above with env-driven limits.

### AgentManifest Overview

`AgentManifest` (schema version `"1"`) includes:

- `topologies`: any of `hierarchical`, `retry-unrolled`, `sequential`
- `personas`: identity, prompt template, tool clearance metadata
- `relationships`: parent-child persona edges and constraints
- `pipeline`: strict DAG with entry node, nodes, and edges
- `topologyConstraints`: max depth and retry ceilings

Edge routing supports:

- Event gates: `success`, `validation_fail`, `failure`, `always`, `onTaskComplete`, `onValidationFail`
- Conditions:
  - `state_flag`
  - `history_has_event`
  - `file_exists`
  - `always`

Example manifest:

```json
{
  "schemaVersion": "1",
  "topologies": ["hierarchical", "retry-unrolled", "sequential"],
  "personas": [
    {
      "id": "coder",
      "displayName": "Coder",
      "systemPromptTemplate": "Implement ticket {{ticket}} in repo {{repo}}",
      "toolClearance": {
        "allowlist": ["read_file", "write_file"],
        "banlist": ["rm"]
      }
    }
  ],
  "relationships": [],
  "pipeline": {
    "entryNodeId": "coder-1",
    "nodes": [
      {
        "id": "coder-1",
        "actorId": "coder_actor",
        "personaId": "coder",
        "constraints": { "maxRetries": 1 }
      }
    ],
    "edges": []
  },
  "topologyConstraints": {
    "maxDepth": 4,
    "maxRetries": 2
  }
}
```

### Minimal Engine Usage

```ts
import { SchemaDrivenExecutionEngine } from "./src/agents/orchestration.js";

// `manifest` is an AgentManifest object, for example the JSON above
// after validation with parseAgentManifest(...).
const engine = new SchemaDrivenExecutionEngine({
  manifest,
  actorExecutors: {
    coder_actor: async ({ prompt, context, toolClearance }) => {
      // Execute actor logic here using `prompt`, `context`, and `toolClearance`.
      return {
        status: "success",
        payload: { summary: "done" },
        stateFlags: { implemented: true }
      };
    }
  },
  settings: {
    workspaceRoot: process.cwd(),
    runtimeContext: {
      repo: "ai_ops",
      ticket: "AIOPS-123"
    }
  }
});

const result = await engine.runSession({
  sessionId: "session-1",
  initialPayload: {
    task: "Implement feature"
  }
});

console.log(result.records);
```

## Stateless Handoffs and Context

The engine does not depend on conversational memory between nodes.

- Node inputs are written as handoff payloads to storage.
- Each node execution reads a fresh context snapshot from disk.
- Session state persists:
  - flags
  - metadata
  - history events

The default state root is controlled by `AGENT_STATE_ROOT`.

## Recursive Orchestration Contract

`AgentManager.runRecursiveAgent(...)` uses a strict two-phase fanout/fan-in model:

- Phase 1 (planner): agent execution returns either a terminal result or a fanout plan (`intents[]` + `aggregate(...)`).
- Parent tokens are released before children are scheduled, avoiding deadlocks even when `AGENT_MAX_CONCURRENT=1`.
- Children run under isolated, deterministic session IDs (`_child_`), each with its own `AbortSignal`.
- Phase 2 (aggregator): once all children complete, the aggregate phase runs as a fresh invocation.

Optional child middleware hooks (`allocateForChild`, `releaseForChild`) let callers integrate provisioning/suballocation without coupling `AgentManager` to filesystem or git operations.
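The two-phase contract above can be illustrated independently of the real `AgentManager`. The following is a minimal sketch under stated assumptions: `AgentOutcome`, `AgentFn`, `Semaphore`, `withToken`, and `runRecursive` are hypothetical names, the semaphore stands in for the manager's queue-based concurrency limit, and the child session ID format is illustrative only. It shows why releasing the parent's token before children are scheduled avoids deadlock at a limit of 1, and how each child gets its own `AbortSignal` linked to the parent's.

```ts
// Hypothetical shapes; the real AgentManager API in src/agents/manager.ts may differ.
type AgentOutcome<T> =
  | { kind: "result"; value: T }
  | { kind: "fanout"; intents: string[]; aggregate: (childResults: T[]) => Promise<T> };

type AgentFn<T> = (sessionId: string, intent: string, signal: AbortSignal) => Promise<AgentOutcome<T>>;

// Minimal counting semaphore standing in for the manager's queue-based limit.
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(limit: number) {
    this.available = limit;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available -= 1;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the token straight to the oldest waiter
    else this.available += 1;
  }
}

// Holds one token for the duration of a single invocation.
async function withToken<R>(limiter: Semaphore, fn: () => Promise<R>): Promise<R> {
  await limiter.acquire();
  try {
    return await fn();
  } finally {
    limiter.release();
  }
}

async function runRecursive<T>(
  agent: AgentFn<T>,
  limiter: Semaphore,
  sessionId: string,
  intent: string,
  signal: AbortSignal,
  depth: number,
  maxDepth: number,
): Promise<T> {
  if (depth > maxDepth) throw new Error(`max recursive depth ${maxDepth} exceeded`);

  // Phase 1 (planner): the token is released as soon as the planner returns,
  // before any child is scheduled, so a limit of 1 cannot deadlock.
  const outcome = await withToken(limiter, () => agent(sessionId, intent, signal));
  if (outcome.kind === "result") return outcome.value;
  const { intents, aggregate } = outcome;

  const childResults = await Promise.all(
    intents.map((childIntent, index) => {
      // Each child gets its own AbortController, aborted when the parent's signal aborts.
      const controller = new AbortController();
      if (signal.aborted) controller.abort();
      else signal.addEventListener("abort", () => controller.abort(), { once: true });
      // Illustrative deterministic child ID; the real format may differ.
      const childId = `${sessionId}_child_${index}`;
      return runRecursive(agent, limiter, childId, childIntent, controller.signal, depth + 1, maxDepth);
    }),
  );

  // Phase 2 (aggregator): a fresh invocation that takes its own token.
  return withToken(limiter, () => aggregate(childResults));
}
```

With a limit of 1, a design that kept the parent's token while awaiting its children would block forever, because no child could ever acquire a token. In this sketch the parent holds a token only during its planner and aggregator invocations.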
## Resource Provisioning

The provisioning layer separates:

- Hard constraints: actual resource allocation, enforced before the run.
- Soft constraints: injected env vars, prompt sections, metadata, and a discovery snapshot.

Built-in providers:

- `git-worktree`
- `port-range`

Runtime injection includes:

- Working directory override
- Injected env vars such as `AGENT_WORKTREE_PATH`, `AGENT_PORT_RANGE_START`, `AGENT_PORT_RANGE_END`, `AGENT_PORT_PRIMARY`
- Discovery file path via `AGENT_DISCOVERY_FILE`

### Hierarchical Suballocation

Parent sessions can suballocate resources for child sessions using:

- `ResourceProvisioningOrchestrator.provisionChildSession(...)`
- `buildChildResourceRequests(...)`

Behavior:

- Child worktrees are placed under a deterministic parent-scoped root.
- Child port blocks are deterministically carved from the parent's assigned range.

## MCP Configuration

Use `mcp.config.json` to configure shared and provider-specific MCP servers.

- `MCP_CONFIG_PATH` controls the config location (default `./mcp.config.json`).
- Shared server definitions live under `servers`.
- Provider overrides:
  - `codex.mcp_servers`
  - `claude.mcpServers`
- Handlers:
  - built-in `context7`
  - built-in `claude-task-master`
  - built-in `generic`
  - custom handlers via `registerMcpHandler(...)`

See `mcp.config.example.json` for a complete template.
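How shared and provider-specific server definitions combine is not spelled out above. The sketch below assumes that a provider override with the same server name replaces the shared entry wholesale (a shallow merge); that semantics, the `McpConfig` shape, and the `loadMcpConfig`/`resolveServers` helpers are hypothetical, so check `mcp.config.example.json` and `src/mcp` for the actual behavior. Only the `MCP_CONFIG_PATH` default and the `servers`, `codex.mcp_servers`, and `claude.mcpServers` keys come from this README.

```ts
import { readFileSync } from "node:fs";

// Hypothetical, simplified config shape; the real types live in src/mcp.
type ServerDef = Record<string, unknown>;

interface McpConfig {
  servers?: Record<string, ServerDef>;
  codex?: { mcp_servers?: Record<string, ServerDef> };
  claude?: { mcpServers?: Record<string, ServerDef> };
}

type Provider = "codex" | "claude";

// Reads the config from MCP_CONFIG_PATH, falling back to ./mcp.config.json.
function loadMcpConfig(): McpConfig {
  const path = process.env.MCP_CONFIG_PATH ?? "./mcp.config.json";
  return JSON.parse(readFileSync(path, "utf8")) as McpConfig;
}

// Assumed semantics: shallow merge in which provider overrides win on name clashes.
function resolveServers(config: McpConfig, provider: Provider): Record<string, ServerDef> {
  const overrides =
    provider === "codex" ? config.codex?.mcp_servers : config.claude?.mcpServers;
  return { ...(config.servers ?? {}), ...(overrides ?? {}) };
}
```

For example, `resolveServers(loadMcpConfig(), "claude")` would return the shared servers plus any Claude-only entries under this assumed merge rule.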
## Environment Variables

### Provider/Auth

- `CODEX_API_KEY`
- `OPENAI_API_KEY`
- `OPENAI_BASE_URL`
- `CODEX_SKIP_GIT_CHECK`
- `ANTHROPIC_API_KEY`
- `CLAUDE_MODEL`
- `CLAUDE_CODE_PATH`
- `MCP_CONFIG_PATH`

### Agent Manager Limits

- `AGENT_MAX_CONCURRENT`
- `AGENT_MAX_SESSION`
- `AGENT_MAX_RECURSIVE_DEPTH`

### Orchestration Limits

- `AGENT_STATE_ROOT`
- `AGENT_TOPOLOGY_MAX_DEPTH`
- `AGENT_TOPOLOGY_MAX_RETRIES`
- `AGENT_RELATIONSHIP_MAX_CHILDREN`

### Provisioning

- `AGENT_WORKTREE_ROOT`
- `AGENT_WORKTREE_BASE_REF`
- `AGENT_PORT_BASE`
- `AGENT_PORT_BLOCK_SIZE`
- `AGENT_PORT_BLOCK_COUNT`
- `AGENT_PORT_PRIMARY_OFFSET`
- `AGENT_PORT_LOCK_DIR`
- `AGENT_DISCOVERY_FILE_RELATIVE_PATH`

Defaults are documented in `.env.example`.

## Quality Gate

Run the full pre-PR gate:

```bash
npm run verify
```

Equivalent individual commands:

```bash
npm run check
npm run check:tests
npm run test
npm run build
```

## Build and Start

```bash
npm run build
npm run start -- codex "Hello from built JS"
```

## Known Limitations

- Tool clearance allowlist/banlist is currently metadata; enforcement is not yet wired into an execution sandbox.

## References

- `docs/orchestration-engine.md`
- OpenAI Codex SDK docs: https://developers.openai.com/codex/sdk/
- Codex MCP config docs: https://developers.openai.com/codex/config#model-context-protocol-mcp_servers
- Claude Agent SDK docs: https://platform.claude.com/docs/en/agent-sdk/overview