ai_ops/README.md
2026-02-23 12:06:13 -05:00

AI Ops: Schema-Driven Multi-Agent Orchestration Runtime

TypeScript runtime for deterministic multi-agent execution with:

  • OpenAI Codex SDK integration (@openai/codex-sdk)
  • Anthropic Claude Agent SDK integration (@anthropic-ai/claude-agent-sdk)
  • Schema-validated orchestration (AgentManifest)
  • Stateless node handoffs via persisted state/context payloads
  • Resource provisioning (git worktrees + deterministic port ranges)
  • MCP configuration layer with handler-based policy hooks

Current Status

  • Provider entrypoints (codex, claude) run with session limits and resource provisioning.
  • Schema-driven orchestration is implemented as reusable modules under src/agents.
  • Recursive AgentManager.runRecursiveAgent(...) supports fanout/fan-in orchestration with abort propagation.
  • Tool clearance allowlist/banlist is modeled, but hard security enforcement is still a TODO at tool execution boundaries.

Repository Layout

  • src/agents:
    • manager.ts: queue-based concurrency limits + recursive fanout/fan-in orchestration.
    • runtime.ts: env-driven runtime singletons and defaults.
    • manifest.ts: AgentManifest schema parsing + validation (strict DAG).
    • persona-registry.ts: prompt templating + persona behavior events.
    • pipeline.ts: actor-oriented DAG runner with retries and state-dependent routing.
    • state-context.ts: persisted state + stateless handoff reconstruction.
    • provisioning.ts: extensible resource orchestration + child suballocation support.
    • orchestration.ts: SchemaDrivenExecutionEngine facade.
  • src/mcp: MCP config types, conversions, and handler resolution.
  • src/examples: provider entrypoints (codex.ts, claude.ts).
  • tests: unit coverage for manager, manifest, pipeline/orchestration, state context, MCP, and provisioning behavior.
  • docs/orchestration-engine.md: design notes for the orchestration architecture.

Prerequisites

  • Node.js 18+
  • npm

Setup

npm install
cp .env.example .env
cp mcp.config.example.json mcp.config.json

Fill in any values you need in .env.

Run

Run Codex example:

npm run codex -- "Summarize what this repository does."

Run Claude example:

npm run claude -- "Summarize what this repository does."

Run via unified entrypoint:

npm run dev -- codex "List potential improvements."
npm run dev -- claude "List potential improvements."

Schema-Driven Orchestration

The orchestration engine is exposed as library modules (not yet wired into src/index.ts by default).

Core pieces:

  • parseAgentManifest(...) validates the full orchestration schema.
  • PersonaRegistry injects runtime context into templated system prompts.
  • PipelineExecutor executes a strict DAG of actor nodes.
  • FileSystemStateContextManager enforces stateless handoffs.
  • SchemaDrivenExecutionEngine composes all of the above with env-driven limits.
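
To make the prompt-templating step concrete, here is a minimal sketch of how {{name}} placeholders, like the ones in the example manifest's systemPromptTemplate, could be filled from runtime context. The helper name renderPromptTemplate and the throw-on-missing-key behavior are assumptions for illustration; PersonaRegistry's actual API may differ.

```typescript
// Hypothetical helper: PersonaRegistry's real API is not shown in this README.
type RuntimeContext = Record<string, string | number | boolean>;

function renderPromptTemplate(template: string, context: RuntimeContext): string {
  // Match {{key}} placeholders, tolerating whitespace inside the braces.
  return template.replace(/\{\{\s*([\w.-]+)\s*\}\}/g, (_match: string, key: string) => {
    if (!Object.prototype.hasOwnProperty.call(context, key)) {
      // Assumption: fail loudly rather than emit a half-rendered prompt.
      throw new Error(`Missing runtime context value for "${key}"`);
    }
    return String(context[key]);
  });
}

const renderedPrompt = renderPromptTemplate(
  "Implement ticket {{ticket}} in repo {{repo}}",
  { repo: "ai_ops", ticket: "AIOPS-123" }
);
// renderedPrompt === "Implement ticket AIOPS-123 in repo ai_ops"
```

Failing on a missing key keeps a node from running with a prompt that still contains raw placeholders; a more lenient variant could leave unknown placeholders untouched instead.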

AgentManifest Overview

AgentManifest (schema version "1") includes:

  • topologies: any of hierarchical, retry-unrolled, sequential
  • personas: identity, prompt template, tool clearance metadata
  • relationships: parent-child persona edges and constraints
  • pipeline: strict DAG with entry node, nodes, and edges
  • topologyConstraints: max depth and retry ceilings

Edge routing supports:

  • Event gates: success, validation_fail, failure, always, onTaskComplete, onValidationFail
  • Conditions:
    • state_flag
    • history_has_event
    • file_exists
    • always
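
The routing rules above can be sketched as a small evaluator: an edge fires when its event gate matches the node's outcome and every attached condition holds. The field names (from, to, on, when, flag, equals, event, path) are illustrative assumptions, not the AgentManifest schema, and the file check is injected so the sketch needs no filesystem access.

```typescript
// All field names below are illustrative assumptions, not the real schema.
type EdgeCondition =
  | { type: "always" }
  | { type: "state_flag"; flag: string; equals: boolean }
  | { type: "history_has_event"; event: string }
  | { type: "file_exists"; path: string };

interface Edge {
  from: string;
  to: string;
  on: string; // event gate, e.g. "success" or "validation_fail"
  when?: EdgeCondition[];
}

interface RoutingState {
  flags: Record<string, boolean>;
  history: string[];
  fileExists: (path: string) => boolean; // injected file check
}

function conditionHolds(condition: EdgeCondition, state: RoutingState): boolean {
  switch (condition.type) {
    case "always":
      return true;
    case "state_flag":
      // An unset flag is treated as false (assumption).
      return (state.flags[condition.flag] === true) === condition.equals;
    case "history_has_event":
      return state.history.indexOf(condition.event) !== -1;
    case "file_exists":
      return state.fileExists(condition.path);
  }
}

function nextNodes(edges: Edge[], from: string, event: string, state: RoutingState): string[] {
  return edges
    .filter((edge) => edge.from === from)
    .filter((edge) => edge.on === "always" || edge.on === event)
    .filter((edge) => (edge.when || []).every((c) => conditionHolds(c, state)))
    .map((edge) => edge.to);
}

const demoEdges: Edge[] = [
  {
    from: "coder-1",
    to: "reviewer-1",
    on: "success",
    when: [{ type: "state_flag", flag: "implemented", equals: true }],
  },
  { from: "coder-1", to: "coder-2", on: "validation_fail" },
];

const demoState: RoutingState = {
  flags: { implemented: true },
  history: [],
  fileExists: () => false,
};

const successTargets = nextNodes(demoEdges, "coder-1", "success", demoState);
// successTargets contains only "reviewer-1"
```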

Example manifest:

{
  "schemaVersion": "1",
  "topologies": ["hierarchical", "retry-unrolled", "sequential"],
  "personas": [
    {
      "id": "coder",
      "displayName": "Coder",
      "systemPromptTemplate": "Implement ticket {{ticket}} in repo {{repo}}",
      "toolClearance": {
        "allowlist": ["read_file", "write_file"],
        "banlist": ["rm"]
      }
    }
  ],
  "relationships": [],
  "pipeline": {
    "entryNodeId": "coder-1",
    "nodes": [
      {
        "id": "coder-1",
        "actorId": "coder_actor",
        "personaId": "coder",
        "constraints": { "maxRetries": 1 }
      }
    ],
    "edges": []
  },
  "topologyConstraints": {
    "maxDepth": 4,
    "maxRetries": 2
  }
}
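
To illustrate what "strict DAG" validation of a pipeline like the one above involves, here is a minimal sketch using Kahn's algorithm. It is not the code in manifest.ts; the edge field names (from, to) and the error strings are assumptions.

```typescript
interface PipelineShape {
  entryNodeId: string;
  nodes: { id: string }[];
  edges: { from: string; to: string }[]; // edge field names are assumed
}

function validateStrictDag(pipeline: PipelineShape): string[] {
  const errors: string[] = [];
  // Prototype-free maps so that ids like "constructor" behave like any other id.
  const indegree: Record<string, number> = Object.create(null);
  const outgoing: Record<string, string[]> = Object.create(null);

  for (const node of pipeline.nodes) {
    if (node.id in indegree) errors.push(`duplicate node id: ${node.id}`);
    indegree[node.id] = 0;
    outgoing[node.id] = [];
  }
  if (!(pipeline.entryNodeId in indegree)) {
    errors.push(`unknown entry node: ${pipeline.entryNodeId}`);
  }
  for (const edge of pipeline.edges) {
    if (!(edge.from in indegree) || !(edge.to in indegree)) {
      errors.push(`edge references unknown node: ${edge.from} -> ${edge.to}`);
      continue;
    }
    outgoing[edge.from].push(edge.to);
    indegree[edge.to] += 1;
  }

  // Kahn's algorithm: repeatedly remove nodes with no remaining incoming edges.
  const ready = Object.keys(indegree).filter((id) => indegree[id] === 0);
  let visited = 0;
  while (ready.length > 0) {
    const id = ready.pop() as string;
    visited += 1;
    for (const next of outgoing[id]) {
      indegree[next] -= 1;
      if (indegree[next] === 0) ready.push(next);
    }
  }
  // Any node never reached with indegree 0 sits on a cycle.
  if (visited !== Object.keys(indegree).length) errors.push("pipeline contains a cycle");
  return errors;
}

const singleNodeErrors = validateStrictDag({
  entryNodeId: "coder-1",
  nodes: [{ id: "coder-1" }],
  edges: [],
});
// singleNodeErrors is empty for the example manifest's pipeline
```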

Minimal Engine Usage

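import { readFile } from "node:fs/promises";
import { parseAgentManifest } from "./src/agents/manifest.js";
import { SchemaDrivenExecutionEngine } from "./src/agents/orchestration.js";

// Assumptions: the example manifest above is saved as ./agent-manifest.json
// (hypothetical path) and parseAgentManifest accepts the parsed JSON object.
const manifest = parseAgentManifest(
  JSON.parse(await readFile("./agent-manifest.json", "utf8"))
);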

const engine = new SchemaDrivenExecutionEngine({
  manifest,
  actorExecutors: {
    coder_actor: async ({ prompt, context, toolClearance }) => {
      // execute actor logic here
      return {
        status: "success",
        payload: {
          summary: "done"
        },
        stateFlags: {
          implemented: true
        }
      };
    }
  },
  settings: {
    workspaceRoot: process.cwd(),
    runtimeContext: {
      repo: "ai_ops",
      ticket: "AIOPS-123"
    }
  }
});

const result = await engine.runSession({
  sessionId: "session-1",
  initialPayload: {
    task: "Implement feature"
  }
});

console.log(result.records);

Stateless Handoffs and Context

The engine does not depend on conversational memory between nodes.

  • Node inputs are written as handoff payloads to storage.
  • Each node execution reads a fresh context snapshot from disk.
  • Session state persists:
    • flags
    • metadata
    • history events

Default state root is controlled by AGENT_STATE_ROOT.
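
The handoff model above can be sketched as follows. To keep the example self-contained, a JSON-serializing in-memory store stands in for files under AGENT_STATE_ROOT; the class, function names, and path layout are hypothetical and do not reflect the actual FileSystemStateContextManager API.

```typescript
// Hypothetical names throughout: the real API is not shown in this README.
interface SessionState {
  flags: Record<string, boolean>;
  metadata: Record<string, unknown>;
  history: string[];
}

interface NodeContext {
  payload: unknown;
  state: SessionState;
}

// Stand-in for files on disk: keys act as file paths and values are serialized
// JSON, so readers never share object references with writers.
class JsonStore {
  private files: Record<string, string> = Object.create(null);

  write(path: string, value: unknown): void {
    this.files[path] = JSON.stringify(value);
  }

  read<T>(path: string): T {
    const raw = this.files[path];
    if (raw === undefined) throw new Error(`Nothing persisted at ${path}`);
    return JSON.parse(raw) as T;
  }
}

function writeHandoff(store: JsonStore, sessionId: string, nodeId: string, payload: unknown): void {
  store.write(`${sessionId}/handoffs/${nodeId}.json`, payload);
}

function loadNodeContext(store: JsonStore, sessionId: string, nodeId: string): NodeContext {
  // Everything the node sees is rebuilt from persisted data on every call.
  return {
    payload: store.read<unknown>(`${sessionId}/handoffs/${nodeId}.json`),
    state: store.read<SessionState>(`${sessionId}/state.json`),
  };
}

const store = new JsonStore();
store.write("session-1/state.json", {
  flags: { implemented: true },
  metadata: {},
  history: ["coder-1:success"],
});
writeHandoff(store, "session-1", "reviewer-1", { task: "Review feature" });

const firstRead = loadNodeContext(store, "session-1", "reviewer-1");
firstRead.state.flags.implemented = false; // local mutation only
const secondRead = loadNodeContext(store, "session-1", "reviewer-1");
// secondRead.state.flags.implemented is still true
```

Because each read deserializes a fresh copy, a node cannot leak in-process state to a later node; the only channel between nodes is what was explicitly persisted.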

Recursive Orchestration Contract

AgentManager.runRecursiveAgent(...) uses a strict two-phase fanout/fan-in model:

  • Phase 1 (planner): agent execution returns either a terminal result or a fanout plan (intents[] + aggregate(...)).
  • Parent tokens are released before children are scheduled, avoiding deadlocks even when AGENT_MAX_CONCURRENT=1.
  • Children run in isolated deterministic session IDs (<parent>_child_<index>), each with their own AbortSignal.
  • Phase 2 (aggregator): once all children complete, the aggregate phase runs as a fresh invocation.

Optional child middleware hooks (allocateForChild, releaseForChild) let callers integrate provisioning/suballocation without coupling AgentManager to filesystem or git operations.
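
The two-phase contract can be sketched with a counting semaphore. This is an illustration under stated assumptions, not AgentManager's implementation: the names Semaphore, withToken, Planner, and runRecursive are hypothetical, and abort propagation, depth limits, and the child middleware hooks are omitted.

```typescript
// Minimal counting semaphore standing in for the AGENT_MAX_CONCURRENT limit.
class Semaphore {
  private waiters: Array<() => void> = [];
  constructor(private available: number) {}

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available -= 1;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the token straight to the next waiter
    else this.available += 1;
  }
}

async function withToken<T>(sem: Semaphore, fn: () => T | Promise<T>): Promise<T> {
  await sem.acquire();
  try {
    return await fn();
  } finally {
    sem.release();
  }
}

type PlanResult =
  | { kind: "result"; value: string }
  | { kind: "fanout"; intents: string[]; aggregate: (children: string[]) => string };

type Planner = (sessionId: string, intent: string) => Promise<PlanResult>;

async function runRecursive(
  sem: Semaphore,
  plan: Planner,
  sessionId: string,
  intent: string
): Promise<string> {
  // Phase 1: the planner holds a token only while it runs.
  const planned = await withToken(sem, () => plan(sessionId, intent));
  if (planned.kind === "result") return planned.value;
  const { intents, aggregate } = planned;

  // The parent's token is already released here, so children can run
  // even when the limit is 1.
  const children = await Promise.all(
    intents.map((childIntent, index) =>
      runRecursive(sem, plan, `${sessionId}_child_${index}`, childIntent)
    )
  );

  // Phase 2: aggregation runs as a fresh invocation with its own token.
  return withToken(sem, () => aggregate(children));
}

const demoPlanner: Planner = async (sessionId, intent): Promise<PlanResult> => {
  if (intent === "root") {
    return { kind: "fanout", intents: ["a", "b"], aggregate: (parts) => parts.join(" + ") };
  }
  return { kind: "result", value: `${sessionId}:${intent}` };
};

const outcome = runRecursive(new Semaphore(1), demoPlanner, "s1", "root");
// outcome resolves to "s1_child_0:a + s1_child_1:b"
```

If the parent kept its token while awaiting its children, a limit of 1 would leave the children waiting forever on a token that is never released; releasing before fanout is what removes that deadlock.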

Resource Provisioning

The provisioning layer separates:

  • Hard constraints: actual resource allocations, enforced before the run starts.
  • Soft constraints: injected env vars, prompt sections, metadata, and a discovery snapshot.

Built-in providers:

  • git-worktree
  • port-range

Runtime injection includes:

  • Working directory override
  • Injected env vars such as AGENT_WORKTREE_PATH, AGENT_PORT_RANGE_START, AGENT_PORT_RANGE_END, AGENT_PORT_PRIMARY
  • Discovery file path via AGENT_DISCOVERY_FILE
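
As a rough illustration of how a port block could map onto the injected variables, the sketch below derives the env values from the port settings. How the provider actually picks a block index and whether the range end is inclusive are not documented here, so both are assumptions, and the numbers are illustrative rather than the defaults from .env.example.

```typescript
interface PortRangeSettings {
  base: number; // AGENT_PORT_BASE
  blockSize: number; // AGENT_PORT_BLOCK_SIZE
  blockCount: number; // AGENT_PORT_BLOCK_COUNT
  primaryOffset: number; // AGENT_PORT_PRIMARY_OFFSET
}

function portEnvForBlock(
  settings: PortRangeSettings,
  blockIndex: number,
  worktreePath: string
): Record<string, string> {
  if (Math.floor(blockIndex) !== blockIndex || blockIndex < 0 || blockIndex >= settings.blockCount) {
    throw new RangeError(`blockIndex ${blockIndex} is outside 0..${settings.blockCount - 1}`);
  }
  if (settings.primaryOffset < 0 || settings.primaryOffset >= settings.blockSize) {
    throw new RangeError("primaryOffset must fall inside the block");
  }
  const start = settings.base + blockIndex * settings.blockSize;
  const end = start + settings.blockSize - 1; // assumed inclusive
  return {
    AGENT_WORKTREE_PATH: worktreePath,
    AGENT_PORT_RANGE_START: String(start),
    AGENT_PORT_RANGE_END: String(end),
    AGENT_PORT_PRIMARY: String(start + settings.primaryOffset),
  };
}

// Illustrative values only.
const injectedEnv = portEnvForBlock(
  { base: 20000, blockSize: 10, blockCount: 5, primaryOffset: 0 },
  2,
  "/tmp/worktrees/session-1"
);
// injectedEnv.AGENT_PORT_RANGE_START === "20020"
// injectedEnv.AGENT_PORT_RANGE_END === "20029"
```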

Hierarchical Suballocation

Parent sessions can suballocate resources for child sessions using:

  • ResourceProvisioningOrchestrator.provisionChildSession(...)
  • buildChildResourceRequests(...)

Behavior:

  • Child worktrees are placed under a deterministic parent-scoped root.
  • Child port blocks are deterministically carved from the parent's assigned range.
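
One way to picture deterministic suballocation is the sketch below, which splits a parent range into equal child blocks and derives a parent-scoped worktree path. The equal-split policy, the directory layout, and both function names are assumptions; provisionChildSession(...) and buildChildResourceRequests(...) may work differently.

```typescript
interface PortBlock {
  start: number; // inclusive
  end: number; // inclusive
}

function carveChildPortBlocks(parent: PortBlock, childCount: number): PortBlock[] {
  if (Math.floor(childCount) !== childCount || childCount < 1) {
    throw new RangeError("childCount must be a positive integer");
  }
  const total = parent.end - parent.start + 1;
  const size = Math.floor(total / childCount);
  if (size < 1) throw new RangeError("parent range is too small for that many children");

  const blocks: PortBlock[] = [];
  for (let index = 0; index < childCount; index += 1) {
    const start = parent.start + index * size;
    blocks.push({ start, end: start + size - 1 });
  }
  return blocks; // leftover ports at the top of the parent range stay unassigned
}

function childWorktreePath(root: string, parentSessionId: string, childIndex: number): string {
  // Assumed layout: <root>/<parent>/children/<parent>_child_<index>
  return [root, parentSessionId, "children", `${parentSessionId}_child_${childIndex}`].join("/");
}

const childBlocks = carveChildPortBlocks({ start: 20020, end: 20029 }, 3);
// childBlocks: 20020-20022, 20023-20025, 20026-20028 (20029 is left over)
const childPath = childWorktreePath("/tmp/worktrees", "session-1", 0);
// childPath === "/tmp/worktrees/session-1/children/session-1_child_0"
```

Because both functions depend only on the parent's allocation and the child index, re-running them after a crash yields the same assignments without any shared bookkeeping.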

MCP Configuration

Use mcp.config.json to configure shared and provider-specific MCP servers.

  • MCP_CONFIG_PATH controls config location (default ./mcp.config.json).
  • Shared server definitions are in servers.
  • Provider overrides:
    • codex.mcp_servers
    • claude.mcpServers
  • Handlers:
    • built-in context7
    • built-in claude-task-master
    • built-in generic
    • custom handlers via registerMcpHandler(...)

See mcp.config.example.json for a complete template.
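
As a sketch of how shared and provider-specific definitions might combine, the function below merges servers with the matching provider override block. The merge rule (a provider entry replaces a shared entry of the same name), the function name, and the server fields in the example are assumptions; the key names servers, codex.mcp_servers, and claude.mcpServers come from the list above.

```typescript
type ServerMap = Record<string, Record<string, unknown>>;

interface McpConfigShape {
  servers?: ServerMap;
  codex?: { mcp_servers?: ServerMap };
  claude?: { mcpServers?: ServerMap };
}

function resolveServersFor(provider: "codex" | "claude", config: McpConfigShape): ServerMap {
  const overrides =
    provider === "codex" ? config.codex?.mcp_servers : config.claude?.mcpServers;
  // Assumption: whole-entry replacement by server name, provider entries win.
  return { ...(config.servers ?? {}), ...(overrides ?? {}) };
}

// Server names and fields are illustrative only.
const demoConfig: McpConfigShape = {
  servers: {
    docs: { command: "docs-server", args: ["--stdio"] },
  },
  claude: {
    mcpServers: {
      docs: { command: "docs-server", args: ["--stdio", "--verbose"] },
      tasks: { command: "task-server" },
    },
  },
};

const codexServers = resolveServersFor("codex", demoConfig);
const claudeServers = resolveServersFor("claude", demoConfig);
// codexServers has only "docs"; claudeServers has "docs" (overridden) and "tasks"
```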

Environment Variables

Provider/Auth

  • CODEX_API_KEY
  • OPENAI_API_KEY
  • OPENAI_BASE_URL
  • CODEX_SKIP_GIT_CHECK
  • ANTHROPIC_API_KEY
  • CLAUDE_MODEL
  • CLAUDE_CODE_PATH
  • MCP_CONFIG_PATH

Agent Manager Limits

  • AGENT_MAX_CONCURRENT
  • AGENT_MAX_SESSION
  • AGENT_MAX_RECURSIVE_DEPTH

Orchestration Limits

  • AGENT_STATE_ROOT
  • AGENT_TOPOLOGY_MAX_DEPTH
  • AGENT_TOPOLOGY_MAX_RETRIES
  • AGENT_RELATIONSHIP_MAX_CHILDREN

Provisioning

  • AGENT_WORKTREE_ROOT
  • AGENT_WORKTREE_BASE_REF
  • AGENT_PORT_BASE
  • AGENT_PORT_BLOCK_SIZE
  • AGENT_PORT_BLOCK_COUNT
  • AGENT_PORT_PRIMARY_OFFSET
  • AGENT_PORT_LOCK_DIR
  • AGENT_DISCOVERY_FILE_RELATIVE_PATH

Defaults are documented in .env.example.
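
For readers wiring these variables into their own code, a minimal sketch of validated integer parsing follows. The helper name and the fallback values are placeholders, not the project's defaults (those live in .env.example), and runtime.ts may parse its settings differently.

```typescript
type Env = Record<string, string | undefined>;

function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!isFinite(value) || Math.floor(value) !== value || value < 1) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

// In an application this would typically receive process.env.
const demoEnv: Env = { AGENT_MAX_CONCURRENT: "2" };

const limits = {
  // Fallbacks here are placeholders, not the documented defaults.
  maxConcurrent: readPositiveInt(demoEnv, "AGENT_MAX_CONCURRENT", 1),
  maxRecursiveDepth: readPositiveInt(demoEnv, "AGENT_MAX_RECURSIVE_DEPTH", 1),
};
// limits.maxConcurrent === 2, limits.maxRecursiveDepth === 1
```

Rejecting malformed values at startup surfaces a typo such as AGENT_MAX_CONCURRENT=two immediately, instead of silently running with an unintended limit.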

Quality Gate

Run the full pre-PR gate:

npm run verify

Equivalent individual commands:

npm run check
npm run check:tests
npm run test
npm run build

Build and Start

npm run build
npm run start -- codex "Hello from built JS"

Known Limitations

  • Tool clearance allowlist/banlist is currently metadata; enforcement is not yet wired into an execution sandbox.
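
Until enforcement lands, callers who want a guard inside their own actor executors could use something like the sketch below. It is not part of the runtime: the function names and the policy (banlist wins, and a tool must appear in the allowlist) are assumptions about how the metadata might be interpreted.

```typescript
interface ToolClearance {
  allowlist: string[];
  banlist: string[];
}

function isToolPermitted(clearance: ToolClearance, tool: string): boolean {
  // Assumed policy: the banlist always wins, and anything not allowlisted is denied.
  if (clearance.banlist.indexOf(tool) !== -1) return false;
  return clearance.allowlist.indexOf(tool) !== -1;
}

function guardToolCall<T>(clearance: ToolClearance, tool: string, run: () => T): T {
  if (!isToolPermitted(clearance, tool)) {
    throw new Error(`Tool "${tool}" is not permitted for this persona`);
  }
  return run();
}

// Clearance values taken from the example manifest's "coder" persona.
const coderClearance: ToolClearance = {
  allowlist: ["read_file", "write_file"],
  banlist: ["rm"],
};

const readAllowed = isToolPermitted(coderClearance, "read_file"); // true
const rmAllowed = isToolPermitted(coderClearance, "rm"); // false
```

A check like this is advisory only: it runs inside the same process as the actor, so it does not replace sandboxing at the tool execution boundary.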
