zman27/ai_ops


Josh Rzemien 53af0d44cd first commit

2026-02-23 12:06:13 -05:00

8.7 KiB


AI Ops: Schema-Driven Multi-Agent Orchestration Runtime

TypeScript runtime for deterministic multi-agent execution with:

OpenAI Codex SDK integration (@openai/codex-sdk)
Anthropic Claude Agent SDK integration (@anthropic-ai/claude-agent-sdk)
Schema-validated orchestration (AgentManifest)
Stateless node handoffs via persisted state/context payloads
Resource provisioning (git worktrees + deterministic port ranges)
MCP configuration layer with handler-based policy hooks

Current Status

Provider entrypoints (codex, claude) run with session limits and resource provisioning.
Schema-driven orchestration is implemented as reusable modules under src/agents.
Recursive AgentManager.runRecursiveAgent(...) supports fanout/fan-in orchestration with abort propagation.
Tool clearance allowlists and banlists are modeled in the manifest, but they are not yet enforced at tool execution boundaries (hard security enforcement is still a TODO).

Repository Layout

src/agents:
- manager.ts: queue-based concurrency limits + recursive fanout/fan-in orchestration.
- runtime.ts: env-driven runtime singletons and defaults.
- manifest.ts: AgentManifest schema parsing + validation (strict DAG).
- persona-registry.ts: prompt templating + persona behavior events.
- pipeline.ts: actor-oriented DAG runner with retries and state-dependent routing.
- state-context.ts: persisted state + stateless handoff reconstruction.
- provisioning.ts: extensible resource orchestration + child suballocation support.
- orchestration.ts: SchemaDrivenExecutionEngine facade.
src/mcp: MCP config types, conversions, and handler resolution.
src/examples: provider entrypoints (codex.ts, claude.ts).
tests: unit coverage for manager, manifest, pipeline/orchestration, state context, MCP, and provisioning behavior.
docs/orchestration-engine.md: design notes for the orchestration architecture.

Prerequisites

Node.js 18+
npm

Setup

npm install
cp .env.example .env
cp mcp.config.example.json mcp.config.json

Fill in any values you need in .env.

Run

Run Codex example:

npm run codex -- "Summarize what this repository does."

Run Claude example:

npm run claude -- "Summarize what this repository does."

Run via unified entrypoint:

npm run dev -- codex "List potential improvements."
npm run dev -- claude "List potential improvements."

Schema-Driven Orchestration

The orchestration engine is exposed as library modules (not yet wired into src/index.ts by default).

Core pieces:

parseAgentManifest(...) validates the full orchestration schema.
PersonaRegistry injects runtime context into templated system prompts.
PipelineExecutor executes a strict DAG of actor nodes.
FileSystemStateContextManager enforces stateless handoffs.
SchemaDrivenExecutionEngine composes all of the above with env-driven limits.

AgentManifest Overview

AgentManifest (schema version "1") includes:

topologies: any of hierarchical, retry-unrolled, sequential
personas: identity, prompt template, tool clearance metadata
relationships: parent-child persona edges and constraints
pipeline: strict DAG with entry node, nodes, and edges
topologyConstraints: max depth and retry ceilings
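
As an illustration of what the strict-DAG requirement on pipeline means, here is a minimal sketch of an acyclicity check using Kahn's algorithm. The PipelineNode and PipelineEdge shapes, the from/to field names, and the isAcyclic helper are assumptions made for this example; they are not taken from manifest.ts, whose validation may work differently.

```typescript
// Hypothetical minimal shapes; the real AgentManifest types may differ.
interface PipelineNode {
  id: string;
}
interface PipelineEdge {
  from: string;
  to: string;
}

// Returns true when the edges form a directed acyclic graph over the nodes.
// Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
function isAcyclic(nodes: PipelineNode[], edges: PipelineEdge[]): boolean {
  const inDegree = new Map<string, number>();
  const outgoing = new Map<string, string[]>();
  for (const node of nodes) {
    inDegree.set(node.id, 0);
    outgoing.set(node.id, []);
  }
  for (const edge of edges) {
    // In this sketch an edge that references an unknown node is invalid.
    if (!inDegree.has(edge.from) || !inDegree.has(edge.to)) return false;
    outgoing.get(edge.from)!.push(edge.to);
    inDegree.set(edge.to, inDegree.get(edge.to)! + 1);
  }
  const ready: string[] = [];
  for (const [id, degree] of inDegree) {
    if (degree === 0) ready.push(id);
  }
  let visited = 0;
  while (ready.length > 0) {
    const id = ready.pop()!;
    visited += 1;
    for (const next of outgoing.get(id)!) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  // Nodes on a cycle never reach in-degree 0, so they are never visited.
  // Duplicate node ids also fail here because fewer ids exist than nodes.
  return visited === nodes.length;
}
```

A manifest whose edges loop back to an earlier node would fail this check, which is the kind of input a strict DAG schema is expected to reject.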

Edge routing supports:

Event gates: success, validation_fail, failure, always, onTaskComplete, onValidationFail
Conditions:
- state_flag
- history_has_event
- file_exists
- always
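
The four condition kinds above could be evaluated along the following lines. This is a sketch only: the EdgeCondition field names (key, equals, event, path), the SessionSnapshot shape, and the conditionHolds helper are hypothetical and were chosen for illustration; the actual schema in manifest.ts may name and structure them differently.

```typescript
import { existsSync } from "node:fs";

// Hypothetical condition shapes; field names are illustrative only.
type EdgeCondition =
  | { type: "always" }
  | { type: "state_flag"; key: string; equals: boolean }
  | { type: "history_has_event"; event: string }
  | { type: "file_exists"; path: string };

// Hypothetical view of the persisted session state a condition can inspect.
interface SessionSnapshot {
  flags: Record<string, boolean>;
  history: string[];
}

// Decides whether an edge's condition is satisfied for the given snapshot.
function conditionHolds(condition: EdgeCondition, snapshot: SessionSnapshot): boolean {
  switch (condition.type) {
    case "always":
      return true;
    case "state_flag":
      // A flag that was never set is treated as false in this sketch.
      return (snapshot.flags[condition.key] ?? false) === condition.equals;
    case "history_has_event":
      return snapshot.history.includes(condition.event);
    case "file_exists":
      return existsSync(condition.path);
  }
}
```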

Example manifest:

{
  "schemaVersion": "1",
  "topologies": ["hierarchical", "retry-unrolled", "sequential"],
  "personas": [
    {
      "id": "coder",
      "displayName": "Coder",
      "systemPromptTemplate": "Implement ticket {{ticket}} in repo {{repo}}",
      "toolClearance": {
        "allowlist": ["read_file", "write_file"],
        "banlist": ["rm"]
      }
    }
  ],
  "relationships": [],
  "pipeline": {
    "entryNodeId": "coder-1",
    "nodes": [
      {
        "id": "coder-1",
        "actorId": "coder_actor",
        "personaId": "coder",
        "constraints": { "maxRetries": 1 }
      }
    ],
    "edges": []
  },
  "topologyConstraints": {
    "maxDepth": 4,
    "maxRetries": 2
  }
}
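
To show how a systemPromptTemplate such as the one above might be filled from runtime context, here is a minimal placeholder-substitution sketch. The renderTemplate function is hypothetical and is not the PersonaRegistry API; leaving unknown placeholders untouched is an assumption of this example.

```typescript
// Replaces {{name}} placeholders with values from a context object.
// Placeholders without a matching key are left untouched (an assumption).
function renderTemplate(template: string, context: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(context, key) ? (context[key] ?? match) : match,
  );
}

const prompt = renderTemplate("Implement ticket {{ticket}} in repo {{repo}}", {
  ticket: "AIOPS-123",
  repo: "ai_ops",
});
console.log(prompt);
```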

Minimal Engine Usage

import { SchemaDrivenExecutionEngine } from "./src/agents/orchestration.js";

const engine = new SchemaDrivenExecutionEngine({
  manifest, // an AgentManifest, e.g. the validated output of parseAgentManifest(...)
  actorExecutors: {
    coder_actor: async ({ prompt, context, toolClearance }) => {
      // execute actor logic here
      return {
        status: "success",
        payload: {
          summary: "done"
        },
        stateFlags: {
          implemented: true
        }
      };
    }
  },
  settings: {
    workspaceRoot: process.cwd(),
    runtimeContext: {
      repo: "ai_ops",
      ticket: "AIOPS-123"
    }
  }
});

const result = await engine.runSession({
  sessionId: "session-1",
  initialPayload: {
    task: "Implement feature"
  }
});

console.log(result.records);

Stateless Handoffs and Context

The engine does not depend on conversational memory between nodes.

Node inputs are written as handoff payloads to storage.
Each node execution reads a fresh context snapshot from disk.
Session state persists:
- flags
- metadata
- history events

Default state root is controlled by AGENT_STATE_ROOT.
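
The write-then-reread pattern described above can be sketched as follows. The directory layout (<stateRoot>/<sessionId>/handoffs/<nodeId>.json), the Handoff shape, and the writeHandoff/readHandoff helpers are assumptions for illustration; FileSystemStateContextManager may organize its files differently.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical handoff record; the real payload type may carry more fields.
interface Handoff {
  nodeId: string;
  payload: Record<string, unknown>;
}

// Assumed on-disk layout, chosen only for this example.
function handoffPath(stateRoot: string, sessionId: string, nodeId: string): string {
  return join(stateRoot, sessionId, "handoffs", `${nodeId}.json`);
}

// Persists the input for a node so nothing needs to be kept in memory.
function writeHandoff(stateRoot: string, sessionId: string, handoff: Handoff): void {
  const file = handoffPath(stateRoot, sessionId, handoff.nodeId);
  mkdirSync(dirname(file), { recursive: true });
  writeFileSync(file, JSON.stringify(handoff, null, 2), "utf8");
}

// Reads the handoff back from disk, as a node would at the start of its run.
function readHandoff(stateRoot: string, sessionId: string, nodeId: string): Handoff {
  return JSON.parse(readFileSync(handoffPath(stateRoot, sessionId, nodeId), "utf8")) as Handoff;
}

// Usage: a temporary directory stands in for AGENT_STATE_ROOT.
const stateRoot = mkdtempSync(join(tmpdir(), "agent-state-"));
writeHandoff(stateRoot, "session-1", {
  nodeId: "coder-1",
  payload: { task: "Implement feature" },
});
const restored = readHandoff(stateRoot, "session-1", "coder-1");
console.log(restored.payload);
```

Because the reader only touches the file system, the node that consumes the handoff could run in a different process from the one that produced it.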

Recursive Orchestration Contract

AgentManager.runRecursiveAgent(...) uses a strict two-phase fanout/fan-in model:

Phase 1 (planner): agent execution returns either a terminal result or a fanout plan (intents[] + aggregate(...)).
Parent tokens are released before children are scheduled, avoiding deadlocks even when AGENT_MAX_CONCURRENT=1.
Children run in isolated sessions with deterministic IDs (<parent>_child_<index>), each with its own AbortSignal.
Phase 2 (aggregator): once all children complete, the aggregate phase runs as a fresh invocation.

Optional child middleware hooks (allocateForChild, releaseForChild) let callers integrate provisioning/suballocation without coupling AgentManager to filesystem or git operations.
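
The two-phase contract can be sketched with a small counting semaphore standing in for the AGENT_MAX_CONCURRENT limit. Everything below (Semaphore, Plan, Planner, withToken, runRecursive) is a hypothetical illustration and not the AgentManager implementation; abort propagation and the child middleware hooks are omitted for brevity.

```typescript
// Counting semaphore: each running phase holds one token.
class Semaphore {
  private available: number;
  private waiters: Array<() => void> = [];
  constructor(limit: number) {
    this.available = limit;
  }
  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available -= 1;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }
  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the token directly to the next waiter
    else this.available += 1;
  }
}

// Phase 1 output: either a terminal result or a fanout plan.
type Plan<T> =
  | { kind: "result"; value: T }
  | { kind: "fanout"; intents: string[]; aggregate: (children: T[]) => T };

type Planner<T> = (sessionId: string, input: string) => Promise<Plan<T>>;

// Runs fn while holding a token and always gives the token back.
async function withToken<R>(sem: Semaphore, fn: () => Promise<R> | R): Promise<R> {
  await sem.acquire();
  try {
    return await fn();
  } finally {
    sem.release();
  }
}

async function runRecursive<T>(
  sem: Semaphore,
  planner: Planner<T>,
  sessionId: string,
  input: string,
): Promise<T> {
  // Phase 1 (planner): the token is held only while the planner runs,
  // so it is already released when children are scheduled below.
  const plan = await withToken(sem, () => planner(sessionId, input));
  if (plan.kind === "result") return plan.value;
  const { intents, aggregate } = plan;

  // Children get deterministic session IDs derived from the parent.
  const children = await Promise.all(
    intents.map((intent, index) =>
      runRecursive(sem, planner, `${sessionId}_child_${index}`, intent),
    ),
  );

  // Phase 2 (aggregator): runs as a fresh invocation under a new token.
  return withToken(sem, () => aggregate(children));
}
```

If the parent kept its token while waiting on children, a limit of 1 would leave no token for any child and the run would never finish; releasing first is what lets this sketch complete even with new Semaphore(1).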

Resource Provisioning

The provisioning layer separates:

Hard constraints: actual resource allocation enforced before run.
Soft constraints: injected env vars, prompt sections, metadata, and a discovery snapshot.

Built-in providers:

git-worktree
port-range

Runtime injection includes:

Working directory override
Injected env vars such as AGENT_WORKTREE_PATH, AGENT_PORT_RANGE_START, AGENT_PORT_RANGE_END, AGENT_PORT_PRIMARY
Discovery file path via AGENT_DISCOVERY_FILE
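
From the agent's side, the injected variables listed above could be read back as shown in this sketch. The InjectedResources shape and the readInjectedResources helper are hypothetical; only the environment variable names come from this README.

```typescript
// Hypothetical view of what a provisioned process can discover from its env.
interface InjectedResources {
  worktreePath?: string;
  discoveryFile?: string;
  ports?: { start: number; end: number; primary: number };
}

function readInjectedResources(env: Record<string, string | undefined>): InjectedResources {
  // Parses a decimal port number; returns undefined for missing or bad input.
  const toPort = (value: string | undefined): number | undefined => {
    const parsed = Number.parseInt(value ?? "", 10);
    return Number.isInteger(parsed) ? parsed : undefined;
  };

  const resources: InjectedResources = {};
  const worktreePath = env["AGENT_WORKTREE_PATH"];
  if (worktreePath) resources.worktreePath = worktreePath;
  const discoveryFile = env["AGENT_DISCOVERY_FILE"];
  if (discoveryFile) resources.discoveryFile = discoveryFile;

  const start = toPort(env["AGENT_PORT_RANGE_START"]);
  const end = toPort(env["AGENT_PORT_RANGE_END"]);
  const primary = toPort(env["AGENT_PORT_PRIMARY"]);
  // The port block is reported only when all three values are present.
  if (start !== undefined && end !== undefined && primary !== undefined) {
    resources.ports = { start, end, primary };
  }
  return resources;
}
```

In a provisioned run this would typically be called as readInjectedResources(process.env).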

Hierarchical Suballocation

Parent sessions can suballocate resources for child sessions using:

ResourceProvisioningOrchestrator.provisionChildSession(...)
buildChildResourceRequests(...)

Behavior:

Child worktrees are placed under a deterministic parent-scoped root.
Child port blocks are deterministically carved from the parent's assigned range.
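
One way to carve child port blocks deterministically is to split the parent range into equal, index-addressed slices, as in the sketch below. The equal-split policy, the PortRange shape, and the carveChildPortBlock helper are assumptions for illustration; the README does not specify how provisionChildSession(...) divides the range.

```typescript
// Inclusive port range.
interface PortRange {
  start: number;
  end: number;
}

// Returns the slice of the parent range that belongs to one child.
// The same (parent, childIndex, childCount) always yields the same block.
function carveChildPortBlock(parent: PortRange, childIndex: number, childCount: number): PortRange {
  if (!Number.isInteger(childCount) || childCount < 1) {
    throw new RangeError("childCount must be a positive integer");
  }
  if (!Number.isInteger(childIndex) || childIndex < 0 || childIndex >= childCount) {
    throw new RangeError("childIndex must be in [0, childCount)");
  }
  const total = parent.end - parent.start + 1;
  const blockSize = Math.floor(total / childCount);
  if (blockSize < 1) {
    throw new RangeError("parent range is too small for the requested children");
  }
  const start = parent.start + childIndex * blockSize;
  // Ports left over after the integer division stay with the parent.
  return { start, end: start + blockSize - 1 };
}
```

Because the block depends only on the parent range and the child index, two children can never receive overlapping ports, and a restarted child gets the same block again.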

MCP Configuration

Use mcp.config.json to configure shared and provider-specific MCP servers.

MCP_CONFIG_PATH controls config location (default ./mcp.config.json).
Shared server definitions are in servers.
Provider overrides:
- codex.mcp_servers
- claude.mcpServers
Handlers:
- built-in context7
- built-in claude-task-master
- built-in generic
- custom handlers via registerMcpHandler(...)

See mcp.config.example.json for a complete template.
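
To make the relationship between shared servers and provider overrides concrete, here is a sketch that computes the effective server set for one provider. The McpConfig shape is inferred from the key names above, and the merge rule (a provider entry replaces a shared entry of the same name) is an assumption; the actual resolution logic in src/mcp may differ, so treat mcp.config.example.json as the authority.

```typescript
// Server definitions are kept opaque in this sketch.
type ServerDef = Record<string, unknown>;

// Shape inferred from the key names in this README; not the real type.
interface McpConfig {
  servers?: Record<string, ServerDef>;
  codex?: { mcp_servers?: Record<string, ServerDef> };
  claude?: { mcpServers?: Record<string, ServerDef> };
}

// Shared servers first, then provider-specific entries on top (assumed rule).
function resolveServers(config: McpConfig, provider: "codex" | "claude"): Record<string, ServerDef> {
  const overrides =
    provider === "codex" ? config.codex?.mcp_servers : config.claude?.mcpServers;
  return { ...(config.servers ?? {}), ...(overrides ?? {}) };
}
```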

Environment Variables

Provider/Auth

CODEX_API_KEY
OPENAI_API_KEY
OPENAI_BASE_URL
CODEX_SKIP_GIT_CHECK
ANTHROPIC_API_KEY
CLAUDE_MODEL
CLAUDE_CODE_PATH
MCP_CONFIG_PATH

Agent Manager Limits

AGENT_MAX_CONCURRENT
AGENT_MAX_SESSION
AGENT_MAX_RECURSIVE_DEPTH

Orchestration Limits

AGENT_STATE_ROOT
AGENT_TOPOLOGY_MAX_DEPTH
AGENT_TOPOLOGY_MAX_RETRIES
AGENT_RELATIONSHIP_MAX_CHILDREN

Provisioning

AGENT_WORKTREE_ROOT
AGENT_WORKTREE_BASE_REF
AGENT_PORT_BASE
AGENT_PORT_BLOCK_SIZE
AGENT_PORT_BLOCK_COUNT
AGENT_PORT_PRIMARY_OFFSET
AGENT_PORT_LOCK_DIR
AGENT_DISCOVERY_FILE_RELATIVE_PATH

Defaults are documented in .env.example.

Quality Gate

Run the full pre-PR gate:

npm run verify

Equivalent individual commands:

npm run check
npm run check:tests
npm run test
npm run build

Build and Start

npm run build
npm run start -- codex "Hello from built JS"

Known Limitations

Tool clearance allowlist/banlist is currently metadata; enforcement is not yet wired into an execution sandbox.
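
Until enforcement exists, callers who want a guard inside their own actor executors could apply a check like the sketch below. The isToolPermitted helper is hypothetical and not part of this repository; the precedence (banlist wins, anything not on the allowlist is denied) is an assumption of this example, and an in-process check like this is not a substitute for a sandbox.

```typescript
// Mirrors the toolClearance object from the example manifest.
interface ToolClearance {
  allowlist: string[];
  banlist: string[];
}

// Deny-by-default check: banned tools are always rejected, and a tool
// must appear on the allowlist to be permitted (assumed policy).
function isToolPermitted(clearance: ToolClearance, tool: string): boolean {
  if (clearance.banlist.includes(tool)) return false;
  return clearance.allowlist.includes(tool);
}

const coderClearance: ToolClearance = {
  allowlist: ["read_file", "write_file"],
  banlist: ["rm"],
};
console.log(isToolPermitted(coderClearance, "read_file"));
```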

References

docs/orchestration-engine.md
OpenAI Codex SDK docs: https://developers.openai.com/codex/sdk/
Codex MCP config docs: https://developers.openai.com/codex/config#model-context-protocol-mcp_servers
Claude Agent SDK docs: https://platform.claude.com/docs/en/agent-sdk/overview
