AI Ops: Schema-Driven Multi-Agent Orchestration Runtime
TypeScript runtime for deterministic multi-agent execution with:
- OpenAI Codex SDK integration (@openai/codex-sdk)
- Anthropic Claude Agent SDK integration (@anthropic-ai/claude-agent-sdk)
- Schema-validated orchestration (AgentManifest)
- Stateless node handoffs via persisted state/context payloads
- Resource provisioning (git worktrees + deterministic port ranges)
- MCP configuration layer with handler-based policy hooks
Current Status
- Provider entrypoints (codex, claude) run with session limits and resource provisioning.
- Schema-driven orchestration is implemented as reusable modules under src/agents.
- Recursive AgentManager.runRecursiveAgent(...) supports fanout/fan-in orchestration with abort propagation.
- Tool clearance allowlist/banlist is modeled, but hard security enforcement is still a TODO at tool execution boundaries.
Repository Layout
- src/agents:
  - manager.ts: queue-based concurrency limits + recursive fanout/fan-in orchestration.
  - runtime.ts: env-driven runtime singletons and defaults.
  - manifest.ts: AgentManifest schema parsing + validation (strict DAG).
  - persona-registry.ts: prompt templating + persona behavior events.
  - pipeline.ts: actor-oriented DAG runner with retries and state-dependent routing.
  - state-context.ts: persisted state + stateless handoff reconstruction.
  - provisioning.ts: extensible resource orchestration + child suballocation support.
  - orchestration.ts: SchemaDrivenExecutionEngine facade.
- src/mcp: MCP config types, conversions, and handler resolution.
- src/examples: provider entrypoints (codex.ts, claude.ts).
- tests: unit coverage for manager, manifest, pipeline/orchestration, state context, MCP, and provisioning behavior.
- docs/orchestration-engine.md: design notes for the orchestration architecture.
Prerequisites
- Node.js 18+
- npm
Setup
npm install
cp .env.example .env
cp mcp.config.example.json mcp.config.json
Fill in any values you need in .env.
Run
Run Codex example:
npm run codex -- "Summarize what this repository does."
Run Claude example:
npm run claude -- "Summarize what this repository does."
Run via unified entrypoint:
npm run dev -- codex "List potential improvements."
npm run dev -- claude "List potential improvements."
Schema-Driven Orchestration
The orchestration engine is exposed as library modules (not yet wired into src/index.ts by default).
Core pieces:
- parseAgentManifest(...) validates the full orchestration schema.
- PersonaRegistry injects runtime context into templated system prompts.
- PipelineExecutor executes a strict DAG of actor nodes.
- FileSystemStateContextManager enforces stateless handoffs.
- SchemaDrivenExecutionEngine composes all of the above with env-driven limits.
AgentManifest Overview
AgentManifest (schema version "1") includes:
- topologies: any of hierarchical, retry-unrolled, sequential
- personas: identity, prompt template, tool clearance metadata
- relationships: parent-child persona edges and constraints
- pipeline: strict DAG with entry node, nodes, and edges
- topologyConstraints: max depth and retry ceilings
Edge routing supports:
- Event gates: success, validation_fail, failure, always, onTaskComplete, onValidationFail
- Conditions: state_flag, history_has_event, file_exists, always
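The condition kinds listed above could be evaluated against a session snapshot roughly as follows. This is a minimal sketch, not the engine's implementation: the EdgeCondition and SessionSnapshot shapes, the conditionHolds helper, and the field names key, equals, event, and path are all hypothetical, since the manifest's condition schema is not shown here.

```typescript
import { existsSync } from "node:fs";

// Hypothetical shapes: the real AgentManifest condition schema may differ.
type EdgeCondition =
  | { type: "always" }
  | { type: "state_flag"; key: string; equals: boolean }
  | { type: "history_has_event"; event: string }
  | { type: "file_exists"; path: string };

interface SessionSnapshot {
  flags: Record<string, boolean>;
  history: string[]; // event names in the order they were recorded
}

// Returns true when the edge may be followed for the given session snapshot.
function conditionHolds(condition: EdgeCondition, snapshot: SessionSnapshot): boolean {
  switch (condition.type) {
    case "always":
      return true;
    case "state_flag":
      // A flag that was never set is treated as false.
      return (snapshot.flags[condition.key] ?? false) === condition.equals;
    case "history_has_event":
      return snapshot.history.includes(condition.event);
    case "file_exists":
      return existsSync(condition.path);
  }
}

const exampleSnapshot: SessionSnapshot = {
  flags: { implemented: true },
  history: ["onTaskComplete"],
};

console.log(
  conditionHolds({ type: "state_flag", key: "implemented", equals: true }, exampleSnapshot),
);
```

Modeling the conditions as a discriminated union lets the compiler flag any condition kind that the switch forgets to handle.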
Example manifest:
{
  "schemaVersion": "1",
  "topologies": ["hierarchical", "retry-unrolled", "sequential"],
  "personas": [
    {
      "id": "coder",
      "displayName": "Coder",
      "systemPromptTemplate": "Implement ticket {{ticket}} in repo {{repo}}",
      "toolClearance": {
        "allowlist": ["read_file", "write_file"],
        "banlist": ["rm"]
      }
    }
  ],
  "relationships": [],
  "pipeline": {
    "entryNodeId": "coder-1",
    "nodes": [
      {
        "id": "coder-1",
        "actorId": "coder_actor",
        "personaId": "coder",
        "constraints": { "maxRetries": 1 }
      }
    ],
    "edges": []
  },
  "topologyConstraints": {
    "maxDepth": 4,
    "maxRetries": 2
  }
}
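To illustrate what a "strict DAG" check on the pipeline section might involve, here is a small sketch using Kahn's algorithm. It is an assumption about the kind of validation parseAgentManifest(...) performs, not a copy of it: the isStrictDag helper and the edge field names from and to are hypothetical (the example manifest has no edges, so the real field names are not visible here). The sketch checks only for unique node IDs, a known entry node, dangling edges, and cycles.

```typescript
// Hypothetical edge shape: the real manifest may name these fields differently.
interface PipelineEdge {
  from: string;
  to: string;
}

interface PipelineGraph {
  entryNodeId: string;
  nodes: { id: string }[];
  edges: PipelineEdge[];
}

// Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
// If some nodes can never be removed, the graph contains a cycle.
function isStrictDag(graph: PipelineGraph): boolean {
  const ids = graph.nodes.map((node) => node.id);
  const known = new Set(ids);
  // Reject duplicate node IDs and an entry node that is not declared.
  if (known.size !== ids.length || !known.has(graph.entryNodeId)) return false;

  const inDegree = new Map<string, number>();
  const outgoing = new Map<string, string[]>();
  ids.forEach((id) => {
    inDegree.set(id, 0);
    outgoing.set(id, []);
  });

  for (const edge of graph.edges) {
    // Reject edges that point at undeclared nodes.
    if (!known.has(edge.from) || !known.has(edge.to)) return false;
    outgoing.get(edge.from)?.push(edge.to);
    inDegree.set(edge.to, (inDegree.get(edge.to) ?? 0) + 1);
  }

  const ready = ids.filter((id) => inDegree.get(id) === 0);
  let visited = 0;
  for (let id = ready.pop(); id !== undefined; id = ready.pop()) {
    visited += 1;
    for (const next of outgoing.get(id) ?? []) {
      const remaining = (inDegree.get(next) ?? 0) - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  return visited === ids.length;
}

const singleNode: PipelineGraph = {
  entryNodeId: "coder-1",
  nodes: [{ id: "coder-1" }],
  edges: [],
};
console.log(isStrictDag(singleNode)); // prints true
```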
Minimal Engine Usage
import { SchemaDrivenExecutionEngine } from "./src/agents/orchestration.js";

// `manifest` is assumed to be a validated AgentManifest, for example the
// JSON document above after it has passed through parseAgentManifest(...).
const engine = new SchemaDrivenExecutionEngine({
  manifest,
  // One executor per actorId referenced by the pipeline nodes.
  actorExecutors: {
    coder_actor: async ({ prompt, context, toolClearance }) => {
      // execute actor logic here
      return {
        status: "success",
        payload: {
          summary: "done"
        },
        stateFlags: {
          implemented: true
        }
      };
    }
  },
  settings: {
    workspaceRoot: process.cwd(),
    // Values made available to the persona prompt templates ({{repo}}, {{ticket}}).
    runtimeContext: {
      repo: "ai_ops",
      ticket: "AIOPS-123"
    }
  }
});

const result = await engine.runSession({
  sessionId: "session-1",
  initialPayload: {
    task: "Implement feature"
  }
});

console.log(result.records);
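The runtimeContext values above line up with the {{ticket}} and {{repo}} placeholders in the example persona's systemPromptTemplate. As a rough illustration of that substitution (a sketch only; renderPromptTemplate is a hypothetical helper, and PersonaRegistry may handle missing keys or escaping differently):

```typescript
// Replaces each {{name}} placeholder with the matching runtime context value.
// Unknown placeholders are left untouched so missing context is easy to spot.
function renderPromptTemplate(template: string, context: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (placeholder: string, name: string) => {
    // Only own properties count, so names like "toString" are not resolved.
    const value = Object.prototype.hasOwnProperty.call(context, name)
      ? context[name]
      : undefined;
    return value ?? placeholder;
  });
}

const renderedPrompt = renderPromptTemplate(
  "Implement ticket {{ticket}} in repo {{repo}}",
  { repo: "ai_ops", ticket: "AIOPS-123" },
);
console.log(renderedPrompt); // prints "Implement ticket AIOPS-123 in repo ai_ops"
```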
Stateless Handoffs and Context
The engine does not depend on conversational memory between nodes.
- Node inputs are written as handoff payloads to storage.
- Each node execution reads a fresh context snapshot from disk.
- Session state persists:
  - flags
  - metadata
  - history events
Default state root is controlled by AGENT_STATE_ROOT.
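The write-then-reread pattern described above can be sketched as follows. This is not FileSystemStateContextManager itself: the Handoff shape, the helper names, and the <stateRoot>/<sessionId>/handoffs/<nodeId>.json layout are assumptions made for illustration, and a temporary directory stands in for AGENT_STATE_ROOT.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical handoff record; the real payload shape may carry more fields.
interface Handoff {
  nodeId: string;
  payload: Record<string, unknown>;
}

// Hypothetical layout: <stateRoot>/<sessionId>/handoffs/<nodeId>.json
function handoffPath(stateRoot: string, sessionId: string, nodeId: string): string {
  return join(stateRoot, sessionId, "handoffs", `${nodeId}.json`);
}

// Persists the input for a node before that node runs.
function writeHandoff(stateRoot: string, sessionId: string, handoff: Handoff): void {
  const file = handoffPath(stateRoot, sessionId, handoff.nodeId);
  mkdirSync(dirname(file), { recursive: true });
  writeFileSync(file, JSON.stringify(handoff, null, 2), "utf8");
}

// Always re-reads from disk so nothing is carried over in memory between nodes.
function readHandoff(stateRoot: string, sessionId: string, nodeId: string): Handoff {
  const raw = readFileSync(handoffPath(stateRoot, sessionId, nodeId), "utf8");
  return JSON.parse(raw) as Handoff;
}

// A temporary directory stands in for AGENT_STATE_ROOT in this sketch.
const stateRoot = mkdtempSync(join(tmpdir(), "agent-state-"));
writeHandoff(stateRoot, "session-1", {
  nodeId: "coder-1",
  payload: { task: "Implement feature" },
});
const restored = readHandoff(stateRoot, "session-1", "coder-1");
console.log(restored.payload["task"]); // prints "Implement feature"
```

Because each node reconstructs its input from the file, a node can be retried or resumed in a new process without access to the previous node's memory.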
Recursive Orchestration Contract
AgentManager.runRecursiveAgent(...) uses a strict two-phase fanout/fan-in model:
- Phase 1 (planner): agent execution returns either a terminal result or a fanout plan (intents[] + aggregate(...)).
- Parent tokens are released before children are scheduled, avoiding deadlocks even when AGENT_MAX_CONCURRENT=1.
- Children run in isolated deterministic session IDs (<parent>_child_<index>), each with its own AbortSignal.
- Phase 2 (aggregator): once all children complete, the aggregate phase runs as a fresh invocation.
Optional child middleware hooks (allocateForChild, releaseForChild) let callers integrate provisioning/suballocation without coupling AgentManager to filesystem or git operations.
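The reason releasing the parent token matters can be seen in a small model of the two-phase contract. This is a sketch under stated assumptions, not AgentManager's code: Semaphore, withToken, Plan, Planner, and runRecursive are hypothetical names, abort propagation and the child middleware hooks are left out, and a plain counting semaphore stands in for the manager's queue. With a limit of 1, a parent that kept its token while awaiting children would never let them start.

```typescript
// Minimal counting semaphore; stands in for the manager's concurrency queue.
class Semaphore {
  private available: number;
  private waiters: Array<() => void> = [];

  constructor(limit: number) {
    this.available = limit;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available -= 1;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the token directly to the next waiter
    else this.available += 1;
  }
}

// Runs fn while holding exactly one token, releasing it even on failure.
async function withToken<R>(sem: Semaphore, fn: () => Promise<R>): Promise<R> {
  await sem.acquire();
  try {
    return await fn();
  } finally {
    sem.release();
  }
}

type Plan<T> =
  | { kind: "result"; value: T }
  | { kind: "fanout"; intents: string[]; aggregate: (children: T[]) => T };

type Planner<T> = (sessionId: string, intent: string) => Promise<Plan<T>>;

async function runRecursive<T>(
  sem: Semaphore,
  planner: Planner<T>,
  sessionId: string,
  intent: string,
): Promise<T> {
  // Phase 1: the token is held only while the planner itself runs.
  const plan = await withToken(sem, () => planner(sessionId, intent));
  if (plan.kind === "result") return plan.value;

  // The parent token is already released here, so children can be scheduled.
  const { intents, aggregate } = plan;
  const children = await Promise.all(
    intents.map((childIntent, index) =>
      runRecursive(sem, planner, `${sessionId}_child_${index}`, childIntent),
    ),
  );

  // Phase 2: aggregation runs as a fresh invocation with its own token.
  return withToken(sem, async () => aggregate(children));
}

const seenSessions: string[] = [];
const demoPlanner: Planner<number> = async (sessionId, intent) => {
  seenSessions.push(sessionId);
  if (intent === "sum") {
    return {
      kind: "fanout",
      intents: ["1", "2", "3"],
      aggregate: (children: number[]) => children.reduce((a, b) => a + b, 0),
    };
  }
  return { kind: "result", value: Number(intent) };
};

// Concurrency limit of 1: completes only because parents release their token.
const demoTotal = runRecursive(new Semaphore(1), demoPlanner, "root", "sum");
demoTotal.then((total) => console.log(total, seenSessions));
```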
Resource Provisioning
The provisioning layer separates:
- Hard constraints: actual resource allocation enforced before run.
- Soft constraints: injected env vars, prompt sections, metadata, and discovery snapshot.
Built-in providers:
- git-worktree
- port-range
Runtime injection includes:
- Working directory override
- Injected env vars such as AGENT_WORKTREE_PATH, AGENT_PORT_RANGE_START, AGENT_PORT_RANGE_END, AGENT_PORT_PRIMARY
- Discovery file path via AGENT_DISCOVERY_FILE
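An actor process could read the injected port variables along these lines. The variable names come from the list above; everything else is an assumption made for this sketch, including the helper names, the idea that values are plain integer strings, and the check that AGENT_PORT_PRIMARY falls inside the provisioned range.

```typescript
interface ProvisionedPorts {
  start: number;
  end: number;
  primary: number;
}

type Env = Record<string, string | undefined>;

// Parses one port variable, rejecting missing or non-integer values.
function readPort(env: Env, name: string): number {
  const raw = env[name];
  const value = Number(raw);
  if (
    raw === undefined ||
    raw.trim() === "" ||
    !Number.isInteger(value) ||
    value < 1 ||
    value > 65535
  ) {
    throw new Error(`${name} is missing or not a valid port: ${String(raw)}`);
  }
  return value;
}

// Assumption: the primary port lies inside [start, end].
function readProvisionedPorts(env: Env): ProvisionedPorts {
  const start = readPort(env, "AGENT_PORT_RANGE_START");
  const end = readPort(env, "AGENT_PORT_RANGE_END");
  const primary = readPort(env, "AGENT_PORT_PRIMARY");
  if (start > end || primary < start || primary > end) {
    throw new Error("AGENT_PORT_PRIMARY must lie inside the provisioned range");
  }
  return { start, end, primary };
}

// Illustrative values only; real values are injected by the provisioning layer.
const examplePorts = readProvisionedPorts({
  AGENT_PORT_RANGE_START: "4000",
  AGENT_PORT_RANGE_END: "4009",
  AGENT_PORT_PRIMARY: "4000",
});
console.log(examplePorts);
```

In a real actor the argument would be process.env; passing the environment in as a parameter keeps the helper easy to test.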
Hierarchical Suballocation
Parent sessions can suballocate resources for child sessions using:
- ResourceProvisioningOrchestrator.provisionChildSession(...)
- buildChildResourceRequests(...)
Behavior:
- Child worktrees are placed under a deterministic parent-scoped root.
- Child port blocks are deterministically carved from the parent's assigned range.
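One way such deterministic carving could work is to split the parent's range into equal blocks indexed by child position. This is only a sketch of the arithmetic: PortRange and carveChildPortRange are hypothetical names, and the real provisioning layer may reserve ports for the parent or size blocks differently.

```typescript
interface PortRange {
  start: number; // inclusive
  end: number; // inclusive
}

// Splits the parent's range into childCount equal blocks and returns the
// block for childIndex. Leftover ports at the end of the range stay unused.
function carveChildPortRange(
  parent: PortRange,
  childCount: number,
  childIndex: number,
): PortRange {
  if (!Number.isInteger(childCount) || childCount < 1) {
    throw new RangeError("childCount must be a positive integer");
  }
  if (!Number.isInteger(childIndex) || childIndex < 0 || childIndex >= childCount) {
    throw new RangeError("childIndex must be in [0, childCount)");
  }
  const size = parent.end - parent.start + 1;
  const blockSize = Math.floor(size / childCount);
  if (blockSize < 1) {
    throw new RangeError("parent range is too small for this many children");
  }
  const start = parent.start + childIndex * blockSize;
  return { start, end: start + blockSize - 1 };
}

// 100 parent ports split across 4 children gives 25 ports per child.
const childBlock = carveChildPortRange({ start: 4000, end: 4099 }, 4, 2);
console.log(childBlock); // prints { start: 4050, end: 4074 }
```

Because the block depends only on the parent range and the child's index, the same child always receives the same ports across retries.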
MCP Configuration
Use mcp.config.json to configure shared and provider-specific MCP servers.
- MCP_CONFIG_PATH controls config location (default ./mcp.config.json).
- Shared server definitions are in servers.
- Provider overrides:
  - codex.mcp_servers
  - claude.mcpServers
- Handlers:
  - built-in context7
  - built-in claude-task-master
  - built-in generic
  - custom handlers via registerMcpHandler(...)
See mcp.config.example.json for a complete template.
Environment Variables
Provider/Auth
- CODEX_API_KEY
- OPENAI_API_KEY
- OPENAI_BASE_URL
- CODEX_SKIP_GIT_CHECK
- ANTHROPIC_API_KEY
- CLAUDE_MODEL
- CLAUDE_CODE_PATH
- MCP_CONFIG_PATH
Agent Manager Limits
- AGENT_MAX_CONCURRENT
- AGENT_MAX_SESSION
- AGENT_MAX_RECURSIVE_DEPTH
Orchestration Limits
- AGENT_STATE_ROOT
- AGENT_TOPOLOGY_MAX_DEPTH
- AGENT_TOPOLOGY_MAX_RETRIES
- AGENT_RELATIONSHIP_MAX_CHILDREN
Provisioning
- AGENT_WORKTREE_ROOT
- AGENT_WORKTREE_BASE_REF
- AGENT_PORT_BASE
- AGENT_PORT_BLOCK_SIZE
- AGENT_PORT_BLOCK_COUNT
- AGENT_PORT_PRIMARY_OFFSET
- AGENT_PORT_LOCK_DIR
- AGENT_DISCOVERY_FILE_RELATIVE_PATH
Defaults are documented in .env.example.
Quality Gate
Run the full pre-PR gate:
npm run verify
Equivalent individual commands:
npm run check
npm run check:tests
npm run test
npm run build
Build and Start
npm run build
npm run start -- codex "Hello from built JS"
Known Limitations
- Tool clearance allowlist/banlist is currently metadata; enforcement is not yet wired into an execution sandbox.
References
- docs/orchestration-engine.md
- OpenAI Codex SDK docs: https://developers.openai.com/codex/sdk/
- Codex MCP config docs: https://developers.openai.com/codex/config#model-context-protocol-mcp_servers
- Claude Agent SDK docs: https://platform.claude.com/docs/en/agent-sdk/overview