AI Ops: Schema-Driven Multi-Agent Orchestration Runtime
TypeScript runtime for deterministic multi-agent execution with:
- OpenAI Codex SDK (@openai/codex-sdk)
- Anthropic Claude Agent SDK (@anthropic-ai/claude-agent-sdk)
- Schema-validated orchestration (AgentManifest)
- DAG execution with topology-aware fan-out (parallel, hierarchical, retry-unrolled)
- Project-scoped persistent context store
- Typed domain events for edge-triggered routing
- Resource provisioning (git worktrees + deterministic port ranges)
- MCP configuration layer with handler policy hooks
- Security middleware for shell/tool policy enforcement
Architecture Summary
- SchemaDrivenExecutionEngine.runSession(...) is the single execution entrypoint.
- PipelineExecutor owns runtime control flow and topology dispatch, while delegating failure classification and persistence/event side effects to dedicated policies.
- AgentManager is an internal utility used by the pipeline when fan-out or retry-unrolled behavior is required.
- Session state is persisted under AGENT_STATE_ROOT.
- Project state is persisted under AGENT_PROJECT_CONTEXT_PATH as schema-versioned JSON (schemaVersion) with the following domains:
  - globalFlags
  - artifactPointers
  - taskQueue
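The project context file can be pictured as a small versioned record. The sketch below assumes a flat JSON object whose field types (boolean flags, string pointers, a queue of task ids) are illustrative guesses rather than the actual schema in project-context.ts, and parseProjectContext/serializeProjectContext are hypothetical names. It shows the general idea of checking schemaVersion before trusting the rest of the document.

```typescript
// Hypothetical shape of the project context file; the real field types live in
// src/agents/project-context.ts and may differ.
interface ProjectContextV1 {
  schemaVersion: 1;
  globalFlags: Record<string, boolean>;
  artifactPointers: Record<string, string>;
  taskQueue: string[];
}

const CURRENT_SCHEMA_VERSION = 1;

// Parse raw JSON text, refusing unknown schema versions instead of guessing.
function parseProjectContext(raw: string): ProjectContextV1 {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("project context must be a JSON object");
  }
  const obj = data as Record<string, unknown>;
  if (obj.schemaVersion !== CURRENT_SCHEMA_VERSION) {
    throw new Error(`unsupported schemaVersion: ${String(obj.schemaVersion)}`);
  }
  // Missing domains default to empty values (an assumption of this sketch).
  return {
    schemaVersion: 1,
    globalFlags: (obj.globalFlags ?? {}) as Record<string, boolean>,
    artifactPointers: (obj.artifactPointers ?? {}) as Record<string, string>,
    taskQueue: Array.isArray(obj.taskQueue) ? (obj.taskQueue as string[]) : [],
  };
}

// Serialize with the version stamp so later readers can migrate or refuse.
function serializeProjectContext(ctx: ProjectContextV1): string {
  return JSON.stringify(ctx, null, 2);
}
```

Rejecting an unexpected version outright, rather than reading whatever fields happen to match, keeps an older runtime from silently misinterpreting a newer file.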
Repository Layout
- src/agents
  - orchestration.ts: engine facade and runtime wiring
  - pipeline.ts: DAG runner, retry matrix, aggregate session status, abort propagation, domain-event routing
  - failure-policy.ts: hard/soft failure classification policy
  - lifecycle-observer.ts: persistence/event lifecycle hooks for node attempts
  - manifest.ts: schema parsing/validation for personas/topologies/edges
  - manager.ts: recursive fan-out utility used by the pipeline
  - state-context.ts: persisted node handoffs + session state
  - project-context.ts: project-scoped store
  - domain-events.ts: typed domain event schema + bus
  - runtime.ts: env-driven defaults/singletons
  - provisioning.ts: resource provisioning and child suballocation helpers
- src/mcp: MCP config types/conversion/handlers
- src/security: shell AST parsing, rules engine, secure executor, and audit sinks
- src/examples: provider entrypoints (codex.ts, claude.ts)
- src/config.ts: centralized env parsing/validation/defaulting
- tests: manager, manifest, pipeline/orchestration, state, provisioning, MCP
Setup
npm install
cp .env.example .env
cp mcp.config.example.json mcp.config.json
Run
npm run codex -- "Summarize this repository."
npm run claude -- "Summarize this repository."
Or via the unified entrypoint:
npm run dev -- codex "List potential improvements."
npm run dev -- claude "List potential improvements."
Manifest Semantics
AgentManifest (schema "1") validates:
- supported topologies (sequential, parallel, hierarchical, retry-unrolled)
- persona definitions and tool-clearance policy (validated by a shared Zod schema)
- the relationship DAG and unknown persona references
- a strict pipeline DAG
- topology constraints (maxDepth, maxRetries)
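As a rough illustration of these structural checks, here is a plain-TypeScript sketch (no Zod) over a deliberately simplified, hypothetical manifest shape in which each persona is also a pipeline node. It checks the topology name, unknown persona references, the constraint bounds, and acyclicity via Kahn's algorithm. The real AgentManifest schema, its field names, and its error messages will differ.

```typescript
// Hypothetical, simplified manifest shape; the real AgentManifest schema
// (src/agents/manifest.ts) is richer and validated with Zod.
type Topology = "sequential" | "parallel" | "hierarchical" | "retry-unrolled";

interface ManifestSketch {
  schemaVersion: "1";
  topology: Topology;
  personas: string[];
  pipeline: { from: string; to: string }[];
  constraints: { maxDepth: number; maxRetries: number };
}

const TOPOLOGIES: readonly string[] = ["sequential", "parallel", "hierarchical", "retry-unrolled"];

// Returns a list of problems; an empty list means the sketch accepts the manifest.
function validateManifest(m: ManifestSketch): string[] {
  const errors: string[] = [];
  if (!TOPOLOGIES.includes(m.topology)) errors.push(`unsupported topology: ${m.topology}`);

  const known = new Set(m.personas);
  for (const { from, to } of m.pipeline) {
    for (const ref of [from, to]) {
      if (!known.has(ref)) errors.push(`unknown persona: ${ref}`);
    }
  }

  // Assumed bounds: depth must be at least 1, retries must be non-negative.
  if (m.constraints.maxDepth < 1 || m.constraints.maxRetries < 0) {
    errors.push("invalid topology constraints");
  }

  // Kahn's algorithm: if some node never reaches in-degree zero, there is a cycle.
  const indegree = new Map<string, number>();
  const out = new Map<string, string[]>();
  for (const p of known) {
    indegree.set(p, 0);
    out.set(p, []);
  }
  for (const { from, to } of m.pipeline) {
    if (!known.has(from) || !known.has(to)) continue; // already reported above
    out.get(from)!.push(to);
    indegree.set(to, indegree.get(to)! + 1);
  }
  const queue = Array.from(indegree.entries()).filter(([, d]) => d === 0).map(([p]) => p);
  let visited = 0;
  while (queue.length > 0) {
    const node = queue.shift()!;
    visited++;
    for (const next of out.get(node)!) {
      const d = indegree.get(next)! - 1;
      indegree.set(next, d);
      if (d === 0) queue.push(next);
    }
  }
  if (visited !== known.size) errors.push("pipeline is not a DAG (cycle detected)");
  return errors;
}
```

Collecting every problem into a list, instead of throwing on the first one, lets a manifest author fix all reported issues in a single pass.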
Pipeline edges can route via:
- legacy status triggers (on: success, validation_fail, failure, always, ...)
- domain event triggers (event: typed domain events)
- conditions (state_flag, history_has_event, file_exists, always)
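The routing rules can be sketched as a predicate over each outgoing edge. Everything here (EdgeSketch, RoutingContext, nextNodes) is hypothetical, and file_exists is left out to keep the example free of filesystem access. The sketch assumes an edge fires when either its status trigger or its event trigger matches and its condition, if present, also holds.

```typescript
// Hypothetical edge shape loosely mirroring the trigger kinds listed above.
type NodeStatus = "success" | "validation_fail" | "failure";

type EdgeCondition =
  | { kind: "state_flag"; flag: string }
  | { kind: "history_has_event"; event: string }
  | { kind: "always" };

interface EdgeSketch {
  to: string;
  on?: NodeStatus | "always"; // legacy status trigger
  event?: string; // domain event trigger
  when?: EdgeCondition; // optional extra condition
}

interface RoutingContext {
  status: NodeStatus; // status of the node that just finished
  emittedEvents: string[]; // events emitted by that node
  flags: Record<string, boolean>; // project/session flags
  history: string[]; // all events seen so far in the session
}

function conditionHolds(edge: EdgeSketch, ctx: RoutingContext): boolean {
  const c = edge.when;
  if (c === undefined || c.kind === "always") return true;
  if (c.kind === "state_flag") return ctx.flags[c.flag] === true;
  return ctx.history.includes(c.event);
}

// An edge fires when one of its triggers matches and its condition holds.
function edgeFires(edge: EdgeSketch, ctx: RoutingContext): boolean {
  const statusMatch = edge.on !== undefined && (edge.on === "always" || edge.on === ctx.status);
  const eventMatch = edge.event !== undefined && ctx.emittedEvents.includes(edge.event);
  return (statusMatch || eventMatch) && conditionHolds(edge, ctx);
}

// Returns the target nodes of every edge that fires, in declaration order.
function nextNodes(edges: EdgeSketch[], ctx: RoutingContext): string[] {
  return edges.filter((e) => edgeFires(e, ctx)).map((e) => e.to);
}
```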
Domain Events
Domain events are typed and can trigger edges directly:
- planning: requirements_defined, tasks_planned
- execution: code_committed, task_blocked
- validation: validation_passed, validation_failed
- integration: branch_merged
Actors can emit events through ActorExecutionResult.events. The pipeline also emits default validation/execution events based on node status.
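A typed event union plus a minimal in-process bus gives a feel for how these events could drive routing. The DomainEvent shape and the DomainEventBus class below are hypothetical and are not the API exported by domain-events.ts; the point is that a discriminated union lets the compiler reject event names outside the listed set.

```typescript
// Hypothetical discriminated union of the domain events listed above.
type DomainEvent =
  | { domain: "planning"; type: "requirements_defined" | "tasks_planned" }
  | { domain: "execution"; type: "code_committed" | "task_blocked" }
  | { domain: "validation"; type: "validation_passed" | "validation_failed" }
  | { domain: "integration"; type: "branch_merged" };

type EventType = DomainEvent["type"];
type Handler = (event: DomainEvent) => void;

// Minimal in-process bus: handlers subscribe by event type and run synchronously.
class DomainEventBus {
  private handlers = new Map<EventType, Handler[]>();

  on(type: EventType, handler: Handler): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  emit(event: DomainEvent): void {
    for (const handler of this.handlers.get(event.type) ?? []) handler(event);
  }
}
```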
Retry Matrix and Cancellation
- validation_fail: routed through retry-unrolled execution (a new child manager session)
- hard failures: timeout, network, and 403-like failures are tracked sequentially; after 2 consecutive hard failures the pipeline aborts fast
- an AbortSignal is passed into every actor execution input
- session closure aborts child recursive work
- run summaries expose an aggregate status: success requires every executed terminal DAG node to succeed and no failure on the critical path
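The hard-failure rule can be sketched as a counter wired to an AbortController, whose signal is what actors would receive. The classification (ETIMEDOUT, ECONNRESET, HTTP 403) is only an assumed stand-in for the real failure-policy.ts rules, and so is the choice to reset the streak on any success or soft failure; HardFailureTracker is a hypothetical name.

```typescript
// Hypothetical failure classification; the real rules are in
// src/agents/failure-policy.ts and may use different signals.
type FailureKind = "hard" | "soft";

interface FailureInfo {
  code?: string; // e.g. a Node.js system error code
  status?: number; // e.g. an HTTP status
}

function classify(error: FailureInfo): FailureKind {
  if (error.code === "ETIMEDOUT" || error.code === "ECONNRESET" || error.status === 403) {
    return "hard";
  }
  return "soft";
}

const MAX_CONSECUTIVE_HARD_FAILURES = 2;

// Tracks consecutive hard failures and aborts the shared signal at the limit.
class HardFailureTracker {
  private consecutive = 0;
  readonly controller = new AbortController();

  recordSuccess(): void {
    this.consecutive = 0; // assumption: a success breaks the streak
  }

  recordFailure(error: FailureInfo): void {
    if (classify(error) !== "hard") {
      this.consecutive = 0; // assumption: a soft failure also breaks the streak
      return;
    }
    this.consecutive += 1;
    if (this.consecutive >= MAX_CONSECUTIVE_HARD_FAILURES) {
      this.controller.abort(); // every actor holding controller.signal sees aborted === true
    }
  }
}
```

Passing controller.signal into each actor input means a single abort() call reaches every in-flight execution without the pipeline having to track them individually.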
Security Middleware
- Shell command parsing uses bash-parser AST traversal and extracts Command/Word nodes.
- Rules are validated with strict Zod schemas (src/security/schemas.ts) before execution.
- SecurityRulesEngine enforces:
  - binary allowlists
  - cwd/worktree boundary checks
  - path traversal blocking (../)
  - protected path blocking (state root + project context path)
  - unified tool allowlist/banlist checks for shell binaries and MCP tool lists
- SecureCommandExecutor runs commands via child_process.spawn with:
  - explicit env scrub/inject policy (no implicit full env inheritance)
  - timeout enforcement
  - optional uid/gid drop
  - stdout/stderr streaming hooks for audit
- Every actor execution input includes security helpers (rulesEngine, createCommandExecutor(...)) so executors can enforce shell/tool policy at the execution boundary.
- Pipeline behavior on SecurityViolationError is configurable:
  - hard_abort (default)
  - validation_fail (retry-unrolled remediation)
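The rules-engine checks can be approximated with plain string handling, as in this sketch. RulesSketch, resolvePosix, isInside, and checkCommand are hypothetical; the sketch assumes POSIX absolute paths, treats every argument as a potential path (so flags such as -la simply resolve to harmless paths), and skips the bash-parser step by taking an already-extracted binary and argument list.

```typescript
// Hypothetical rule set; the real schema is in src/security/schemas.ts.
interface RulesSketch {
  allowedBinaries: string[];
  worktreeRoot: string; // absolute POSIX path
  protectedPaths: string[]; // e.g. state root and project context path
}

// Resolve "." and ".." segments against a base directory (POSIX only).
function resolvePosix(base: string, target: string): string {
  const parts = (target.startsWith("/") ? target : `${base}/${target}`).split("/");
  const stack: string[] = [];
  for (const part of parts) {
    if (part === "" || part === ".") continue;
    if (part === "..") stack.pop();
    else stack.push(part);
  }
  return "/" + stack.join("/");
}

// True if candidate equals root or lies underneath it.
function isInside(root: string, candidate: string): boolean {
  const prefix = root.endsWith("/") ? root : root + "/";
  return candidate === root || candidate.startsWith(prefix);
}

// Returns the first violation found, or null if the command passes.
function checkCommand(rules: RulesSketch, cwd: string, binary: string, args: string[]): string | null {
  if (!rules.allowedBinaries.includes(binary)) return `binary not allowed: ${binary}`;
  if (!isInside(rules.worktreeRoot, resolvePosix("/", cwd))) return `cwd outside worktree: ${cwd}`;
  for (const arg of args) {
    // Block any explicit ".." segment before resolution can hide it.
    if (arg.split("/").includes("..")) return `path traversal blocked: ${arg}`;
    const resolved = resolvePosix(cwd, arg);
    if (rules.protectedPaths.some((p) => isInside(p, resolved))) return `protected path: ${arg}`;
  }
  return null;
}
```

Comparing against root + "/" rather than the bare root prevents a sibling such as /work/wt10 from passing a boundary check for /work/wt1.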
Environment Variables
Provider/Auth
- CODEX_API_KEY
- OPENAI_API_KEY
- OPENAI_BASE_URL
- CODEX_SKIP_GIT_CHECK
- ANTHROPIC_API_KEY
- CLAUDE_MODEL
- CLAUDE_CODE_PATH
- MCP_CONFIG_PATH
Agent Manager Limits
- AGENT_MAX_CONCURRENT
- AGENT_MAX_SESSION
- AGENT_MAX_RECURSIVE_DEPTH
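One plausible reading of AGENT_MAX_CONCURRENT is a cap on how many child runs are in flight during fan-out. Under that assumption, a bounded worker pool like the hypothetical fanOut below keeps at most maxConcurrent tasks running and returns results in input order; it is not the AgentManager implementation.

```typescript
// Hypothetical helper: run child tasks with at most `maxConcurrent` in flight,
// as a cap like AGENT_MAX_CONCURRENT might bound parallel fan-out.
async function fanOut<T>(tasks: Array<() => Promise<T>>, maxConcurrent: number): Promise<T[]> {
  if (!Number.isInteger(maxConcurrent) || maxConcurrent < 1) {
    throw new RangeError("maxConcurrent must be a positive integer");
  }
  const results = new Array<T>(tasks.length);
  let next = 0;

  // Each worker claims the next unclaimed index, runs it, and repeats.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(maxConcurrent, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Because each worker claims its index synchronously before awaiting, no task is started twice even though all workers share the same counter.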
Orchestration / Context
- AGENT_STATE_ROOT
- AGENT_PROJECT_CONTEXT_PATH
- AGENT_TOPOLOGY_MAX_DEPTH
- AGENT_TOPOLOGY_MAX_RETRIES
- AGENT_RELATIONSHIP_MAX_CHILDREN
Provisioning / Resource Controls
- AGENT_WORKTREE_ROOT
- AGENT_WORKTREE_BASE_REF
- AGENT_PORT_BASE
- AGENT_PORT_BLOCK_SIZE
- AGENT_PORT_BLOCK_COUNT
- AGENT_PORT_PRIMARY_OFFSET
- AGENT_PORT_LOCK_DIR
- AGENT_DISCOVERY_FILE_RELATIVE_PATH
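Deterministic port ranges can come from simple arithmetic in which block i always maps to the same span. The formula below (start = base + i * blockSize, primary = start + primaryOffset) is an assumption inferred from the AGENT_PORT_* names, not taken from provisioning.ts, and it ignores the lock directory that presumably keeps two sessions from claiming the same block. PortConfig and portBlock are hypothetical names.

```typescript
// Hypothetical port-block arithmetic; field names mirror the AGENT_PORT_*
// variables, but the formula itself is an assumption.
interface PortConfig {
  base: number; // AGENT_PORT_BASE
  blockSize: number; // AGENT_PORT_BLOCK_SIZE
  blockCount: number; // AGENT_PORT_BLOCK_COUNT
  primaryOffset: number; // AGENT_PORT_PRIMARY_OFFSET
}

interface PortBlock {
  start: number; // first port in the block (inclusive)
  end: number; // last port in the block (inclusive)
  primary: number; // the port a session would advertise first
}

// Block `index` always maps to the same range, so allocation is deterministic.
function portBlock(cfg: PortConfig, index: number): PortBlock {
  if (!Number.isInteger(index) || index < 0 || index >= cfg.blockCount) {
    throw new RangeError(`block index ${index} out of range`);
  }
  if (cfg.primaryOffset < 0 || cfg.primaryOffset >= cfg.blockSize) {
    throw new RangeError("primaryOffset must fall inside a block");
  }
  const start = cfg.base + index * cfg.blockSize;
  const end = start + cfg.blockSize - 1;
  if (end > 65535) throw new RangeError("block exceeds the TCP port range");
  return { start, end, primary: start + cfg.primaryOffset };
}
```

For example, with a base of 4000, a block size of 10, and a primary offset of 1, block 2 covers ports 4020 to 4029 and its primary port is 4021.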
Security Middleware
- AGENT_SECURITY_VIOLATION_MODE
- AGENT_SECURITY_ALLOWED_BINARIES
- AGENT_SECURITY_COMMAND_TIMEOUT_MS
- AGENT_SECURITY_AUDIT_LOG_PATH
- AGENT_SECURITY_ENV_INHERIT
- AGENT_SECURITY_ENV_SCRUB
- AGENT_SECURITY_DROP_UID
- AGENT_SECURITY_DROP_GID
Defaults are documented in .env.example.
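The env scrub/inject policy can be sketched as a pure function that builds a child environment from scratch instead of copying the parent. The sketch assumes AGENT_SECURITY_ENV_INHERIT and AGENT_SECURITY_ENV_SCRUB hold comma-separated variable names; EnvPolicy, parseNameList, and buildChildEnv are hypothetical. The result is the kind of object one would pass as the env option of child_process.spawn.

```typescript
// Hypothetical env policy: only allowlisted variables are inherited, scrubbed
// names are never inherited, and explicit injections are applied last.
interface EnvPolicy {
  inherit: string[]; // e.g. parsed from AGENT_SECURITY_ENV_INHERIT
  scrub: string[]; // e.g. parsed from AGENT_SECURITY_ENV_SCRUB
  inject: Record<string, string>; // values the executor sets explicitly
}

// Assumption: both variables hold comma-separated names.
function parseNameList(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

function buildChildEnv(
  parent: Record<string, string | undefined>,
  policy: EnvPolicy,
): Record<string, string> {
  const scrub = new Set(policy.scrub);
  const env: Record<string, string> = {}; // start empty: no implicit inheritance
  for (const name of policy.inherit) {
    const value = parent[name];
    if (value !== undefined && !scrub.has(name)) env[name] = value;
  }
  for (const [name, value] of Object.entries(policy.inject)) {
    env[name] = value; // explicit injections override anything inherited
  }
  return env;
}
```

Checking the scrub list while copying inherited values means a name listed in both settings is never passed through, so the policy fails closed if the two disagree.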
Quality Gate
npm run verify
Equivalent to running:
npm run check
npm run check:tests
npm run test
npm run build
Notes
AgentManager.runRecursiveAgent(...) remains available for low-level testing, but pipeline execution should use SchemaDrivenExecutionEngine.runSession(...).