Schema-Driven Orchestration Engine

Why this exists

The orchestration runtime introduces explicit schema validation and deterministic execution rules for multi-agent pipelines. The design favors predictable behavior over implicit conversational memory.

Main components

  • AgentManifest schema (src/agents/manifest.ts): validates personas, relationships, topology constraints, and a strict DAG pipeline.
  • Persona registry (src/agents/persona-registry.ts): renders templated prompts with runtime context and routes behavioral events.
  • Stateful storage for stateless execution (src/agents/state-context.ts): each node execution reads its payload and persisted state from storage, so every run starts from fresh context.
  • DAG pipeline runner (src/agents/pipeline.ts): executes topology blocks, emits typed domain events, evaluates route conditions, and enforces retry/depth/failure limits.
  • Project context store (src/agents/project-context.ts): project-scoped global flags, artifact pointers, and task queue persisted across sessions.
  • Orchestration facade (src/agents/orchestration.ts): wires manifest + registry + pipeline + state manager + project context with env-driven limits.
  • Hierarchical resource suballocation (src/agents/provisioning.ts): builds child git-worktree and child port-range requests from parent allocation data.
    • Optional AGENT_WORKTREE_TARGET_PATH enables sparse checkout of a subdirectory and sets the per-session working directory to that target path.
  • Recursive manager runtime (src/agents/manager.ts): utility invoked by the pipeline engine for fan-out/retry-unrolled execution.
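The manifest's strict DAG requirement can be checked with a topological sort. The sketch below uses Kahn's algorithm over a hypothetical `PipelineNode` shape (`id` plus `dependsOn`) and a hypothetical `topologicalOrder` helper; the real schema in src/agents/manifest.ts may name these fields differently and perform the check elsewhere.

```typescript
// Hypothetical node shape; the real AgentManifest schema may differ.
interface PipelineNode {
  id: string;
  dependsOn: string[];
}

// Returns node ids in a valid execution order, or throws if the pipeline
// has duplicate ids, references unknown nodes, or contains a cycle.
function topologicalOrder(nodes: PipelineNode[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();

  for (const node of nodes) {
    if (indegree.has(node.id)) {
      throw new Error(`duplicate node id: ${node.id}`);
    }
    indegree.set(node.id, 0);
    dependents.set(node.id, []);
  }

  for (const node of nodes) {
    for (const dep of node.dependsOn) {
      if (!indegree.has(dep)) {
        throw new Error(`node ${node.id} depends on unknown node ${dep}`);
      }
      indegree.set(node.id, indegree.get(node.id)! + 1);
      dependents.get(dep)!.push(node.id);
    }
  }

  // Start from nodes with no unmet dependencies.
  const queue = nodes.filter((n) => indegree.get(n.id) === 0).map((n) => n.id);
  const order: string[] = [];

  while (queue.length > 0) {
    const id = queue.shift()!;
    order.push(id);
    for (const next of dependents.get(id)!) {
      const remaining = indegree.get(next)! - 1;
      indegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }

  // Nodes that never reach indegree 0 are on, or downstream of, a cycle.
  if (order.length !== nodes.length) {
    throw new Error("pipeline is not a DAG: cycle detected");
  }
  return order;
}
```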

Constraint model

  • Relationship constraints: per-edge limits (maxDepth, maxChildren) and process-level cap (AGENT_RELATIONSHIP_MAX_CHILDREN).
  • Pipeline constraints: per-node retry limits, retry-unrolled topology, and process-level cap (AGENT_TOPOLOGY_MAX_RETRIES).
  • Topology constraints: max depth and retry limits come from the manifest, capped by the environment limits above.
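One plausible reading of "manifest + env caps" is that the effective limit is the manifest value clamped by the process-level environment cap. The `effectiveLimit` helper below is a hypothetical illustration, not the orchestration facade's API; it takes the environment as a plain record so it stays testable, and it treats a missing or non-numeric cap as "no cap".

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: clamp a manifest-declared limit by a process-level
// env cap such as AGENT_TOPOLOGY_MAX_RETRIES. A missing, negative, or
// non-numeric cap leaves the manifest value unchanged.
function effectiveLimit(manifestValue: number, env: Env, capVar: string): number {
  const raw = env[capVar];
  const cap = raw === undefined ? NaN : Number.parseInt(raw, 10);
  if (!Number.isFinite(cap) || cap < 0) {
    return manifestValue;
  }
  return Math.min(manifestValue, cap);
}
```

At startup this might be called as `effectiveLimit(manifestRetries, process.env, "AGENT_TOPOLOGY_MAX_RETRIES")`, where `manifestRetries` is a hypothetical value read from the manifest. Clamping with `Math.min` means the environment can only tighten, never loosen, what a manifest declares.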

Stateless handoffs

Node payloads are persisted under the state root. Nodes do not inherit in-memory conversational context from previous node runs; each execution reconstructs fresh context from the handoff and persisted state. Sessions load project context from AGENT_PROJECT_CONTEXT_PATH at initialization, and orchestration writes project context updates each time a node completes.
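A minimal sketch of the handoff round trip, assuming one JSON file per node under the state root. The file layout, the `NodeHandoff` shape, and the `saveHandoff`/`loadHandoff` helpers are hypothetical; src/agents/state-context.ts may store things differently. The point it illustrates is that a node run rebuilds its context only from what is read back from storage.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical handoff record; the real state-context.ts format may differ.
interface NodeHandoff {
  nodeId: string;
  payload: Record<string, unknown>;
  state: Record<string, unknown>;
}

// Persist a node's handoff as <stateRoot>/<nodeId>.json.
// Node ids are assumed to be safe file names (no path separators).
function saveHandoff(stateRoot: string, handoff: NodeHandoff): void {
  fs.mkdirSync(stateRoot, { recursive: true });
  const file = path.join(stateRoot, `${handoff.nodeId}.json`);
  fs.writeFileSync(file, JSON.stringify(handoff, null, 2), "utf8");
}

// Rebuild fresh context for a node run purely from persisted data.
function loadHandoff(stateRoot: string, nodeId: string): NodeHandoff {
  const file = path.join(stateRoot, `${nodeId}.json`);
  return JSON.parse(fs.readFileSync(file, "utf8")) as NodeHandoff;
}
```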

Resolved execution contract

Before each actor invocation, orchestration resolves an immutable ResolvedExecutionContext and injects it into the executor input:

  • phase: current pipeline node id
  • modelConstraint: persona-level model policy (or runtime fallback)
  • allowedTools: flat resolved tool list for that node attempt
  • security: hard runtime constraints (dropUid, dropGid, worktreePath, violation handling mode)

This keeps orchestration policy resolution separate from executor enforcement. Executors do not need to parse manifests or MCP registry internals.
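The contract can be modeled as a readonly type plus a freeze at construction time. Field names follow the list above; the concrete field types, the `violationMode` name (whose values reuse the `hard_abort` and `validation_fail` policy names from the security note), and the `resolveExecutionContext` builder are assumptions for illustration.

```typescript
type ViolationMode = "hard_abort" | "validation_fail";

interface ResolvedExecutionContext {
  readonly phase: string;
  readonly modelConstraint: string;
  readonly allowedTools: readonly string[];
  readonly security: {
    readonly dropUid: number;
    readonly dropGid: number;
    readonly worktreePath: string;
    readonly violationMode: ViolationMode;
  };
}

// Hypothetical builder: copies its inputs and freezes every level so
// executors cannot mutate the policy they were handed.
function resolveExecutionContext(input: {
  phase: string;
  personaModel?: string;
  runtimeFallbackModel: string;
  allowedTools: string[];
  security: ResolvedExecutionContext["security"];
}): ResolvedExecutionContext {
  return Object.freeze({
    phase: input.phase,
    // Persona-level policy wins; otherwise fall back to the runtime default.
    modelConstraint: input.personaModel ?? input.runtimeFallbackModel,
    // Deduplicate into a flat list for this node attempt.
    allowedTools: Object.freeze(Array.from(new Set(input.allowedTools))),
    security: Object.freeze({ ...input.security }),
  });
}
```

Because the executor only ever sees this frozen object, it can enforce `allowedTools` and `security` without knowing how they were derived from the manifest or MCP registry.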

Execution topology model

  • Pipeline graph execution is DAG-based with ready-node frontiers.
  • Nodes tagged with the parallel or hierarchical topology blocks are dispatched concurrently (via Promise.all) through AgentManager.
  • Validation failures trigger retry-unrolled behavior: each retry runs as a new manager child session.
  • Sequential hard failures (timeout/network/403-like) trigger fail-fast abort.
  • An AbortSignal is passed through the actor execution input so that cancellation propagates immediately.
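The frontier model can be sketched as repeated waves: collect every node whose dependencies have completed, run that frontier concurrently with `Promise.all`, and abort a shared `AbortController` on the first hard failure. This is a simplified illustration, not the pipeline.ts implementation: it waits for a whole frontier before computing the next one, and it omits retries and topology tagging. `runFrontiers` and its callback signature are hypothetical.

```typescript
interface DagNode {
  id: string;
  dependsOn: string[];
}

type RunNode = (id: string, signal: AbortSignal) => Promise<void>;

// Executes ready-node frontiers in waves and returns the waves in order.
async function runFrontiers(nodes: DagNode[], run: RunNode): Promise<string[][]> {
  const controller = new AbortController();
  const done = new Set<string>();
  const waves: string[][] = [];

  while (done.size < nodes.length) {
    // A node is ready once all of its dependencies have completed.
    const frontier = nodes
      .filter((n) => !done.has(n.id) && n.dependsOn.every((d) => done.has(d)))
      .map((n) => n.id);
    if (frontier.length === 0) {
      throw new Error("no ready nodes: cycle or unknown dependency");
    }

    try {
      await Promise.all(frontier.map((id) => run(id, controller.signal)));
    } catch (err) {
      // Fail fast: signal in-flight siblings to stop, then surface the error.
      controller.abort();
      throw err;
    }

    frontier.forEach((id) => done.add(id));
    waves.push(frontier);
  }
  return waves;
}
```

`Promise.all` rejects as soon as any node rejects, but the other promises keep running; passing the shared signal is what lets those siblings observe the abort and stop early.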

Domain events

  • Domain event schema is strongly typed (src/agents/domain-events.ts).
  • Standard event domains:
    • planning: requirements_defined, tasks_planned
    • execution: code_committed, task_blocked
    • validation: validation_passed, validation_failed
    • integration: branch_merged
  • Pipeline edges can trigger on domain events (edge.event) in addition to legacy status triggers (edge.on).
  • history_has_event route conditions evaluate entries in the persisted domain event history (validation_failed, task_blocked, etc.).
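A discriminated union keeps each event name tied to its domain, and a history_has_event condition then reduces to a search over persisted history. The domain and event names come from the list above; the field layout (`domain`, `type`, `nodeId`) and the `historyHasEvent` helper are assumptions and may not match src/agents/domain-events.ts.

```typescript
type DomainEvent =
  | { domain: "planning"; type: "requirements_defined" | "tasks_planned"; nodeId: string }
  | { domain: "execution"; type: "code_committed" | "task_blocked"; nodeId: string }
  | { domain: "validation"; type: "validation_passed" | "validation_failed"; nodeId: string }
  | { domain: "integration"; type: "branch_merged"; nodeId: string };

type DomainEventType = DomainEvent["type"];

// Hypothetical evaluation of a history_has_event route condition:
// true if any persisted event matches the type (and node, if given).
function historyHasEvent(
  history: readonly DomainEvent[],
  type: DomainEventType,
  nodeId?: string,
): boolean {
  return history.some((e) => e.type === type && (nodeId === undefined || e.nodeId === nodeId));
}
```

In this sketch, `type` is restricted to the union of known event names, so a misspelled event in a typed route condition is rejected at compile time instead of silently never matching.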

Security note

Security enforcement now lives in src/security:

  • bash-parser AST parsing for shell command tokenization (Command/Word nodes).
  • Zod-validated shell/tool policy schemas.
  • SecurityRulesEngine for binary allowlists, path traversal checks, worktree boundaries, and tool clearance checks.
  • SecureCommandExecutor for controlled child_process execution with timeout + explicit env policy.
  • ResolvedExecutionContext.allowedTools filters the tools a provider exposes before SDK invocation, including Claude-specific tool gating for the case where the shared enabled_tools setting is ignored.
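Two of the checks above reduce to small pure functions. The sketch assumes POSIX paths and uses hypothetical names (`isWithinWorktree`, `filterProviderTools`); it is not the SecurityRulesEngine API, and it does not cover the bash-parser tokenization or binary allowlist.

```typescript
import * as path from "node:path";

// Worktree boundary / path traversal check: resolve the candidate against
// the worktree and reject anything that escapes it (e.g. "../../etc/passwd").
// Symlinks are not resolved here; a stricter check would also use fs.realpath.
function isWithinWorktree(worktreePath: string, candidate: string): boolean {
  const root = path.posix.resolve(worktreePath);
  const target = path.posix.resolve(root, candidate);
  const rel = path.posix.relative(root, target);
  return rel === "" || (rel !== ".." && !rel.startsWith("../") && !path.posix.isAbsolute(rel));
}

// Tool gating: expose only provider tools that appear in the resolved
// allowedTools list, regardless of any provider-side defaults.
function filterProviderTools<T extends { name: string }>(
  providerTools: readonly T[],
  allowedTools: readonly string[],
): T[] {
  const allowed = new Set(allowedTools);
  return providerTools.filter((tool) => allowed.has(tool.name));
}
```

Comparing against `"../"` rather than a bare `".."` prefix keeps legitimate names such as `..hidden/file` inside the worktree.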

PipelineExecutor handles SecurityViolationError according to a configurable policy:

  • hard_abort (default): immediate pipeline termination.
  • validation_fail: maps to retry-unrolled remediation.
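A sketch of how the policy switch might look. Only the two policy names come from the text above; the `SecurityViolationError` class shown here, the `NodeOutcome` type, and `runWithViolationPolicy` are hypothetical stand-ins for the real PipelineExecutor types.

```typescript
type ViolationPolicy = "hard_abort" | "validation_fail";

// Hypothetical error type raised by the security layer.
class SecurityViolationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "SecurityViolationError";
    // Keep instanceof working when compiling to ES5 targets.
    Object.setPrototypeOf(this, SecurityViolationError.prototype);
  }
}

type NodeOutcome =
  | { status: "completed" }
  | { status: "validation_failed"; reason: string };

// Runs one node attempt and applies the configured violation policy.
async function runWithViolationPolicy(
  attempt: () => Promise<void>,
  policy: ViolationPolicy = "hard_abort",
): Promise<NodeOutcome> {
  try {
    await attempt();
    return { status: "completed" };
  } catch (err) {
    if (err instanceof SecurityViolationError && policy === "validation_fail") {
      // Treat the violation like a failed validation so the retry-unrolled
      // remediation path can schedule a new child session.
      return { status: "validation_failed", reason: err.message };
    }
    // hard_abort (the default) and all non-security errors propagate upward.
    throw err;
  }
}
```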