zman27/ai_ops

Josh Rzemien 889087daa1 Wire pipeline DAG execution to manager with events and project context

2026-02-23 13:14:20 -05:00

AI Ops: Schema-Driven Multi-Agent Orchestration Runtime

TypeScript runtime for deterministic multi-agent execution with:

OpenAI Codex SDK (@openai/codex-sdk)
Anthropic Claude Agent SDK (@anthropic-ai/claude-agent-sdk)
Schema-validated orchestration (AgentManifest)
DAG execution with topology-aware fan-out (parallel, hierarchical, retry-unrolled)
Project-scoped persistent context store
Typed domain events for edge-triggered routing
Resource provisioning (git worktrees + deterministic port ranges)
MCP configuration layer with handler policy hooks

Architecture Summary

SchemaDrivenExecutionEngine.runSession(...) is the single execution entrypoint.
PipelineExecutor owns runtime control flow and topology dispatch.
AgentManager is an internal utility used by the pipeline when fan-out/retry-unrolled behavior is required.
Session state is persisted under AGENT_STATE_ROOT.
Project state is persisted under AGENT_PROJECT_CONTEXT_PATH with domains:
- globalFlags
- artifactPointers
- taskQueue
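
To make the three project-state domains concrete, here is a minimal sketch of how such a store could be typed and updated. The field shapes (QueuedTask, boolean flags, string pointers) and both function names are hypothetical; the README names only the domains, not their contents.

```typescript
// Hypothetical shapes: only the three domain names come from the README.
interface QueuedTask {
  id: string;
  title: string;
  status: "pending" | "in_progress" | "done";
}

interface ProjectContext {
  globalFlags: Record<string, boolean>;
  artifactPointers: Record<string, string>;
  taskQueue: QueuedTask[];
}

function emptyProjectContext(): ProjectContext {
  return { globalFlags: {}, artifactPointers: {}, taskQueue: [] };
}

// Merge a partial update domain by domain without mutating the input.
// Flags and pointers are overwritten per key; queued tasks are appended.
function mergeProjectContext(
  current: ProjectContext,
  patch: Partial<ProjectContext>,
): ProjectContext {
  return {
    globalFlags: { ...current.globalFlags, ...patch.globalFlags },
    artifactPointers: { ...current.artifactPointers, ...patch.artifactPointers },
    taskQueue: [...current.taskQueue, ...(patch.taskQueue ?? [])],
  };
}
```

Keeping the merge pure makes it easy to persist the result as a single JSON document, which is one plausible way to back AGENT_PROJECT_CONTEXT_PATH.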

Repository Layout

src/agents
- orchestration.ts: engine facade and runtime wiring
- pipeline.ts: DAG runner, retry matrix, abort propagation, domain-event routing
- manifest.ts: schema parsing/validation for personas/topologies/edges
- manager.ts: recursive fan-out utility used by pipeline
- state-context.ts: persisted node handoffs + session state
- project-context.ts: project-scoped store
- domain-events.ts: typed domain event schema + bus
- runtime.ts: env-driven defaults/singletons
- provisioning.ts: resource provisioning and child suballocation helpers
src/mcp: MCP config types/conversion/handlers
src/examples: provider entrypoints (codex.ts, claude.ts)
tests: manager, manifest, pipeline/orchestration, state, provisioning, MCP

Setup

npm install
cp .env.example .env
cp mcp.config.example.json mcp.config.json

Run

npm run codex -- "Summarize this repository."
npm run claude -- "Summarize this repository."

Or via unified entrypoint:

npm run dev -- codex "List potential improvements."
npm run dev -- claude "List potential improvements."

Manifest Semantics

AgentManifest (schema "1") validates:

supported topologies (sequential, parallel, hierarchical, retry-unrolled)
persona definitions and tool-clearance metadata
relationship DAG, including detection of unknown persona references
strict pipeline DAG
topology constraints (maxDepth, maxRetries)

Pipeline edges can route via:

legacy status triggers (on: success, validation_fail, failure, always, ...)
domain event triggers (event: typed domain events)
conditions (state_flag, history_has_event, file_exists, always)
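
The validation rules above can be illustrated with a small sketch that checks persona references and rejects cycles using Kahn's algorithm. The MiniManifest shape and validateMiniManifest are hypothetical stand-ins, not the actual AgentManifest schema, which this README does not spell out.

```typescript
// Hypothetical, simplified manifest shape for illustration only.
type StatusTrigger = "success" | "validation_fail" | "failure" | "always";

interface PipelineNode { id: string; personaId: string; }
interface PipelineEdge { from: string; to: string; on?: StatusTrigger; event?: string; }
interface MiniManifest {
  personas: { id: string }[];
  nodes: PipelineNode[];
  edges: PipelineEdge[];
}

// Returns a list of human-readable errors; an empty list means valid.
function validateMiniManifest(manifest: MiniManifest): string[] {
  const errors: string[] = [];
  const personaIds = new Set(manifest.personas.map((p) => p.id));
  const nodeIds = Array.from(new Set(manifest.nodes.map((n) => n.id)));

  for (const node of manifest.nodes) {
    if (!personaIds.has(node.personaId)) {
      errors.push(`node "${node.id}" references unknown persona "${node.personaId}"`);
    }
  }

  // Build adjacency lists and in-degrees for the pipeline graph.
  const inDegree = new Map<string, number>();
  const outgoing = new Map<string, string[]>();
  for (const id of nodeIds) {
    inDegree.set(id, 0);
    outgoing.set(id, []);
  }
  for (const edge of manifest.edges) {
    if (!inDegree.has(edge.from) || !inDegree.has(edge.to)) {
      errors.push(`edge ${edge.from} -> ${edge.to} references an unknown node`);
      continue;
    }
    outgoing.get(edge.from)!.push(edge.to);
    inDegree.set(edge.to, inDegree.get(edge.to)! + 1);
  }

  // Kahn's algorithm: repeatedly remove nodes with no incoming edges.
  // If some nodes can never be removed, the graph contains a cycle.
  const ready = nodeIds.filter((id) => inDegree.get(id) === 0);
  let visited = 0;
  while (ready.length > 0) {
    const id = ready.pop()!;
    visited += 1;
    for (const next of outgoing.get(id)!) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (visited !== nodeIds.length) {
    errors.push("pipeline graph contains a cycle");
  }
  return errors;
}
```

Collecting all errors instead of throwing on the first one is a design choice for this sketch; it lets a manifest author fix several problems in one pass.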

Domain Events

Domain events are typed and can trigger edges directly:

planning: requirements_defined, tasks_planned
execution: code_committed, task_blocked
validation: validation_passed, validation_failed
integration: branch_merged

Actors can emit events via ActorExecutionResult.events. In addition, pipeline status outcomes emit default validation/execution events.
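
As a rough illustration of typed events driving routing, the sketch below models the event names listed above as a string union and dispatches them through a tiny in-memory bus. The DomainEvent fields, the DomainEventBus class, and its on/emit methods are assumptions made for this example; the real schema and bus live in domain-events.ts and may differ.

```typescript
// Event names are taken from the list above; everything else is hypothetical.
type DomainEventType =
  | "requirements_defined"
  | "tasks_planned"
  | "code_committed"
  | "task_blocked"
  | "validation_passed"
  | "validation_failed"
  | "branch_merged";

interface DomainEvent {
  type: DomainEventType;
  nodeId: string; // hypothetical: the pipeline node that emitted the event
}

type DomainEventHandler = (event: DomainEvent) => void;

class DomainEventBus {
  private handlers = new Map<DomainEventType, DomainEventHandler[]>();

  // Register a handler for one event type.
  on(type: DomainEventType, handler: DomainEventHandler): void {
    const existing = this.handlers.get(type) ?? [];
    this.handlers.set(type, [...existing, handler]);
  }

  // Deliver the event to every handler registered for its type.
  emit(event: DomainEvent): void {
    for (const handler of this.handlers.get(event.type) ?? []) {
      handler(event);
    }
  }
}

const exampleBus = new DomainEventBus();
const activatedEdges: string[] = [];
exampleBus.on("validation_failed", (event) => {
  activatedEdges.push(`retry:${event.nodeId}`);
});
exampleBus.emit({ type: "validation_failed", nodeId: "review" });
exampleBus.emit({ type: "branch_merged", nodeId: "integrate" }); // no subscriber, ignored
```

Because the event type is a closed union, a misspelled event name in an edge definition or handler registration fails at compile time rather than silently never firing.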

Retry Matrix and Cancellation

validation_fail: routed through retry-unrolled execution (each retry runs in a new child manager session)
hard failures: timeouts, network errors, and 403-like responses are counted consecutively; after 2 consecutive hard failures the pipeline aborts immediately
AbortSignal is passed into every actor execution input
closing a session aborts its recursive child work
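
The fail-fast rule can be sketched with the standard AbortController API. HardFailureGuard and the Outcome type are hypothetical names for this example, and the choice to reset the counter on any non-hard outcome is an assumption about what "consecutive" means here.

```typescript
// Hypothetical outcome labels; the README distinguishes validation
// failures from hard failures (timeout, network, 403-like).
type Outcome = "success" | "validation_fail" | "hard_failure";

class HardFailureGuard {
  private consecutive = 0;
  private readonly controller: AbortController;
  private readonly threshold: number;

  constructor(controller: AbortController, threshold = 2) {
    this.controller = controller;
    this.threshold = threshold;
  }

  // Record one actor outcome. Assumption: any non-hard outcome
  // breaks the streak and resets the counter.
  record(outcome: Outcome): void {
    if (outcome !== "hard_failure") {
      this.consecutive = 0;
      return;
    }
    this.consecutive += 1;
    if (this.consecutive >= this.threshold) {
      // Every actor holding controller.signal observes the abort.
      this.controller.abort();
    }
  }
}

const pipelineController = new AbortController();
const guard = new HardFailureGuard(pipelineController);
guard.record("hard_failure");
guard.record("success"); // streak broken
guard.record("hard_failure");
guard.record("hard_failure"); // second consecutive hard failure
// pipelineController.signal.aborted is now true
```

Sharing one signal across all actor inputs means a single abort() call reaches every in-flight execution, which matches the statement that the signal is passed into every actor execution input.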

Environment Variables

Provider/Auth

CODEX_API_KEY
OPENAI_API_KEY
OPENAI_BASE_URL
CODEX_SKIP_GIT_CHECK
ANTHROPIC_API_KEY
CLAUDE_MODEL
CLAUDE_CODE_PATH
MCP_CONFIG_PATH

Agent Manager Limits

AGENT_MAX_CONCURRENT
AGENT_MAX_SESSION
AGENT_MAX_RECURSIVE_DEPTH

Orchestration / Context

AGENT_STATE_ROOT
AGENT_PROJECT_CONTEXT_PATH
AGENT_TOPOLOGY_MAX_DEPTH
AGENT_TOPOLOGY_MAX_RETRIES
AGENT_RELATIONSHIP_MAX_CHILDREN

Provisioning / Resource Controls

AGENT_WORKTREE_ROOT
AGENT_WORKTREE_BASE_REF
AGENT_PORT_BASE
AGENT_PORT_BLOCK_SIZE
AGENT_PORT_BLOCK_COUNT
AGENT_PORT_PRIMARY_OFFSET
AGENT_PORT_LOCK_DIR
AGENT_DISCOVERY_FILE_RELATIVE_PATH

Defaults are documented in .env.example.
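
For orientation, here is one plausible reading of how the AGENT_PORT_* variables could combine into a deterministic port range. The arithmetic, the PortSettings shape, the portBlock function, and the sample values are all assumptions; the actual logic lives in provisioning.ts and the real defaults are in .env.example.

```typescript
// Hypothetical mapping of the AGENT_PORT_* variables to a config object.
interface PortSettings {
  base: number;          // AGENT_PORT_BASE
  blockSize: number;     // AGENT_PORT_BLOCK_SIZE
  blockCount: number;    // AGENT_PORT_BLOCK_COUNT
  primaryOffset: number; // AGENT_PORT_PRIMARY_OFFSET
}

interface PortBlock {
  index: number;
  start: number;   // first port in the block (inclusive)
  end: number;     // last port in the block (inclusive)
  primary: number; // port at primaryOffset within the block
}

// Assumed layout: blocks are contiguous, fixed-size slices starting at base.
function portBlock(settings: PortSettings, index: number): PortBlock {
  if (!Number.isInteger(index) || index < 0 || index >= settings.blockCount) {
    throw new RangeError(`block index ${index} is outside 0..${settings.blockCount - 1}`);
  }
  if (settings.primaryOffset < 0 || settings.primaryOffset >= settings.blockSize) {
    throw new RangeError("primary offset must fall inside the block");
  }
  const start = settings.base + index * settings.blockSize;
  return {
    index,
    start,
    end: start + settings.blockSize - 1,
    primary: start + settings.primaryOffset,
  };
}

// Illustrative values only, not the project's defaults.
const sampleBlock = portBlock(
  { base: 30000, blockSize: 10, blockCount: 5, primaryOffset: 0 },
  2,
);
// sampleBlock covers ports 30020 through 30029, with 30020 as primary
```

How a session claims a block index is not shown; the presence of AGENT_PORT_LOCK_DIR suggests some form of lock-based claiming, but that is not confirmed by this README.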

Quality Gate

npm run verify

Equivalent:

npm run check
npm run check:tests
npm run test
npm run build

Notes

Tool clearance allowlist/banlist is currently metadata only; hard enforcement must happen at the tool execution boundary.
AgentManager.runRecursiveAgent(...) remains available for low-level testing, but pipeline execution should use SchemaDrivenExecutionEngine.runSession(...).
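
Since clearance is metadata only, enforcement has to wrap the tool call itself. The sketch below shows one way that boundary check might look. ToolClearance, isToolAllowed, and runToolGuarded are hypothetical names, and the rule that an empty allowlist means "no restriction" is an assumption made for this example.

```typescript
// Hypothetical clearance shape; the README only mentions an allowlist and a banlist.
interface ToolClearance {
  allowlist: string[];
  banlist: string[];
}

// The banlist always wins. Assumption: an empty allowlist permits
// any tool that is not banned.
function isToolAllowed(clearance: ToolClearance, toolName: string): boolean {
  if (clearance.banlist.includes(toolName)) return false;
  return clearance.allowlist.length === 0 || clearance.allowlist.includes(toolName);
}

// Wrap the actual tool invocation so a denied tool never runs.
function runToolGuarded<T>(
  clearance: ToolClearance,
  toolName: string,
  run: () => T,
): T {
  if (!isToolAllowed(clearance, toolName)) {
    throw new Error(`tool "${toolName}" is not permitted for this persona`);
  }
  return run();
}

const reviewerClearance: ToolClearance = {
  allowlist: ["read_file", "search"],
  banlist: ["shell"],
};
const fileText = runToolGuarded(reviewerClearance, "read_file", () => "file contents");
```

Placing the check in the wrapper rather than in the persona prompt means a misbehaving agent cannot bypass it by simply requesting a banned tool.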
