Wire pipeline DAG execution to manager with events and project context

2026-02-23 13:14:20 -05:00
parent 53af0d44cd
commit 889087daa1
13 changed files with 1668 additions and 380 deletions
--- a/README.md
+++ b/README.md
@@ -2,40 +2,41 @@

 TypeScript runtime for deterministic multi-agent execution with:

-- OpenAI Codex SDK integration (`@openai/codex-sdk`)
-- Anthropic Claude Agent SDK integration (`@anthropic-ai/claude-agent-sdk`)
+- OpenAI Codex SDK (`@openai/codex-sdk`)
+- Anthropic Claude Agent SDK (`@anthropic-ai/claude-agent-sdk`)
 - Schema-validated orchestration (`AgentManifest`)
-- Stateless node handoffs via persisted state/context payloads
+- DAG execution with topology-aware fan-out (`parallel`, `hierarchical`, `retry-unrolled`)
+- Project-scoped persistent context store
+- Typed domain events for edge-triggered routing
 - Resource provisioning (git worktrees + deterministic port ranges)
-- MCP configuration layer with handler-based policy hooks
+- MCP configuration layer with handler policy hooks

-## Current Status
+## Architecture Summary

-- Provider entrypoints (`codex`, `claude`) run with session limits and resource provisioning.
-- Schema-driven orchestration is implemented as reusable modules under `src/agents`.
-- Recursive `AgentManager.runRecursiveAgent(...)` supports fanout/fan-in orchestration with abort propagation.
-- Tool clearance allowlist/banlist is modeled, but hard security enforcement is still a TODO at tool execution boundaries.
+- `SchemaDrivenExecutionEngine.runSession(...)` is the single execution entrypoint.
+- `PipelineExecutor` owns runtime control flow and topology dispatch.
+- `AgentManager` is an internal utility used by the pipeline when fan-out/retry-unrolled behavior is required.
+- Session state is persisted under `AGENT_STATE_ROOT`.
+- Project state is persisted under `AGENT_PROJECT_CONTEXT_PATH` with domains:
+  - `globalFlags`
+  - `artifactPointers`
+  - `taskQueue`

 ## Repository Layout

-- `src/agents`:
-  - `manager.ts`: queue-based concurrency limits + recursive fanout/fan-in orchestration.
-  - `runtime.ts`: env-driven runtime singletons and defaults.
-  - `manifest.ts`: `AgentManifest` schema parsing + validation (strict DAG).
-  - `persona-registry.ts`: prompt templating + persona behavior events.
-  - `pipeline.ts`: actor-oriented DAG runner with retries and state-dependent routing.
-  - `state-context.ts`: persisted state + stateless handoff reconstruction.
-  - `provisioning.ts`: extensible resource orchestration + child suballocation support.
-  - `orchestration.ts`: `SchemaDrivenExecutionEngine` facade.
-- `src/mcp`: MCP config types, conversions, and handler resolution.
-- `src/examples`: provider entrypoints (`codex.ts`, `claude.ts`).
-- `tests`: unit coverage for manager, manifest, pipeline/orchestration, state context, MCP, and provisioning behavior.
-- `docs/orchestration-engine.md`: design notes for the orchestration architecture.
-
-## Prerequisites
-
-- Node.js 18+
-- npm
+- `src/agents`
+  - `orchestration.ts`: engine facade and runtime wiring
+  - `pipeline.ts`: DAG runner, retry matrix, abort propagation, domain-event routing
+  - `manifest.ts`: schema parsing/validation for personas/topologies/edges
+  - `manager.ts`: recursive fan-out utility used by pipeline
+  - `state-context.ts`: persisted node handoffs + session state
+  - `project-context.ts`: project-scoped store
+  - `domain-events.ts`: typed domain event schema + bus
+  - `runtime.ts`: env-driven defaults/singletons
+  - `provisioning.ts`: resource provisioning and child suballocation helpers
+- `src/mcp`: MCP config types/conversion/handlers
+- `src/examples`: provider entrypoints (`codex.ts`, `claude.ts`)
+- `tests`: manager, manifest, pipeline/orchestration, state, provisioning, MCP

 ## Setup

@@ -45,207 +46,53 @@ cp .env.example .env
 cp mcp.config.example.json mcp.config.json
 ```

-Fill in any values you need in `.env`.
-
 ## Run

-Run Codex example:
-
 ```bash
-npm run codex -- "Summarize what this repository does."
+npm run codex -- "Summarize this repository."
+npm run claude -- "Summarize this repository."
 ```

-Run Claude example:
-
-```bash
-npm run claude -- "Summarize what this repository does."
-```
-
-Run via unified entrypoint:
+Or via unified entrypoint:

 ```bash
 npm run dev -- codex "List potential improvements."
 npm run dev -- claude "List potential improvements."
 ```

-## Schema-Driven Orchestration
+## Manifest Semantics

-The orchestration engine is exposed as library modules (not yet wired into `src/index.ts` by default).
+`AgentManifest` (schema `"1"`) validates:

-Core pieces:
+- supported topologies (`sequential`, `parallel`, `hierarchical`, `retry-unrolled`)
+- persona definitions and tool-clearance metadata
+- relationship DAG and unknown persona references
+- strict pipeline DAG
+- topology constraints (`maxDepth`, `maxRetries`)

-- `parseAgentManifest(...)` validates the full orchestration schema.
-- `PersonaRegistry` injects runtime context into templated system prompts.
-- `PipelineExecutor` executes a strict DAG of actor nodes.
-- `FileSystemStateContextManager` enforces stateless handoffs.
-- `SchemaDrivenExecutionEngine` composes all of the above with env-driven limits.
+Pipeline edges can route via:

-### AgentManifest Overview
+- legacy status triggers (`on`: `success`, `validation_fail`, `failure`, `always`, ...)
+- domain event triggers (`event`: typed domain events)
+- conditions (`state_flag`, `history_has_event`, `file_exists`, `always`)

-`AgentManifest` (schema version `"1"`) includes:
+## Domain Events

-- `topologies`: any of `hierarchical`, `retry-unrolled`, `sequential`
-- `personas`: identity, prompt template, tool clearance metadata
-- `relationships`: parent-child persona edges and constraints
-- `pipeline`: strict DAG with entry node, nodes, and edges
-- `topologyConstraints`: max depth and retry ceilings
+Domain events are typed and can trigger edges directly:

-Edge routing supports:
+- planning: `requirements_defined`, `tasks_planned`
+- execution: `code_committed`, `task_blocked`
+- validation: `validation_passed`, `validation_failed`
+- integration: `branch_merged`

-- Event gates: `success`, `validation_fail`, `failure`, `always`, `onTaskComplete`, `onValidationFail`
-- Conditions:
-  - `state_flag`
-  - `history_has_event`
-  - `file_exists`
-  - `always`
+Actors can emit events in `ActorExecutionResult.events`. Pipeline status also emits default validation/execution events.

-Example manifest:
+## Retry Matrix and Cancellation

-```json
-{
-  "schemaVersion": "1",
-  "topologies": ["hierarchical", "retry-unrolled", "sequential"],
-  "personas": [
-    {
-      "id": "coder",
-      "displayName": "Coder",
-      "systemPromptTemplate": "Implement ticket {{ticket}} in repo {{repo}}",
-      "toolClearance": {
-        "allowlist": ["read_file", "write_file"],
-        "banlist": ["rm"]
-      }
-    }
-  ],
-  "relationships": [],
-  "pipeline": {
-    "entryNodeId": "coder-1",
-    "nodes": [
-      {
-        "id": "coder-1",
-        "actorId": "coder_actor",
-        "personaId": "coder",
-        "constraints": { "maxRetries": 1 }
-      }
-    ],
-    "edges": []
-  },
-  "topologyConstraints": {
-    "maxDepth": 4,
-    "maxRetries": 2
-  }
-}
-```
-
-### Minimal Engine Usage
-
-```ts
-import { SchemaDrivenExecutionEngine } from "./src/agents/orchestration.js";
-
-const engine = new SchemaDrivenExecutionEngine({
-  manifest,
-  actorExecutors: {
-    coder_actor: async ({ prompt, context, toolClearance }) => {
-      // execute actor logic here
-      return {
-        status: "success",
-        payload: {
-          summary: "done"
-        },
-        stateFlags: {
-          implemented: true
-        }
-      };
-    }
-  },
-  settings: {
-    workspaceRoot: process.cwd(),
-    runtimeContext: {
-      repo: "ai_ops",
-      ticket: "AIOPS-123"
-    }
-  }
-});
-
-const result = await engine.runSession({
-  sessionId: "session-1",
-  initialPayload: {
-    task: "Implement feature"
-  }
-});
-
-console.log(result.records);
-```
-
-## Stateless Handoffs and Context
-
-The engine does not depend on conversational memory between nodes.
-
-- Node inputs are written as handoff payloads to storage.
-- Each node execution reads a fresh context snapshot from disk.
-- Session state persists:
-  - flags
-  - metadata
-  - history events
-
-Default state root is controlled by `AGENT_STATE_ROOT`.
-
-## Recursive Orchestration Contract
-
-`AgentManager.runRecursiveAgent(...)` uses a strict two-phase fanout/fan-in model:
-
-- Phase 1 (planner): agent execution returns either a terminal result or a fanout plan (`intents[]` + `aggregate(...)`).
-- Parent tokens are released before children are scheduled, avoiding deadlocks even when `AGENT_MAX_CONCURRENT=1`.
-- Children run in isolated deterministic session IDs (`<parent>_child_<index>`), each with their own `AbortSignal`.
-- Phase 2 (aggregator): once all children complete, the aggregate phase runs as a fresh invocation.
-
-Optional child middleware hooks (`allocateForChild`, `releaseForChild`) let callers integrate provisioning/suballocation without coupling `AgentManager` to filesystem or git operations.
-
-## Resource Provisioning
-
-The provisioning layer separates:
-
-- Hard constraints: actual resource allocation enforced before run.
-- Soft constraints: injected env vars, prompt sections, metadata, and discovery snapshot.
-
-Built-in providers:
-
-- `git-worktree`
-- `port-range`
-
-Runtime injection includes:
-
-- Working directory override
-- Injected env vars such as `AGENT_WORKTREE_PATH`, `AGENT_PORT_RANGE_START`, `AGENT_PORT_RANGE_END`, `AGENT_PORT_PRIMARY`
-- Discovery file path via `AGENT_DISCOVERY_FILE`
-
-### Hierarchical Suballocation
-
-Parent sessions can suballocate resources for child sessions using:
-
-- `ResourceProvisioningOrchestrator.provisionChildSession(...)`
-- `buildChildResourceRequests(...)`
-
-Behavior:
-
-- Child worktrees are placed under a deterministic parent-scoped root.
-- Child port blocks are deterministically carved from the parent assigned range.
-
-## MCP Configuration
-
-Use `mcp.config.json` to configure shared and provider-specific MCP servers.
-
-- `MCP_CONFIG_PATH` controls config location (default `./mcp.config.json`).
-- Shared server definitions are in `servers`.
-- Provider overrides:
-  - `codex.mcp_servers`
-  - `claude.mcpServers`
-- Handlers:
-  - built-in `context7`
-  - built-in `claude-task-master`
-  - built-in `generic`
-  - custom handlers via `registerMcpHandler(...)`
-
-See `mcp.config.example.json` for a complete template.
+- `validation_fail`: routed through retry-unrolled execution (new child manager session)
+- hard failures: timeout/network/403-like failures tracked sequentially; at 2 consecutive hard failures the pipeline aborts fast
+- `AbortSignal` is passed into every actor execution input
+- session closure aborts child recursive work

 ## Environment Variables

@@ -266,14 +113,15 @@ See `mcp.config.example.json` for a complete template.
 - `AGENT_MAX_SESSION`
 - `AGENT_MAX_RECURSIVE_DEPTH`

-### Orchestration Limits
+### Orchestration / Context

 - `AGENT_STATE_ROOT`
+- `AGENT_PROJECT_CONTEXT_PATH`
 - `AGENT_TOPOLOGY_MAX_DEPTH`
 - `AGENT_TOPOLOGY_MAX_RETRIES`
 - `AGENT_RELATIONSHIP_MAX_CHILDREN`

-### Provisioning
+### Provisioning / Resource Controls

 - `AGENT_WORKTREE_ROOT`
 - `AGENT_WORKTREE_BASE_REF`
@@ -288,13 +136,11 @@ Defaults are documented in `.env.example`.

 ## Quality Gate

-Run the full pre-PR gate:
-
 ```bash
 npm run verify
 ```

-Equivalent individual commands:
+Equivalent:

 ```bash
 npm run check
@@ -303,20 +149,7 @@ npm run test
 npm run build
 ```

-## Build and Start
+## Notes

-```bash
-npm run build
-npm run start -- codex "Hello from built JS"
-```
-
-## Known Limitations
-
-- Tool clearance allowlist/banlist is currently metadata; enforcement is not yet wired into an execution sandbox.
-
-## References
-
-- `docs/orchestration-engine.md`
-- OpenAI Codex SDK docs: https://developers.openai.com/codex/sdk/
-- Codex MCP config docs: https://developers.openai.com/codex/config#model-context-protocol-mcp_servers
-- Claude Agent SDK docs: https://platform.claude.com/docs/en/agent-sdk/overview
+- Tool clearance allowlist/banlist is currently metadata only; hard enforcement must happen at the tool execution boundary.
+- `AgentManager.runRecursiveAgent(...)` remains available for low-level testing, but pipeline execution should use `SchemaDrivenExecutionEngine.runSession(...)`.
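
Reviewer note (not part of the diff): the new "Retry Matrix and Cancellation" section states that the pipeline aborts fast at 2 consecutive hard failures and that an `AbortSignal` reaches every actor. A minimal sketch of how such a rule could be tracked is below. Every name in it (`AttemptOutcome`, `HardFailureTracker`, `HARD_FAILURE_LIMIT`) is hypothetical and is not taken from `pipeline.ts`. The sketch also assumes that any non-hard outcome resets the streak, which the README text implies with "consecutive" but does not spell out.

```typescript
// Hypothetical sketch; names and structure are NOT from the commit's source.
type AttemptOutcome = "success" | "validation_fail" | "hard_failure";

// Threshold taken from the README text: "at 2 consecutive hard failures".
const HARD_FAILURE_LIMIT = 2;

class HardFailureTracker {
  private consecutive = 0;
  private readonly controller: AbortController;

  constructor(controller: AbortController) {
    this.controller = controller;
  }

  /** Records one attempt and returns true once the shared signal is aborted. */
  record(outcome: AttemptOutcome): boolean {
    if (outcome === "hard_failure") {
      this.consecutive += 1;
    } else {
      // Assumption: success and validation_fail both break the streak.
      this.consecutive = 0;
    }
    if (this.consecutive >= HARD_FAILURE_LIMIT && !this.controller.signal.aborted) {
      // Aborting the controller is what actors holding the signal would observe.
      this.controller.abort();
    }
    return this.controller.signal.aborted;
  }
}

// Usage: one controller per session, its signal handed to each actor input.
const sessionController = new AbortController();
const tracker = new HardFailureTracker(sessionController);
tracker.record("hard_failure");
tracker.record("hard_failure");
console.log("aborted:", sessionController.signal.aborted);
```

Keeping the counter separate from the executor is only one possible design; it makes the threshold testable without running a pipeline.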