first commit

2026-02-23 12:06:13 -05:00
commit 53af0d44cd
33 changed files with 6483 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,322 @@
+# AI Ops: Schema-Driven Multi-Agent Orchestration Runtime
+
+TypeScript runtime for deterministic multi-agent execution with:
+
+- OpenAI Codex SDK integration (`@openai/codex-sdk`)
+- Anthropic Claude Agent SDK integration (`@anthropic-ai/claude-agent-sdk`)
+- Schema-validated orchestration (`AgentManifest`)
+- Stateless node handoffs via persisted state/context payloads
+- Resource provisioning (git worktrees + deterministic port ranges)
+- MCP configuration layer with handler-based policy hooks
+
+## Current Status
+
+- Provider entrypoints (`codex`, `claude`) run with session limits and resource provisioning.
+- Schema-driven orchestration is implemented as reusable modules under `src/agents`.
+- `AgentManager.runRecursiveAgent(...)` supports recursive fanout/fan-in orchestration with abort propagation.
+- Tool clearance allowlist/banlist is modeled as metadata only; hard enforcement at tool execution boundaries is still a TODO.
+
+## Repository Layout
+
+- `src/agents`:
+  - `manager.ts`: queue-based concurrency limits + recursive fanout/fan-in orchestration.
+  - `runtime.ts`: env-driven runtime singletons and defaults.
+  - `manifest.ts`: `AgentManifest` schema parsing + validation (strict DAG).
+  - `persona-registry.ts`: prompt templating + persona behavior events.
+  - `pipeline.ts`: actor-oriented DAG runner with retries and state-dependent routing.
+  - `state-context.ts`: persisted state + stateless handoff reconstruction.
+  - `provisioning.ts`: extensible resource orchestration + child suballocation support.
+  - `orchestration.ts`: `SchemaDrivenExecutionEngine` facade.
+- `src/mcp`: MCP config types, conversions, and handler resolution.
+- `src/examples`: provider entrypoints (`codex.ts`, `claude.ts`).
+- `tests`: unit coverage for manager, manifest, pipeline/orchestration, state context, MCP, and provisioning behavior.
+- `docs/orchestration-engine.md`: design notes for the orchestration architecture.
+
+## Prerequisites
+
+- Node.js 18+
+- npm
+
+## Setup
+
+```bash
+npm install
+cp .env.example .env
+cp mcp.config.example.json mcp.config.json
+```
+
+Fill in any values you need in `.env`.
+
+## Run
+
+Run the Codex example:
+
+```bash
+npm run codex -- "Summarize what this repository does."
+```
+
+Run the Claude example:
+
+```bash
+npm run claude -- "Summarize what this repository does."
+```
+
+Run via the unified entrypoint:
+
+```bash
+npm run dev -- codex "List potential improvements."
+npm run dev -- claude "List potential improvements."
+```
+
+## Schema-Driven Orchestration
+
+The orchestration engine is exposed as library modules (not yet wired into `src/index.ts` by default).
+
+Core pieces:
+
+- `parseAgentManifest(...)` validates the full orchestration schema.
+- `PersonaRegistry` injects runtime context into templated system prompts.
+- `PipelineExecutor` executes a strict DAG of actor nodes.
+- `FileSystemStateContextManager` enforces stateless handoffs.
+- `SchemaDrivenExecutionEngine` composes all of the above with env-driven limits.
+
+### AgentManifest Overview
+
+`AgentManifest` (schema version `"1"`) includes:
+
+- `topologies`: any of `hierarchical`, `retry-unrolled`, `sequential`
+- `personas`: identity, prompt template, tool clearance metadata
+- `relationships`: parent-child persona edges and constraints
+- `pipeline`: strict DAG with entry node, nodes, and edges
+- `topologyConstraints`: max depth and retry ceilings
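+
+As a rough illustration of what "strict DAG" means for `pipeline`, the sketch below rejects graphs that contain a cycle or an edge pointing at an unknown node, using Kahn's algorithm. It is not the validator in `manifest.ts`; the `Edge` shape (`from`, `to`) and the `isValidDag` name are assumptions made for this example.
+
+```ts
+// Hypothetical edge shape; the real manifest edge fields may differ.
+interface Edge {
+  from: string;
+  to: string;
+}
+
+// Returns true when every edge references a known node and the graph has no cycle.
+function isValidDag(nodeIds: string[], edges: Edge[]): boolean {
+  const indegree = new Map<string, number>();
+  const outgoing = new Map<string, string[]>();
+  for (const id of nodeIds) {
+    indegree.set(id, 0);
+    outgoing.set(id, []);
+  }
+  for (const { from, to } of edges) {
+    if (!indegree.has(from) || !indegree.has(to)) return false; // unknown node
+    outgoing.get(from)!.push(to);
+    indegree.set(to, indegree.get(to)! + 1);
+  }
+
+  // Kahn's algorithm: repeatedly remove nodes that have no remaining incoming edges.
+  const ready = nodeIds.filter((id) => indegree.get(id) === 0);
+  let visited = 0;
+  while (ready.length > 0) {
+    const id = ready.pop()!;
+    visited += 1;
+    for (const next of outgoing.get(id)!) {
+      const remaining = indegree.get(next)! - 1;
+      indegree.set(next, remaining);
+      if (remaining === 0) ready.push(next);
+    }
+  }
+  // Any node left unvisited sits on a cycle.
+  return visited === nodeIds.length;
+}
+```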
+
+Edge routing supports:
+
+- Event gates: `success`, `validation_fail`, `failure`, `always`, `onTaskComplete`, `onValidationFail`
+- Conditions:
+  - `state_flag`
+  - `history_has_event`
+  - `file_exists`
+  - `always`
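+
+The four condition kinds above could be evaluated along these lines. This is a minimal sketch, not the logic in `pipeline.ts`: the field names (`key`, `equals`, `event`, `path`) and the `SessionView` type are assumptions, since the README only lists the condition kinds.
+
+```ts
+import { existsSync } from "node:fs";
+import { join } from "node:path";
+
+// Hypothetical condition shapes; only the `type` values come from the list above.
+type Condition =
+  | { type: "always" }
+  | { type: "state_flag"; key: string; equals: boolean }
+  | { type: "history_has_event"; event: string }
+  | { type: "file_exists"; path: string };
+
+// Hypothetical read-only view of persisted session state.
+interface SessionView {
+  flags: Record<string, boolean>;
+  history: string[];
+  workspaceRoot: string;
+}
+
+// Decides whether an edge condition is satisfied for the given session.
+function conditionHolds(condition: Condition, session: SessionView): boolean {
+  switch (condition.type) {
+    case "always":
+      return true;
+    case "state_flag":
+      // A flag that was never set is treated as false in this sketch.
+      return (session.flags[condition.key] ?? false) === condition.equals;
+    case "history_has_event":
+      return session.history.includes(condition.event);
+    case "file_exists":
+      // Paths are resolved relative to the workspace root in this sketch.
+      return existsSync(join(session.workspaceRoot, condition.path));
+  }
+}
+```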
+
+Example manifest:
+
+```json
+{
+  "schemaVersion": "1",
+  "topologies": ["hierarchical", "retry-unrolled", "sequential"],
+  "personas": [
+    {
+      "id": "coder",
+      "displayName": "Coder",
+      "systemPromptTemplate": "Implement ticket {{ticket}} in repo {{repo}}",
+      "toolClearance": {
+        "allowlist": ["read_file", "write_file"],
+        "banlist": ["rm"]
+      }
+    }
+  ],
+  "relationships": [],
+  "pipeline": {
+    "entryNodeId": "coder-1",
+    "nodes": [
+      {
+        "id": "coder-1",
+        "actorId": "coder_actor",
+        "personaId": "coder",
+        "constraints": { "maxRetries": 1 }
+      }
+    ],
+    "edges": []
+  },
+  "topologyConstraints": {
+    "maxDepth": 4,
+    "maxRetries": 2
+  }
+}
+```
+
+### Minimal Engine Usage
+
+```ts
+import { SchemaDrivenExecutionEngine } from "./src/agents/orchestration.js";
+
+const engine = new SchemaDrivenExecutionEngine({
+  manifest, // an AgentManifest, e.g. the example above validated with parseAgentManifest(...)
+  actorExecutors: {
+    coder_actor: async ({ prompt, context, toolClearance }) => {
+      // execute actor logic here
+      return {
+        status: "success",
+        payload: {
+          summary: "done"
+        },
+        stateFlags: {
+          implemented: true
+        }
+      };
+    }
+  },
+  settings: {
+    workspaceRoot: process.cwd(),
+    runtimeContext: {
+      repo: "ai_ops",
+      ticket: "AIOPS-123"
+    }
+  }
+});
+
+const result = await engine.runSession({
+  sessionId: "session-1",
+  initialPayload: {
+    task: "Implement feature"
+  }
+});
+
+console.log(result.records);
+```
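+
+The `runtimeContext` values above line up with the `{{ticket}}` and `{{repo}}` placeholders in the example manifest's `systemPromptTemplate`. The sketch below shows one way such substitution might work; `renderTemplate` is a hypothetical helper, and throwing on a missing variable is an assumption rather than documented `PersonaRegistry` behavior.
+
+```ts
+// Replaces {{name}} placeholders with values from a context object.
+function renderTemplate(template: string, context: Record<string, string>): string {
+  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match, key: string) => {
+    const value = Object.prototype.hasOwnProperty.call(context, key)
+      ? context[key]
+      : undefined;
+    if (value === undefined) {
+      // Assumption: fail loudly instead of emitting a prompt with holes in it.
+      throw new Error(`Missing template variable: ${key}`);
+    }
+    return value;
+  });
+}
+
+const prompt = renderTemplate("Implement ticket {{ticket}} in repo {{repo}}", {
+  repo: "ai_ops",
+  ticket: "AIOPS-123",
+});
+console.log(prompt); // prints "Implement ticket AIOPS-123 in repo ai_ops"
+```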
+
+## Stateless Handoffs and Context
+
+The engine does not depend on conversational memory between nodes.
+
+- Node inputs are written as handoff payloads to storage.
+- Each node execution reads a fresh context snapshot from disk.
+- Session state persists:
+  - flags
+  - metadata
+  - history events
+
+Default state root is controlled by `AGENT_STATE_ROOT`.
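+
+To make the handoff idea concrete, here is a minimal file-backed round trip. It is a sketch only: the directory layout (`<stateRoot>/<sessionId>/handoffs/<nodeId>.json`), the `Handoff` shape, and the helper names are assumptions, not the format used by `FileSystemStateContextManager`.
+
+```ts
+import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { dirname, join } from "node:path";
+
+// Hypothetical handoff record written for the node that runs next.
+interface Handoff {
+  nodeId: string;
+  payload: Record<string, unknown>;
+}
+
+// Hypothetical on-disk layout for handoff files.
+function handoffPath(stateRoot: string, sessionId: string, nodeId: string): string {
+  return join(stateRoot, sessionId, "handoffs", `${nodeId}.json`);
+}
+
+// Persists the input for a node so nothing has to be carried in memory.
+function writeHandoff(stateRoot: string, sessionId: string, handoff: Handoff): void {
+  const file = handoffPath(stateRoot, sessionId, handoff.nodeId);
+  mkdirSync(dirname(file), { recursive: true });
+  writeFileSync(file, JSON.stringify(handoff, null, 2), "utf8");
+}
+
+// Reads a fresh snapshot from disk each time a node is about to execute.
+function readHandoff(stateRoot: string, sessionId: string, nodeId: string): Handoff {
+  return JSON.parse(readFileSync(handoffPath(stateRoot, sessionId, nodeId), "utf8")) as Handoff;
+}
+
+// A temporary directory stands in for AGENT_STATE_ROOT here.
+const stateRoot = mkdtempSync(join(tmpdir(), "agent-state-"));
+writeHandoff(stateRoot, "session-1", {
+  nodeId: "coder-1",
+  payload: { task: "Implement feature" },
+});
+const restored = readHandoff(stateRoot, "session-1", "coder-1");
+console.log(restored.payload.task); // prints "Implement feature"
+```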
+
+## Recursive Orchestration Contract
+
+`AgentManager.runRecursiveAgent(...)` uses a strict two-phase fanout/fan-in model:
+
+- Phase 1 (planner): agent execution returns either a terminal result or a fanout plan (`intents[]` + `aggregate(...)`).
+- Parent tokens are released before children are scheduled, avoiding deadlocks even when `AGENT_MAX_CONCURRENT=1`.
+- Children run in isolated deterministic session IDs (`<parent>_child_<index>`), each with its own `AbortSignal`.
+- Phase 2 (aggregator): once all children complete, the aggregate phase runs as a fresh invocation.
+
+Optional child middleware hooks (`allocateForChild`, `releaseForChild`) let callers integrate provisioning/suballocation without coupling `AgentManager` to filesystem or git operations.
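+
+The token-release rule is easiest to see in a small model. The sketch below is not `AgentManager` itself: `Semaphore`, `Plan`, `Planner`, `withToken`, and `runRecursive` are hypothetical names, and abort handling and the child middleware hooks are left out. It only shows why releasing the parent's token before scheduling children lets a fanout finish with a concurrency limit of 1.
+
+```ts
+// Counting semaphore standing in for the manager's concurrency limit.
+class Semaphore {
+  private waiters: Array<() => void> = [];
+  constructor(private available: number) {}
+
+  async acquire(): Promise<void> {
+    if (this.available > 0) {
+      this.available -= 1;
+      return;
+    }
+    await new Promise<void>((resolve) => this.waiters.push(resolve));
+  }
+
+  release(): void {
+    const next = this.waiters.shift();
+    if (next) next(); // hand the token straight to the next waiter
+    else this.available += 1;
+  }
+}
+
+// Phase 1 returns either a terminal result or a fanout plan.
+type Plan<T> =
+  | { kind: "result"; value: T }
+  | { kind: "fanout"; intents: string[]; aggregate: (children: T[]) => T };
+
+type Planner<T> = (sessionId: string, intent: string) => Promise<Plan<T>>;
+
+// Runs fn while holding exactly one token.
+async function withToken<R>(sem: Semaphore, fn: () => Promise<R> | R): Promise<R> {
+  await sem.acquire();
+  try {
+    return await fn();
+  } finally {
+    sem.release();
+  }
+}
+
+async function runRecursive<T>(
+  sem: Semaphore,
+  plan: Planner<T>,
+  sessionId: string,
+  intent: string,
+): Promise<T> {
+  // Phase 1: the token is held only while the planner runs.
+  const planned = await withToken(sem, () => plan(sessionId, intent));
+  if (planned.kind === "result") return planned.value;
+
+  // The parent's token is already released, so children can acquire it.
+  const children = await Promise.all(
+    planned.intents.map((child, index) =>
+      runRecursive(sem, plan, `${sessionId}_child_${index}`, child),
+    ),
+  );
+
+  // Phase 2: aggregation runs as a fresh invocation under its own token.
+  return withToken(sem, () => planned.aggregate(children));
+}
+
+// The root fans out into two children; each child returns its intent's length.
+const planner: Planner<number> = async (_sessionId, intent) =>
+  intent === "root"
+    ? { kind: "fanout", intents: ["a", "bb"], aggregate: (xs) => xs.reduce((sum, x) => sum + x, 0) }
+    : { kind: "result", value: intent.length };
+
+runRecursive(new Semaphore(1), planner, "session-1", "root").then((total) => {
+  console.log(total); // prints 3
+});
+```
+
+If the parent kept its token while waiting on `Promise.all`, the children would wait forever for the single token the parent holds.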
+
+## Resource Provisioning
+
+The provisioning layer separates:
+
+- Hard constraints: actual resource allocation, enforced before the run starts.
+- Soft constraints: injected env vars, prompt sections, metadata, and a discovery snapshot.
+
+Built-in providers:
+
+- `git-worktree`
+- `port-range`
+
+Runtime injection includes:
+
+- Working directory override
+- Injected env vars such as `AGENT_WORKTREE_PATH`, `AGENT_PORT_RANGE_START`, `AGENT_PORT_RANGE_END`, `AGENT_PORT_PRIMARY`
+- Discovery file path via `AGENT_DISCOVERY_FILE`
+
+### Hierarchical Suballocation
+
+Parent sessions can suballocate resources for child sessions using:
+
+- `ResourceProvisioningOrchestrator.provisionChildSession(...)`
+- `buildChildResourceRequests(...)`
+
+Behavior:
+
+- Child worktrees are placed under a deterministic parent-scoped root.
+- Child port blocks are deterministically carved from the parent's assigned range.
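+
+One simple way to carve child port blocks deterministically is to split the parent range into equal slices by child index. This is an illustrative sketch only; `PortRange` and `carveChildRange` are hypothetical names, and the actual scheme in `provisioning.ts` may differ.
+
+```ts
+// Inclusive port range, e.g. { start: 4000, end: 4099 }.
+interface PortRange {
+  start: number;
+  end: number;
+}
+
+// Returns the slice of the parent range that belongs to the child at childIndex.
+function carveChildRange(parent: PortRange, childIndex: number, childCount: number): PortRange {
+  if (!Number.isInteger(childCount) || childCount < 1) {
+    throw new RangeError("childCount must be a positive integer");
+  }
+  if (!Number.isInteger(childIndex) || childIndex < 0 || childIndex >= childCount) {
+    throw new RangeError("childIndex is out of bounds");
+  }
+  const size = parent.end - parent.start + 1;
+  const block = Math.floor(size / childCount);
+  if (block < 1) {
+    throw new RangeError("parent range is too small for this many children");
+  }
+  // The same inputs always yield the same block, so no coordination is needed.
+  const start = parent.start + childIndex * block;
+  return { start, end: start + block - 1 };
+}
+
+console.log(carveChildRange({ start: 4000, end: 4099 }, 2, 4)); // { start: 4050, end: 4074 }
+```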
+
+## MCP Configuration
+
+Use `mcp.config.json` to configure shared and provider-specific MCP servers.
+
+- `MCP_CONFIG_PATH` controls config location (default `./mcp.config.json`).
+- Shared server definitions are in `servers`.
+- Provider overrides:
+  - `codex.mcp_servers`
+  - `claude.mcpServers`
+- Handlers:
+  - built-in `context7`
+  - built-in `claude-task-master`
+  - built-in `generic`
+  - custom handlers via `registerMcpHandler(...)`
+
+See `mcp.config.example.json` for a complete template.
+
+## Environment Variables
+
+### Provider/Auth
+
+- `CODEX_API_KEY`
+- `OPENAI_API_KEY`
+- `OPENAI_BASE_URL`
+- `CODEX_SKIP_GIT_CHECK`
+- `ANTHROPIC_API_KEY`
+- `CLAUDE_MODEL`
+- `CLAUDE_CODE_PATH`
+- `MCP_CONFIG_PATH`
+
+### Agent Manager Limits
+
+- `AGENT_MAX_CONCURRENT`
+- `AGENT_MAX_SESSION`
+- `AGENT_MAX_RECURSIVE_DEPTH`
+
+### Orchestration Limits
+
+- `AGENT_STATE_ROOT`
+- `AGENT_TOPOLOGY_MAX_DEPTH`
+- `AGENT_TOPOLOGY_MAX_RETRIES`
+- `AGENT_RELATIONSHIP_MAX_CHILDREN`
+
+### Provisioning
+
+- `AGENT_WORKTREE_ROOT`
+- `AGENT_WORKTREE_BASE_REF`
+- `AGENT_PORT_BASE`
+- `AGENT_PORT_BLOCK_SIZE`
+- `AGENT_PORT_BLOCK_COUNT`
+- `AGENT_PORT_PRIMARY_OFFSET`
+- `AGENT_PORT_LOCK_DIR`
+- `AGENT_DISCOVERY_FILE_RELATIVE_PATH`
+
+Defaults are documented in `.env.example`.
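+
+Numeric limits such as `AGENT_MAX_CONCURRENT` need to be parsed and validated before use. The helper below is a hedged sketch of that step; `readPositiveInt` is a hypothetical name, and the fallback value in the usage line is a placeholder, not the project's real default.
+
+```ts
+// Reads a positive integer from an env-like record, falling back when unset.
+function readPositiveInt(
+  env: Record<string, string | undefined>,
+  name: string,
+  fallback: number,
+): number {
+  const raw = env[name];
+  if (raw === undefined || raw.trim() === "") return fallback;
+  const value = Number(raw);
+  if (!Number.isInteger(value) || value < 1) {
+    throw new Error(`${name} must be a positive integer, got "${raw}"`);
+  }
+  return value;
+}
+
+// Placeholder fallback of 4; see .env.example for the documented defaults.
+const maxConcurrent = readPositiveInt({ AGENT_MAX_CONCURRENT: "2" }, "AGENT_MAX_CONCURRENT", 4);
+console.log(maxConcurrent); // prints 2
+```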
+
+## Quality Gate
+
+Run the full pre-PR gate:
+
+```bash
+npm run verify
+```
+
+Equivalent individual commands:
+
+```bash
+npm run check
+npm run check:tests
+npm run test
+npm run build
+```
+
+## Build and Start
+
+```bash
+npm run build
+npm run start -- codex "Hello from built JS"
+```
+
+## Known Limitations
+
+- Tool clearance allowlist/banlist is currently metadata; enforcement is not yet wired into an execution sandbox.
+
+## References
+
+- `docs/orchestration-engine.md`
+- OpenAI Codex SDK docs: https://developers.openai.com/codex/sdk/
+- Codex MCP config docs: https://developers.openai.com/codex/config#model-context-protocol-mcp_servers
+- Claude Agent SDK docs: https://platform.claude.com/docs/en/agent-sdk/overview