first commit

2026-02-23 12:06:13 -05:00
commit 53af0d44cd
33 changed files with 6483 additions and 0 deletions

README.md
# AI Ops: Schema-Driven Multi-Agent Orchestration Runtime
TypeScript runtime for deterministic multi-agent execution with:
- OpenAI Codex SDK integration (`@openai/codex-sdk`)
- Anthropic Claude Agent SDK integration (`@anthropic-ai/claude-agent-sdk`)
- Schema-validated orchestration (`AgentManifest`)
- Stateless node handoffs via persisted state/context payloads
- Resource provisioning (git worktrees + deterministic port ranges)
- MCP configuration layer with handler-based policy hooks
## Current Status
- Provider entrypoints (`codex`, `claude`) run with session limits and resource provisioning.
- Schema-driven orchestration is implemented as reusable modules under `src/agents`.
- Recursive `AgentManager.runRecursiveAgent(...)` supports fanout/fan-in orchestration with abort propagation.
- Tool clearance allowlist/banlist is modeled, but hard security enforcement is still a TODO at tool execution boundaries.
## Repository Layout
- `src/agents`:
  - `manager.ts`: queue-based concurrency limits + recursive fanout/fan-in orchestration.
  - `runtime.ts`: env-driven runtime singletons and defaults.
  - `manifest.ts`: `AgentManifest` schema parsing + validation (strict DAG).
  - `persona-registry.ts`: prompt templating + persona behavior events.
  - `pipeline.ts`: actor-oriented DAG runner with retries and state-dependent routing.
  - `state-context.ts`: persisted state + stateless handoff reconstruction.
  - `provisioning.ts`: extensible resource orchestration + child suballocation support.
  - `orchestration.ts`: `SchemaDrivenExecutionEngine` facade.
- `src/mcp`: MCP config types, conversions, and handler resolution.
- `src/examples`: provider entrypoints (`codex.ts`, `claude.ts`).
- `tests`: unit coverage for manager, manifest, pipeline/orchestration, state context, MCP, and provisioning behavior.
- `docs/orchestration-engine.md`: design notes for the orchestration architecture.
## Prerequisites
- Node.js 18+
- npm
## Setup
```bash
npm install
cp .env.example .env
cp mcp.config.example.json mcp.config.json
```
Fill in any values you need in `.env`.
## Run
Run Codex example:
```bash
npm run codex -- "Summarize what this repository does."
```
Run Claude example:
```bash
npm run claude -- "Summarize what this repository does."
```
Run via unified entrypoint:
```bash
npm run dev -- codex "List potential improvements."
npm run dev -- claude "List potential improvements."
```
## Schema-Driven Orchestration
The orchestration engine is exposed as library modules (not yet wired into `src/index.ts` by default).
Core pieces:
- `parseAgentManifest(...)` validates the full orchestration schema.
- `PersonaRegistry` injects runtime context into templated system prompts.
- `PipelineExecutor` executes a strict DAG of actor nodes.
- `FileSystemStateContextManager` enforces stateless handoffs.
- `SchemaDrivenExecutionEngine` composes all of the above with env-driven limits.
### AgentManifest Overview
`AgentManifest` (schema version `"1"`) includes:
- `topologies`: any of `hierarchical`, `retry-unrolled`, `sequential`
- `personas`: identity, prompt template, tool clearance metadata
- `relationships`: parent-child persona edges and constraints
- `pipeline`: strict DAG with entry node, nodes, and edges
- `topologyConstraints`: max depth and retry ceilings
Edge routing supports:
- Event gates: `success`, `validation_fail`, `failure`, `always`, `onTaskComplete`, `onValidationFail`
- Conditions:
  - `state_flag`
  - `history_has_event`
  - `file_exists`
  - `always`
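
The gate-and-condition routing above can be sketched as a pure function over a node's outcome and the session state. This is a minimal sketch, not the code in `pipeline.ts`: the `Edge` and `EdgeCondition` shapes, their field names (`from`, `to`, `on`, `when`, `flag`, `equals`, `event`, `path`), and the rule that every listed condition must hold are all assumptions. It treats the `always` gate as a wildcard and does not model how `onTaskComplete`/`onValidationFail` relate to `success`/`validation_fail`.

```ts
// Hypothetical shapes for illustration; the real manifest types in
// src/agents/manifest.ts may use different field names.
type NodeEvent =
  | "success"
  | "validation_fail"
  | "failure"
  | "always"
  | "onTaskComplete"
  | "onValidationFail";

type EdgeCondition =
  | { type: "state_flag"; flag: string; equals: boolean }
  | { type: "history_has_event"; event: string }
  | { type: "file_exists"; path: string }
  | { type: "always" };

interface Edge {
  from: string;
  to: string;
  on: NodeEvent;
  when?: EdgeCondition[];
}

interface RoutingContext {
  flags: Record<string, boolean>;
  history: string[]; // event names recorded so far in the session
  fileExists: (path: string) => boolean;
}

// An "always" gate fires for any outcome; every other gate needs an exact match.
function gateMatches(gate: NodeEvent, outcome: NodeEvent): boolean {
  return gate === "always" || gate === outcome;
}

function conditionHolds(cond: EdgeCondition, ctx: RoutingContext): boolean {
  switch (cond.type) {
    case "state_flag":
      return (ctx.flags[cond.flag] ?? false) === cond.equals;
    case "history_has_event":
      return ctx.history.includes(cond.event);
    case "file_exists":
      return ctx.fileExists(cond.path);
    case "always":
      return true;
  }
}

// Returns the IDs of the nodes to schedule next, in edge declaration order.
function nextNodes(edges: Edge[], from: string, outcome: NodeEvent, ctx: RoutingContext): string[] {
  return edges
    .filter((e) => e.from === from && gateMatches(e.on, outcome))
    .filter((e) => (e.when ?? []).every((c) => conditionHolds(c, ctx)))
    .map((e) => e.to);
}

// Usage with hypothetical node IDs.
const edges: Edge[] = [
  { from: "coder-1", to: "reviewer-1", on: "success" },
  {
    from: "coder-1",
    to: "coder-2",
    on: "validation_fail",
    when: [{ type: "state_flag", flag: "retryable", equals: true }],
  },
  { from: "coder-1", to: "audit-1", on: "always" },
];
const routingCtx: RoutingContext = { flags: { retryable: true }, history: [], fileExists: () => false };
const routed = nextNodes(edges, "coder-1", "validation_fail", routingCtx); // ["coder-2", "audit-1"]
```
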
Example manifest:
```json
{
  "schemaVersion": "1",
  "topologies": ["hierarchical", "retry-unrolled", "sequential"],
  "personas": [
    {
      "id": "coder",
      "displayName": "Coder",
      "systemPromptTemplate": "Implement ticket {{ticket}} in repo {{repo}}",
      "toolClearance": {
        "allowlist": ["read_file", "write_file"],
        "banlist": ["rm"]
      }
    }
  ],
  "relationships": [],
  "pipeline": {
    "entryNodeId": "coder-1",
    "nodes": [
      {
        "id": "coder-1",
        "actorId": "coder_actor",
        "personaId": "coder",
        "constraints": { "maxRetries": 1 }
      }
    ],
    "edges": []
  },
  "topologyConstraints": {
    "maxDepth": 4,
    "maxRetries": 2
  }
}
```
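The "strict DAG" requirement on `pipeline` can be checked with a topological sort. The sketch below uses Kahn's algorithm and is an illustration, not the validation code in `manifest.ts`. The edge fields `from`/`to` are assumptions (the example above has no edges), and the sketch only checks unique IDs, a known entry node, dangling edges, and cycles; it does not check reachability from `entryNodeId`.

```ts
// Hypothetical minimal shapes mirroring the "pipeline" object in the example manifest.
interface PipelineNode {
  id: string;
}
interface PipelineEdge {
  from: string;
  to: string;
}
interface Pipeline {
  entryNodeId: string;
  nodes: PipelineNode[];
  edges: PipelineEdge[];
}

// Throws if the pipeline is not a DAG; otherwise returns one topological order.
function assertStrictDag(pipeline: Pipeline): string[] {
  const ids = new Set(pipeline.nodes.map((n) => n.id));
  if (ids.size !== pipeline.nodes.length) throw new Error("duplicate node id");
  if (!ids.has(pipeline.entryNodeId)) throw new Error(`unknown entry node: ${pipeline.entryNodeId}`);

  const indegree = new Map<string, number>();
  const successors = new Map<string, string[]>();
  for (const id of ids) {
    indegree.set(id, 0);
    successors.set(id, []);
  }
  for (const { from, to } of pipeline.edges) {
    if (!ids.has(from) || !ids.has(to)) throw new Error(`dangling edge: ${from} -> ${to}`);
    successors.get(from)!.push(to);
    indegree.set(to, indegree.get(to)! + 1);
  }

  // Kahn's algorithm: repeatedly emit nodes whose incoming edges are all consumed.
  const ready = [...ids].filter((id) => indegree.get(id) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of successors.get(id)!) {
      const remaining = indegree.get(next)! - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  // Any node never emitted sits on, or behind, a cycle.
  if (order.length !== ids.size) throw new Error("pipeline contains a cycle");
  return order;
}

// The example manifest's single-node pipeline passes.
const exampleOrder = assertStrictDag({ entryNodeId: "coder-1", nodes: [{ id: "coder-1" }], edges: [] });
```
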
### Minimal Engine Usage
```ts
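import { readFile } from "node:fs/promises";
import { parseAgentManifest } from "./src/agents/manifest.js";
import { SchemaDrivenExecutionEngine } from "./src/agents/orchestration.js";

// Assumes parseAgentManifest accepts raw parsed JSON and returns a validated
// AgentManifest; "manifest.json" is a placeholder for a file like the example above.
const manifest = parseAgentManifest(JSON.parse(await readFile("manifest.json", "utf8")));

const engine = new SchemaDrivenExecutionEngine({
  manifest,
  actorExecutors: {
    // Keys match the `actorId` of pipeline nodes. The executor receives
    // `prompt`, `context`, and `toolClearance`; this stub ignores them.
    coder_actor: async ({ prompt, context, toolClearance }) => {
      // execute actor logic here
      return {
        status: "success",
        payload: {
          summary: "done"
        },
        stateFlags: {
          implemented: true
        }
      };
    }
  },
  settings: {
    workspaceRoot: process.cwd(),
    // Values for the {{repo}} and {{ticket}} placeholders in systemPromptTemplate.
    runtimeContext: {
      repo: "ai_ops",
      ticket: "AIOPS-123"
    }
  }
});

const result = await engine.runSession({
  sessionId: "session-1",
  initialPayload: {
    task: "Implement feature"
  }
});
console.log(result.records);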
```
## Stateless Handoffs and Context
The engine does not depend on conversational memory between nodes.
- Node inputs are written as handoff payloads to storage.
- Each node execution reads a fresh context snapshot from disk.
- Session state persists:
  - flags
  - metadata
  - history events
Default state root is controlled by `AGENT_STATE_ROOT`.
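
A minimal sketch of the write-then-reread pattern described above, using synchronous `node:fs` calls. The directory layout (`<root>/<sessionId>/state.json` and `<root>/<sessionId>/handoffs/<nodeId>.json`), the `SessionState` shape, and the helper names are assumptions; `FileSystemStateContextManager` defines its own layout and API.

```ts
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical state shape: flags, metadata, and history events, as listed above.
interface SessionState {
  flags: Record<string, boolean>;
  metadata: Record<string, unknown>;
  history: { nodeId: string; event: string }[];
}

const handoffPath = (root: string, sessionId: string, nodeId: string) =>
  join(root, sessionId, "handoffs", `${nodeId}.json`);
const statePath = (root: string, sessionId: string) => join(root, sessionId, "state.json");

// Persist the input for a node before it runs.
function writeHandoff(root: string, sessionId: string, nodeId: string, payload: unknown): void {
  mkdirSync(join(root, sessionId, "handoffs"), { recursive: true });
  writeFileSync(handoffPath(root, sessionId, nodeId), JSON.stringify(payload));
}

function loadState(root: string, sessionId: string): SessionState {
  const file = statePath(root, sessionId);
  if (!existsSync(file)) return { flags: {}, metadata: {}, history: [] };
  return JSON.parse(readFileSync(file, "utf8")) as SessionState;
}

function saveState(root: string, sessionId: string, state: SessionState): void {
  mkdirSync(join(root, sessionId), { recursive: true });
  writeFileSync(statePath(root, sessionId), JSON.stringify(state));
}

// Rebuild a node's context purely from disk; nothing is carried over in memory.
function readNodeContext(root: string, sessionId: string, nodeId: string) {
  return {
    state: loadState(root, sessionId),
    handoff: JSON.parse(readFileSync(handoffPath(root, sessionId, nodeId), "utf8")) as unknown,
  };
}

// Usage: a temp directory stands in for AGENT_STATE_ROOT.
const stateRoot = mkdtempSync(join(tmpdir(), "agent-state-"));
writeHandoff(stateRoot, "session-1", "coder-1", { task: "Implement feature" });
const sessionState = loadState(stateRoot, "session-1");
sessionState.flags.implemented = true;
sessionState.history.push({ nodeId: "coder-1", event: "success" });
saveState(stateRoot, "session-1", sessionState);
const snapshot = readNodeContext(stateRoot, "session-1", "coder-1");
```

Because `readNodeContext` only reads files, any node can be re-run in a fresh process from the same state root.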
## Recursive Orchestration Contract
`AgentManager.runRecursiveAgent(...)` uses a strict two-phase fanout/fan-in model:
- Phase 1 (planner): agent execution returns either a terminal result or a fanout plan (`intents[]` + `aggregate(...)`).
- The parent's concurrency token is released before its children are scheduled, which avoids deadlocks even when `AGENT_MAX_CONCURRENT=1`.
- Children run in isolated sessions with deterministic IDs (`<parent>_child_<index>`), each with its own `AbortSignal`.
- Phase 2 (aggregator): once all children complete, the aggregate phase runs as a fresh invocation.
Optional child middleware hooks (`allocateForChild`, `releaseForChild`) let callers integrate provisioning/suballocation without coupling `AgentManager` to filesystem or git operations.
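
The two-phase contract can be sketched with a counting semaphore standing in for the manager's queue-based concurrency limit. This is not `AgentManager`'s code: `Semaphore`, `withToken`, `Plan`, `runRecursive`, and the string-typed intents are hypothetical. The sketch does show the deadlock-avoidance rule: the planner releases its token before its children acquire theirs, so a limit of 1 still completes.

```ts
// Counting semaphore: a stand-in for AGENT_MAX_CONCURRENT-style limits.
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(tokens: number) {
    this.available = tokens;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the token straight to the next waiter
    else this.available++;
  }
}

// Hold a token only for the duration of fn.
async function withToken<R>(sem: Semaphore, fn: () => Promise<R>): Promise<R> {
  await sem.acquire();
  try {
    return await fn();
  } finally {
    sem.release();
  }
}

// Hypothetical planner output: a terminal result or a fanout plan.
type Plan<T> =
  | { kind: "result"; value: T }
  | { kind: "fanout"; intents: string[]; aggregate: (childResults: T[]) => Promise<T> };

type Agent<T> = (sessionId: string, intent: string, signal: AbortSignal) => Promise<Plan<T>>;

async function runRecursive<T>(
  agent: Agent<T>,
  sem: Semaphore,
  sessionId: string,
  intent: string,
  signal: AbortSignal,
  depth: number,
  maxDepth: number,
): Promise<T> {
  if (depth > maxDepth) throw new Error(`max recursive depth ${maxDepth} exceeded`);
  if (signal.aborted) throw new Error(`${sessionId} aborted`);

  // Phase 1: the planner holds a token only while it plans.
  const plan = await withToken(sem, () => agent(sessionId, intent, signal));
  if (plan.kind === "result") return plan.value;

  // Token already released: schedule children with deterministic IDs and linked AbortSignals.
  const childResults = await Promise.all(
    plan.intents.map((childIntent, index) => {
      const child = new AbortController();
      if (signal.aborted) child.abort();
      else signal.addEventListener("abort", () => child.abort(), { once: true });
      return runRecursive(agent, sem, `${sessionId}_child_${index}`, childIntent, child.signal, depth + 1, maxDepth);
    }),
  );

  // Phase 2: aggregate as a fresh invocation under a new token.
  return withToken(sem, () => plan.aggregate(childResults));
}

// Usage: one token total, like AGENT_MAX_CONCURRENT=1.
const visited: string[] = [];
const demoAgent: Agent<number> = async (sessionId, intent) => {
  visited.push(sessionId);
  if (intent === "root") {
    return { kind: "fanout", intents: ["a", "b"], aggregate: async (xs) => xs.reduce((sum, x) => sum + x, 0) };
  }
  return { kind: "result", value: intent.length };
};
const done = runRecursive(demoAgent, new Semaphore(1), "session-1", "root", new AbortController().signal, 0, 3);
done.then((total) => console.log(total)); // 2
```

If the planner instead kept its token while awaiting `Promise.all`, `new Semaphore(1)` would leave every child waiting forever.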
## Resource Provisioning
The provisioning layer separates:
- Hard constraints: actual resource allocation, enforced before the run.
- Soft constraints: injected env vars, prompt sections, metadata, and a discovery snapshot.
Built-in providers:
- `git-worktree`
- `port-range`
Runtime injection includes:
- Working directory override
- Injected env vars such as `AGENT_WORKTREE_PATH`, `AGENT_PORT_RANGE_START`, `AGENT_PORT_RANGE_END`, `AGENT_PORT_PRIMARY`
- Discovery file path via `AGENT_DISCOVERY_FILE`
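
Deterministic port ranges and the injected variables can be illustrated with simple block arithmetic. This is a hypothetical sketch, not the `port-range` provider: the formula `start = base + index * blockSize`, the `PortConfig` shape, and how a block index gets chosen (the real provider may claim blocks via `AGENT_PORT_LOCK_DIR`) are assumptions. The config fields mirror `AGENT_PORT_BASE`, `AGENT_PORT_BLOCK_SIZE`, `AGENT_PORT_BLOCK_COUNT`, and `AGENT_PORT_PRIMARY_OFFSET`; the paths in the usage are placeholders.

```ts
// Fields mirror the AGENT_PORT_* variables; the arithmetic below is an assumption.
interface PortConfig {
  base: number;
  blockSize: number;
  blockCount: number;
  primaryOffset: number;
}

interface PortBlock {
  start: number;
  end: number; // inclusive
  primary: number;
}

function portBlock(cfg: PortConfig, blockIndex: number): PortBlock {
  if (!Number.isInteger(blockIndex) || blockIndex < 0 || blockIndex >= cfg.blockCount) {
    throw new RangeError(`block index ${blockIndex} outside [0, ${cfg.blockCount})`);
  }
  if (cfg.primaryOffset < 0 || cfg.primaryOffset >= cfg.blockSize) {
    throw new RangeError("primary offset must fall inside the block");
  }
  const start = cfg.base + blockIndex * cfg.blockSize;
  return { start, end: start + cfg.blockSize - 1, primary: start + cfg.primaryOffset };
}

// Soft-constraint injection: the variable names come from this README.
function injectedEnv(worktreePath: string, block: PortBlock, discoveryFile: string): Record<string, string> {
  return {
    AGENT_WORKTREE_PATH: worktreePath,
    AGENT_PORT_RANGE_START: String(block.start),
    AGENT_PORT_RANGE_END: String(block.end),
    AGENT_PORT_PRIMARY: String(block.primary),
    AGENT_DISCOVERY_FILE: discoveryFile,
  };
}

// Usage with placeholder values (not the defaults from .env.example).
const block = portBlock({ base: 40000, blockSize: 10, blockCount: 8, primaryOffset: 0 }, 2);
// block → { start: 40020, end: 40029, primary: 40020 }
const agentEnv = injectedEnv("/tmp/worktrees/session-1", block, "/tmp/worktrees/session-1/discovery.json");
```
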
### Hierarchical Suballocation
Parent sessions can suballocate resources for child sessions using:
- `ResourceProvisioningOrchestrator.provisionChildSession(...)`
- `buildChildResourceRequests(...)`
Behavior:
- Child worktrees are placed under a deterministic parent-scoped root.
- Child port blocks are deterministically carved from the parent's assigned range.
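
One way to carve child port blocks deterministically is to split the parent's range into equal slices by child index. The sketch below is an assumption, not `buildChildResourceRequests(...)`: the equal-split rule, leaving leftover ports unassigned, and the `<worktreeRoot>/<parent>/children/<childSessionId>` layout are all hypothetical. It reuses the `<parent>_child_<index>` session ID convention from the recursive orchestration contract.

```ts
import { join } from "node:path";

interface PortRange {
  start: number;
  end: number; // inclusive
}

// Split the parent range into childCount equal slices; leftover ports stay unassigned.
function childPortRange(parent: PortRange, childCount: number, childIndex: number): PortRange {
  if (!Number.isInteger(childCount) || childCount < 1) throw new RangeError("childCount must be >= 1");
  if (!Number.isInteger(childIndex) || childIndex < 0 || childIndex >= childCount) {
    throw new RangeError("child index out of range");
  }
  const perChild = Math.floor((parent.end - parent.start + 1) / childCount);
  if (perChild < 1) throw new RangeError("parent range too small for the requested children");
  const start = parent.start + childIndex * perChild;
  return { start, end: start + perChild - 1 };
}

// Child worktrees live under a parent-scoped root, keyed by deterministic child session ID.
function childWorktreePath(worktreeRoot: string, parentSessionId: string, childIndex: number): string {
  return join(worktreeRoot, parentSessionId, "children", `${parentSessionId}_child_${childIndex}`);
}

// Usage: a 10-port parent block shared by 3 children.
const parentRange: PortRange = { start: 40020, end: 40029 };
const childRanges = [0, 1, 2].map((i) => childPortRange(parentRange, 3, i));
// → 40020-40022, 40023-40025, 40026-40028 (40029 left unassigned)
const childPath = childWorktreePath("/tmp/worktrees", "session-1", 0);
```

Because every input is derived from the parent allocation and the child index, re-running the same plan yields the same child ports and paths.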
## MCP Configuration
Use `mcp.config.json` to configure shared and provider-specific MCP servers.
- `MCP_CONFIG_PATH` controls config location (default `./mcp.config.json`).
- Shared server definitions are in `servers`.
- Provider overrides:
  - `codex.mcp_servers`
  - `claude.mcpServers`
- Handlers:
  - built-in `context7`
  - built-in `claude-task-master`
  - built-in `generic`
  - custom handlers via `registerMcpHandler(...)`
See `mcp.config.example.json` for a complete template.
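
The shared-plus-override layout and the handler fallback can be sketched as follows. This is an assumption-laden illustration, not `src/mcp`: it treats each server entry as an opaque object, assumes a provider override shallow-merges over the shared entry with the same name, and assumes unregistered handler IDs fall back to `generic`. `registerHandler` and `resolveHandler` are hypothetical and do not reflect the signature of `registerMcpHandler(...)`; the server fields in the usage are placeholders.

```ts
// Server entries are treated as opaque objects; see mcp.config.example.json for real fields.
type ServerConfig = Record<string, unknown>;

interface McpConfigFile {
  servers?: Record<string, ServerConfig>;
  codex?: { mcp_servers?: Record<string, ServerConfig> };
  claude?: { mcpServers?: Record<string, ServerConfig> };
}

// Provider entries shallow-merge over shared entries with the same name (assumed semantics).
function serversFor(config: McpConfigFile, provider: "codex" | "claude"): Record<string, ServerConfig> {
  const overrides = provider === "codex" ? config.codex?.mcp_servers : config.claude?.mcpServers;
  const merged: Record<string, ServerConfig> = { ...(config.servers ?? {}) };
  for (const [name, override] of Object.entries(overrides ?? {})) {
    merged[name] = { ...(merged[name] ?? {}), ...override };
  }
  return merged;
}

// Hypothetical policy hook: may rewrite a server entry before it reaches a provider.
type Handler = (name: string, server: ServerConfig) => ServerConfig;

const handlers = new Map<string, Handler>([["generic", (_name, server) => server]]);

function registerHandler(id: string, handler: Handler): void {
  handlers.set(id, handler);
}

function resolveHandler(id: string): Handler {
  return handlers.get(id) ?? handlers.get("generic")!;
}

// Usage with placeholder entries.
const config: McpConfigFile = {
  servers: { docs: { command: "docs-server", args: ["--stdio"] } },
  claude: { mcpServers: { docs: { args: ["--verbose"] } } },
};
registerHandler("docs", (_name, server) => ({ ...server, tagged: true }));
const claudeServers = serversFor(config, "claude");
// claudeServers.docs → { command: "docs-server", args: ["--verbose"] }
const prepared = resolveHandler("docs")("docs", claudeServers.docs);
```
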
## Environment Variables
### Provider/Auth
- `CODEX_API_KEY`
- `OPENAI_API_KEY`
- `OPENAI_BASE_URL`
- `CODEX_SKIP_GIT_CHECK`
- `ANTHROPIC_API_KEY`
- `CLAUDE_MODEL`
- `CLAUDE_CODE_PATH`
- `MCP_CONFIG_PATH`
### Agent Manager Limits
- `AGENT_MAX_CONCURRENT`
- `AGENT_MAX_SESSION`
- `AGENT_MAX_RECURSIVE_DEPTH`
### Orchestration Limits
- `AGENT_STATE_ROOT`
- `AGENT_TOPOLOGY_MAX_DEPTH`
- `AGENT_TOPOLOGY_MAX_RETRIES`
- `AGENT_RELATIONSHIP_MAX_CHILDREN`
### Provisioning
- `AGENT_WORKTREE_ROOT`
- `AGENT_WORKTREE_BASE_REF`
- `AGENT_PORT_BASE`
- `AGENT_PORT_BLOCK_SIZE`
- `AGENT_PORT_BLOCK_COUNT`
- `AGENT_PORT_PRIMARY_OFFSET`
- `AGENT_PORT_LOCK_DIR`
- `AGENT_DISCOVERY_FILE_RELATIVE_PATH`
Defaults are documented in `.env.example`.
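
A sketch of how the env-driven limits might be read with validation and fallbacks. The helper names and the fail-fast behavior on malformed values are assumptions, and the fallback numbers are placeholders, not the defaults documented in `.env.example`.

```ts
type Env = Record<string, string | undefined>;

// Unset or blank → fallback; anything else must be a positive integer.
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name]?.trim();
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 1) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

interface ManagerLimits {
  maxConcurrent: number;
  maxSession: number;
  maxRecursiveDepth: number;
}

// Placeholder fallbacks for illustration only.
function loadManagerLimits(env: Env): ManagerLimits {
  return {
    maxConcurrent: readPositiveInt(env, "AGENT_MAX_CONCURRENT", 4),
    maxSession: readPositiveInt(env, "AGENT_MAX_SESSION", 8),
    maxRecursiveDepth: readPositiveInt(env, "AGENT_MAX_RECURSIVE_DEPTH", 3),
  };
}

// Usage: pass process.env in real code; a literal keeps the example deterministic.
const limits = loadManagerLimits({ AGENT_MAX_CONCURRENT: "1" });
// limits → { maxConcurrent: 1, maxSession: 8, maxRecursiveDepth: 3 }
```

Throwing on a malformed value, rather than silently using the fallback, surfaces a typo such as `AGENT_MAX_CONCURRENT=one` at startup instead of running with an unexpected limit.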
## Quality Gate
Run the full pre-PR gate:
```bash
npm run verify
```
Equivalent individual commands:
```bash
npm run check
npm run check:tests
npm run test
npm run build
```
## Build and Start
```bash
npm run build
npm run start -- codex "Hello from built JS"
```
## Known Limitations
- Tool clearance allowlist/banlist is currently metadata; enforcement is not yet wired into an execution sandbox.
## References
- `docs/orchestration-engine.md`
- OpenAI Codex SDK docs: https://developers.openai.com/codex/sdk/
- Codex MCP config docs: https://developers.openai.com/codex/config#model-context-protocol-mcp_servers
- Claude Agent SDK docs: https://platform.claude.com/docs/en/agent-sdk/overview