# AI Ops: Schema-Driven Multi-Agent Orchestration Runtime

TypeScript runtime for deterministic multi-agent execution with:

- OpenAI Codex SDK integration (`@openai/codex-sdk`)
- Anthropic Claude Agent SDK integration (`@anthropic-ai/claude-agent-sdk`)
- Schema-validated orchestration (`AgentManifest`)
- Stateless node handoffs via persisted state/context payloads
- Resource provisioning (git worktrees + deterministic port ranges)
- MCP configuration layer with handler-based policy hooks

## Current Status

- Provider entrypoints (`codex`, `claude`) run with session limits and resource provisioning.
- Schema-driven orchestration is implemented as reusable modules under `src/agents`.
- Recursive `AgentManager.runRecursiveAgent(...)` supports fanout/fan-in orchestration with abort propagation.
- The tool clearance allowlist/banlist is modeled, but hard security enforcement at tool execution boundaries is still a TODO.

## Repository Layout

- `src/agents`:
  - `manager.ts`: queue-based concurrency limits + recursive fanout/fan-in orchestration.
  - `runtime.ts`: env-driven runtime singletons and defaults.
  - `manifest.ts`: `AgentManifest` schema parsing + validation (strict DAG).
  - `persona-registry.ts`: prompt templating + persona behavior events.
  - `pipeline.ts`: actor-oriented DAG runner with retries and state-dependent routing.
  - `state-context.ts`: persisted state + stateless handoff reconstruction.
  - `provisioning.ts`: extensible resource orchestration + child suballocation support.
  - `orchestration.ts`: `SchemaDrivenExecutionEngine` facade.
- `src/mcp`: MCP config types, conversions, and handler resolution.
- `src/examples`: provider entrypoints (`codex.ts`, `claude.ts`).
- `tests`: unit coverage for manager, manifest, pipeline/orchestration, state context, MCP, and provisioning behavior.
- `docs/orchestration-engine.md`: design notes for the orchestration architecture.

## Prerequisites

- Node.js 18+
- npm

## Setup

```bash
npm install
cp .env.example .env
cp mcp.config.example.json mcp.config.json
```

Fill in any values you need in `.env`.

## Run

Run the Codex example:

```bash
npm run codex -- "Summarize what this repository does."
```

Run the Claude example:

```bash
npm run claude -- "Summarize what this repository does."
```

Run via the unified entrypoint:

```bash
npm run dev -- codex "List potential improvements."
npm run dev -- claude "List potential improvements."
```

## Schema-Driven Orchestration

The orchestration engine is exposed as library modules (not yet wired into `src/index.ts` by default).

Core pieces:

- `parseAgentManifest(...)` validates the full orchestration schema.
- `PersonaRegistry` injects runtime context into templated system prompts.
- `PipelineExecutor` executes a strict DAG of actor nodes.
- `FileSystemStateContextManager` enforces stateless handoffs.
- `SchemaDrivenExecutionEngine` composes all of the above with env-driven limits.

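`PersonaRegistry` turns a persona's `systemPromptTemplate` into a concrete prompt using `runtimeContext` values. The sketch below shows one way that substitution can work, assuming `{{key}}` placeholders (as in the example manifest) and choosing to throw on a missing key. `renderPromptTemplate` is a hypothetical helper, not part of the `PersonaRegistry` API, and the real registry's handling of missing values is not documented here.

```typescript
// Hypothetical sketch of {{placeholder}} substitution; not the actual PersonaRegistry API.
type RuntimeContext = Record<string, string | number | boolean>;

function renderPromptTemplate(template: string, context: RuntimeContext): string {
  // Replace each {{key}} (surrounding whitespace allowed) with the matching context value.
  return template.replace(/\{\{\s*([A-Za-z0-9_.-]+)\s*\}\}/g, (_match: string, key: string) => {
    if (!(key in context)) {
      // Assumed policy: fail loudly rather than leak an unresolved placeholder into a prompt.
      throw new Error(`Missing runtime context value for "${key}"`);
    }
    return String(context[key]);
  });
}

const renderedPrompt = renderPromptTemplate("Implement ticket {{ticket}} in repo {{repo}}", {
  repo: "ai_ops",
  ticket: "AIOPS-123",
});
console.log(renderedPrompt); // Implement ticket AIOPS-123 in repo ai_ops
```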
### AgentManifest Overview

`AgentManifest` (schema version `"1"`) includes:

- `topologies`: any of `hierarchical`, `retry-unrolled`, `sequential`
- `personas`: identity, prompt template, tool clearance metadata
- `relationships`: parent-child persona edges and constraints
- `pipeline`: strict DAG with entry node, nodes, and edges
- `topologyConstraints`: max depth and retry ceilings

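The "strict DAG" requirement on `pipeline` comes down to a few structural checks. Here is a minimal sketch using Kahn's topological sort, assuming edges are plain `{ from, to }` pairs. Those edge field names and `validatePipelineDag` are hypothetical, and `parseAgentManifest(...)` may perform further checks (personas, relationships, topology constraints) not shown here.

```typescript
// Hypothetical shapes; the real AgentManifest types in src/agents/manifest.ts may differ.
interface PipelineNode {
  id: string;
  actorId: string;
  personaId: string;
}

interface PipelineEdge {
  from: string; // assumed field name
  to: string; // assumed field name
}

interface Pipeline {
  entryNodeId: string;
  nodes: PipelineNode[];
  edges: PipelineEdge[];
}

function validatePipelineDag(pipeline: Pipeline): void {
  const ids = new Set(pipeline.nodes.map((node) => node.id));
  if (ids.size !== pipeline.nodes.length) throw new Error("Duplicate node id");
  if (!ids.has(pipeline.entryNodeId)) throw new Error(`Unknown entry node "${pipeline.entryNodeId}"`);

  // In-degree and adjacency per node, for Kahn's topological sort.
  const inDegree = new Map<string, number>();
  const outgoing = new Map<string, string[]>();
  for (const id of ids) {
    inDegree.set(id, 0);
    outgoing.set(id, []);
  }
  for (const { from, to } of pipeline.edges) {
    if (!ids.has(from) || !ids.has(to)) throw new Error(`Edge ${from} -> ${to} references an unknown node`);
    outgoing.get(from)!.push(to);
    inDegree.set(to, inDegree.get(to)! + 1);
  }

  // Repeatedly remove nodes with no remaining incoming edges.
  const queue = [...ids].filter((id) => inDegree.get(id) === 0);
  let visited = 0;
  while (queue.length > 0) {
    const id = queue.shift()!;
    visited += 1;
    for (const next of outgoing.get(id)!) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  // Any node that never reached in-degree 0 lies on (or behind) a cycle.
  if (visited !== ids.size) throw new Error("Pipeline graph contains a cycle");
}
```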
Edge routing supports:

- Event gates: `success`, `validation_fail`, `failure`, `always`, `onTaskComplete`, `onValidationFail`
- Conditions:
  - `state_flag`
  - `history_has_event`
  - `file_exists`
  - `always`

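One way to read these routing rules: a gate filters on the node's outcome, and a condition then checks session state. The sketch below assumes an edge fires only when both pass, treats `onTaskComplete` and `onValidationFail` as aliases of `success` and `validation_fail`, and invents the `on`/`when`/`flag`/`equals` field names. None of that is confirmed by the manifest schema; `pipeline.ts` is the authority.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Hypothetical shapes: field names are illustrative, not the manifest schema.
type NodeOutcome = "success" | "validation_fail" | "failure";
type EventGate = NodeOutcome | "always" | "onTaskComplete" | "onValidationFail";

type EdgeCondition =
  | { type: "always" }
  | { type: "state_flag"; flag: string; equals: boolean }
  | { type: "history_has_event"; event: string }
  | { type: "file_exists"; path: string };

interface RoutingEdge {
  from: string;
  to: string;
  on: EventGate;
  when?: EdgeCondition;
}

interface SessionState {
  flags: Record<string, boolean>;
  history: { event: string }[];
}

function gateMatches(gate: EventGate, outcome: NodeOutcome): boolean {
  switch (gate) {
    case "always":
      return true;
    case "onTaskComplete": // assumed alias of "success"
      return outcome === "success";
    case "onValidationFail": // assumed alias of "validation_fail"
      return outcome === "validation_fail";
    default:
      return gate === outcome;
  }
}

function conditionHolds(condition: EdgeCondition, state: SessionState, workspaceRoot: string): boolean {
  switch (condition.type) {
    case "always":
      return true;
    case "state_flag":
      return (state.flags[condition.flag] ?? false) === condition.equals;
    case "history_has_event":
      return state.history.some((entry) => entry.event === condition.event);
    case "file_exists":
      return existsSync(join(workspaceRoot, condition.path));
  }
}

// Assumed semantics: the gate must match AND the optional condition must hold.
function edgeFires(edge: RoutingEdge, outcome: NodeOutcome, state: SessionState, workspaceRoot: string): boolean {
  return gateMatches(edge.on, outcome) && (edge.when === undefined || conditionHolds(edge.when, state, workspaceRoot));
}
```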
Example manifest:

```json
{
  "schemaVersion": "1",
  "topologies": ["hierarchical", "retry-unrolled", "sequential"],
  "personas": [
    {
      "id": "coder",
      "displayName": "Coder",
      "systemPromptTemplate": "Implement ticket {{ticket}} in repo {{repo}}",
      "toolClearance": {
        "allowlist": ["read_file", "write_file"],
        "banlist": ["rm"]
      }
    }
  ],
  "relationships": [],
  "pipeline": {
    "entryNodeId": "coder-1",
    "nodes": [
      {
        "id": "coder-1",
        "actorId": "coder_actor",
        "personaId": "coder",
        "constraints": { "maxRetries": 1 }
      }
    ],
    "edges": []
  },
  "topologyConstraints": {
    "maxDepth": 4,
    "maxRetries": 2
  }
}
```

### Minimal Engine Usage

```ts
import { SchemaDrivenExecutionEngine } from "./src/agents/orchestration.js";

// `manifest` is an AgentManifest object, e.g. the example above
// after validation with parseAgentManifest(...).
const engine = new SchemaDrivenExecutionEngine({
  manifest,
  actorExecutors: {
    // Keys match the `actorId` values of the manifest's pipeline nodes.
    coder_actor: async ({ prompt, context, toolClearance }) => {
      // execute actor logic here
      return {
        status: "success",
        payload: {
          summary: "done"
        },
        stateFlags: {
          implemented: true
        }
      };
    }
  },
  settings: {
    workspaceRoot: process.cwd(),
    runtimeContext: {
      repo: "ai_ops",
      ticket: "AIOPS-123"
    }
  }
});

const result = await engine.runSession({
  sessionId: "session-1",
  initialPayload: {
    task: "Implement feature"
  }
});

console.log(result.records);
```

## Stateless Handoffs and Context

The engine does not depend on conversational memory between nodes.

- Node inputs are written as handoff payloads to storage.
- Each node execution reads a fresh context snapshot from disk.
- Session state persists:
  - flags
  - metadata
  - history events

The default state root is controlled by `AGENT_STATE_ROOT`.

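To make the handoff model concrete, here is a minimal file-backed store, assuming one directory per session holding a `state.json` plus one `handoff-<nodeId>.json` per node. The layout, `FileHandoffStore`, its methods, and the `reviewer-1` node are hypothetical; `FileSystemStateContextManager` may organize its files differently. What it illustrates is that a node's context is rebuilt from disk on every read.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical on-disk shapes; not the real FileSystemStateContextManager format.
interface PersistedSessionState {
  flags: Record<string, boolean>;
  metadata: Record<string, unknown>;
  history: { nodeId: string; event: string }[];
}

interface NodeContext {
  handoff: unknown;
  state: PersistedSessionState;
}

class FileHandoffStore {
  private readonly root: string;

  constructor(root: string) {
    this.root = root;
  }

  private sessionDir(sessionId: string): string {
    const dir = join(this.root, sessionId);
    mkdirSync(dir, { recursive: true });
    return dir;
  }

  writeHandoff(sessionId: string, nodeId: string, payload: unknown): void {
    writeFileSync(join(this.sessionDir(sessionId), `handoff-${nodeId}.json`), JSON.stringify(payload));
  }

  writeState(sessionId: string, state: PersistedSessionState): void {
    writeFileSync(join(this.sessionDir(sessionId), "state.json"), JSON.stringify(state));
  }

  // Re-reads both files on every call: a node sees only what was persisted.
  readContext(sessionId: string, nodeId: string): NodeContext {
    const dir = this.sessionDir(sessionId);
    return {
      handoff: JSON.parse(readFileSync(join(dir, `handoff-${nodeId}.json`), "utf8")),
      state: JSON.parse(readFileSync(join(dir, "state.json"), "utf8")) as PersistedSessionState,
    };
  }
}

// The root would normally come from AGENT_STATE_ROOT; a temp dir keeps the sketch self-contained.
const store = new FileHandoffStore(mkdtempSync(join(tmpdir(), "agent-state-")));
store.writeState("session-1", {
  flags: { implemented: true },
  metadata: {},
  history: [{ nodeId: "coder-1", event: "success" }],
});
store.writeHandoff("session-1", "reviewer-1", { summary: "done" });
const nodeContext = store.readContext("session-1", "reviewer-1");
console.log(nodeContext.handoff, nodeContext.state.flags);
```

Because a node only sees what `readContext` returns, it can be retried or run in another process without depending on anything that was not explicitly persisted.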
## Recursive Orchestration Contract

`AgentManager.runRecursiveAgent(...)` uses a strict two-phase fanout/fan-in model:

- Phase 1 (planner): agent execution returns either a terminal result or a fanout plan (`intents[]` + `aggregate(...)`).
- The parent's concurrency tokens are released before its children are scheduled, which avoids deadlocks even when `AGENT_MAX_CONCURRENT=1`.
- Children run under isolated, deterministic session IDs (`<parent>_child_<index>`), each with its own `AbortSignal`.
- Phase 2 (aggregator): once all children complete, the aggregate phase runs as a fresh invocation.

Optional child middleware hooks (`allocateForChild`, `releaseForChild`) let callers integrate provisioning/suballocation without coupling `AgentManager` to filesystem or git operations.

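The two-phase contract can be sketched in a few dozen lines. This is not `AgentManager`'s code: `Semaphore` stands in for its queue-based limiter, and `AgentFn`, `FanoutPlan`, `withToken`, and `runRecursive` are hypothetical. The sketch shows why releasing the parent's token before scheduling children keeps a limit of 1 from deadlocking, how child IDs follow `<parent>_child_<index>`, and how each child's `AbortController` follows the parent's signal.

```typescript
// Hypothetical types; AgentManager.runRecursiveAgent's real signature may differ.
type AgentResult = { kind: "result"; output: string };
type FanoutPlan = {
  kind: "fanout";
  intents: string[];
  aggregate: (childOutputs: string[]) => Promise<string>;
};
type AgentFn = (sessionId: string, task: string, signal: AbortSignal) => Promise<AgentResult | FanoutPlan>;

// Minimal FIFO counting semaphore standing in for the AGENT_MAX_CONCURRENT queue.
class Semaphore {
  private available: number;
  private readonly waiters: (() => void)[] = [];

  constructor(limit: number) {
    this.available = limit;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available -= 1;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the token straight to the oldest waiter
    else this.available += 1;
  }
}

// Hold a token only for the duration of one invocation.
async function withToken<T>(limiter: Semaphore, fn: () => Promise<T>): Promise<T> {
  await limiter.acquire();
  try {
    return await fn();
  } finally {
    limiter.release();
  }
}

async function runRecursive(
  agent: AgentFn,
  limiter: Semaphore,
  sessionId: string,
  task: string,
  signal: AbortSignal,
  depth: number,
  maxDepth: number,
): Promise<string> {
  // Phase 1 (planner): the token is released when this call returns,
  // i.e. before any child is scheduled, so a limit of 1 cannot deadlock.
  const step = await withToken(limiter, () => agent(sessionId, task, signal));
  if (step.kind === "result") return step.output;
  if (depth >= maxDepth) throw new Error(`Recursive depth ${maxDepth} exceeded at ${sessionId}`);

  const childOutputs = await Promise.all(
    step.intents.map((intent, index) => {
      // Each child gets its own AbortController that follows the parent's signal.
      const child = new AbortController();
      if (signal.aborted) child.abort();
      else signal.addEventListener("abort", () => child.abort(), { once: true });
      return runRecursive(agent, limiter, `${sessionId}_child_${index}`, intent, child.signal, depth + 1, maxDepth);
    }),
  );

  // Phase 2 (aggregator): a fresh invocation that takes a new token.
  return withToken(limiter, () => step.aggregate(childOutputs));
}
```

With `new Semaphore(1)`, a planner that fans out to two intents still completes: the parent's token is back in the pool before the first child asks for it. Had the parent kept its token while awaiting its children, every child would wait forever.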
## Resource Provisioning

The provisioning layer separates:

- Hard constraints: actual resource allocation, enforced before the run starts.
- Soft constraints: injected env vars, prompt sections, metadata, and a discovery snapshot.

Built-in providers:

- `git-worktree`
- `port-range`

Runtime injection includes:

- A working directory override
- Injected env vars such as `AGENT_WORKTREE_PATH`, `AGENT_PORT_RANGE_START`, `AGENT_PORT_RANGE_END`, and `AGENT_PORT_PRIMARY`
- The discovery file path, exposed via `AGENT_DISCOVERY_FILE`

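A deterministic port range means the same session always maps to the same block. The sketch below shows one way to derive the injected variables from the `AGENT_PORT_*` settings, assuming a 32-bit FNV-1a hash of the session ID selects the block and that ranges are inclusive. The hashing choice, `PortSettings`, and `portEnvForSession` are assumptions; the `AGENT_PORT_LOCK_DIR` setting suggests the real provider also guards against collisions with lock files, which this sketch omits.

```typescript
// Hypothetical sketch: one way a deterministic port-range provider could map a session to a block.
interface PortSettings {
  base: number; // AGENT_PORT_BASE
  blockSize: number; // AGENT_PORT_BLOCK_SIZE
  blockCount: number; // AGENT_PORT_BLOCK_COUNT
  primaryOffset: number; // AGENT_PORT_PRIMARY_OFFSET
}

// 32-bit FNV-1a: stable across runs and processes, unlike random assignment.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i += 1) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

function portEnvForSession(sessionId: string, settings: PortSettings, worktreePath: string): Record<string, string> {
  if (settings.primaryOffset >= settings.blockSize) throw new Error("Primary offset must fall inside the block");
  const blockIndex = fnv1a(sessionId) % settings.blockCount;
  const start = settings.base + blockIndex * settings.blockSize;
  const end = start + settings.blockSize - 1; // inclusive upper bound (assumption)
  // No locking here: two sessions can hash to the same block.
  return {
    AGENT_WORKTREE_PATH: worktreePath,
    AGENT_PORT_RANGE_START: String(start),
    AGENT_PORT_RANGE_END: String(end),
    AGENT_PORT_PRIMARY: String(start + settings.primaryOffset),
  };
}
```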
### Hierarchical Suballocation

Parent sessions can suballocate resources for child sessions using:

- `ResourceProvisioningOrchestrator.provisionChildSession(...)`
- `buildChildResourceRequests(...)`

Behavior:

- Child worktrees are placed under a deterministic, parent-scoped root.
- Child port blocks are deterministically carved from the parent's assigned port range.

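Carving child blocks from a parent range is plain arithmetic. This sketch assumes equal, contiguous blocks with any remainder left unused. `carveChildPortRanges`, `childWorktreePath`, and the `children` directory layout are hypothetical and may not match what `buildChildResourceRequests(...)` produces; only the `<parent>_child_<index>` naming comes from the recursive orchestration contract.

```typescript
import { join } from "node:path";

interface PortRange {
  start: number; // inclusive
  end: number; // inclusive
}

// Split a parent's inclusive range into `childCount` equal, contiguous, non-overlapping blocks.
function carveChildPortRanges(parent: PortRange, childCount: number): PortRange[] {
  if (!Number.isInteger(childCount) || childCount < 1) throw new Error("childCount must be a positive integer");
  const total = parent.end - parent.start + 1;
  const size = Math.floor(total / childCount);
  if (size < 1) throw new Error(`Cannot carve ${childCount} child blocks from ${total} ports`);
  return Array.from({ length: childCount }, (_, index) => ({
    start: parent.start + index * size,
    end: parent.start + (index + 1) * size - 1,
  }));
}

// Hypothetical layout: child worktrees live under a directory scoped to the parent session.
function childWorktreePath(worktreeRoot: string, parentSessionId: string, childIndex: number): string {
  return join(worktreeRoot, parentSessionId, "children", `${parentSessionId}_child_${childIndex}`);
}
```

For a parent range of 40000-40009 split three ways, each child gets three ports (40000-40002, 40003-40005, 40006-40008) and port 40009 stays unassigned.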
## MCP Configuration

Use `mcp.config.json` to configure shared and provider-specific MCP servers.

- `MCP_CONFIG_PATH` controls the config location (default `./mcp.config.json`).
- Shared server definitions live under `servers`.
- Provider overrides:
  - `codex.mcp_servers`
  - `claude.mcpServers`
- Handlers:
  - built-in `context7`
  - built-in `claude-task-master`
  - built-in `generic`
  - custom handlers via `registerMcpHandler(...)`

See `mcp.config.example.json` for a complete template.

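The layering of shared and provider-specific servers can be pictured as a shallow merge. The sketch assumes a provider entry replaces a shared entry of the same name wholesale, and its server fields (`command`, `args`, `url`, `handler`) are made up for illustration. Check `mcp.config.example.json` for the real shape and `src/mcp` for the actual precedence rules; `resolveServersFor` is a hypothetical name.

```typescript
// Hypothetical shapes; mcp.config.example.json is the authoritative template.
interface McpServerConfig {
  command?: string;
  args?: string[];
  url?: string;
  handler?: string;
}

interface McpConfigFile {
  servers?: Record<string, McpServerConfig>;
  codex?: { mcp_servers?: Record<string, McpServerConfig> };
  claude?: { mcpServers?: Record<string, McpServerConfig> };
}

// Assumed precedence: a provider entry replaces the shared entry with the same name.
function resolveServersFor(
  provider: "codex" | "claude",
  config: McpConfigFile,
): Record<string, McpServerConfig> {
  const overrides = provider === "codex" ? config.codex?.mcp_servers : config.claude?.mcpServers;
  return { ...config.servers, ...overrides };
}
```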
## Environment Variables

### Provider/Auth

- `CODEX_API_KEY`
- `OPENAI_API_KEY`
- `OPENAI_BASE_URL`
- `CODEX_SKIP_GIT_CHECK`
- `ANTHROPIC_API_KEY`
- `CLAUDE_MODEL`
- `CLAUDE_CODE_PATH`
- `MCP_CONFIG_PATH`

### Agent Manager Limits

- `AGENT_MAX_CONCURRENT`
- `AGENT_MAX_SESSION`
- `AGENT_MAX_RECURSIVE_DEPTH`

### Orchestration Limits

- `AGENT_STATE_ROOT`
- `AGENT_TOPOLOGY_MAX_DEPTH`
- `AGENT_TOPOLOGY_MAX_RETRIES`
- `AGENT_RELATIONSHIP_MAX_CHILDREN`

### Provisioning

- `AGENT_WORKTREE_ROOT`
- `AGENT_WORKTREE_BASE_REF`
- `AGENT_PORT_BASE`
- `AGENT_PORT_BLOCK_SIZE`
- `AGENT_PORT_BLOCK_COUNT`
- `AGENT_PORT_PRIMARY_OFFSET`
- `AGENT_PORT_LOCK_DIR`
- `AGENT_DISCOVERY_FILE_RELATIVE_PATH`

Defaults are documented in `.env.example`.

## Quality Gate

Run the full pre-PR gate:

```bash
npm run verify
```

Equivalent individual commands:

```bash
npm run check
npm run check:tests
npm run test
npm run build
```

## Build and Start

```bash
npm run build
npm run start -- codex "Hello from built JS"
```

## Known Limitations

- The tool clearance allowlist/banlist is currently metadata only; enforcement is not yet wired into an execution sandbox.

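Until enforcement lands, an actor executor could check clearance itself before invoking a tool. A minimal sketch, assuming the banlist always wins and a non-empty allowlist is exhaustive; `assertToolCleared` and `guardTool` are hypothetical helpers, not part of the runtime, and an in-process check like this is a guard rail rather than a sandbox.

```typescript
// Hypothetical guard; the runtime does not enforce tool clearance today.
interface ToolClearance {
  allowlist?: string[];
  banlist?: string[];
}

// Assumed policy: the banlist always wins; a non-empty allowlist is exhaustive.
function assertToolCleared(clearance: ToolClearance, toolName: string): void {
  if (clearance.banlist?.includes(toolName)) {
    throw new Error(`Tool "${toolName}" is banned for this persona`);
  }
  const allowlist = clearance.allowlist ?? [];
  if (allowlist.length > 0 && !allowlist.includes(toolName)) {
    throw new Error(`Tool "${toolName}" is not on the allowlist`);
  }
}

// Wrap a tool implementation so every call is checked at the execution boundary.
function guardTool<A extends unknown[], R>(
  clearance: ToolClearance,
  toolName: string,
  impl: (...args: A) => R,
): (...args: A) => R {
  return (...args: A) => {
    assertToolCleared(clearance, toolName);
    return impl(...args);
  };
}
```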
## References

- `docs/orchestration-engine.md`
- OpenAI Codex SDK docs: https://developers.openai.com/codex/sdk/
- Codex MCP config docs: https://developers.openai.com/codex/config#model-context-protocol-mcp_servers
- Claude Agent SDK docs: https://platform.claude.com/docs/en/agent-sdk/overview