1 Commits

Author SHA1 Message Date
9b4ef8fed8 chore: add UI field hover help and update project notes 2026-02-23 20:57:51 -05:00
17 changed files with 636 additions and 533 deletions

View File

@@ -32,8 +32,6 @@ AGENT_RELATIONSHIP_MAX_CHILDREN=4
# Resource provisioning (hard + soft constraints)
AGENT_WORKTREE_ROOT=.ai_ops/worktrees
AGENT_WORKTREE_BASE_REF=HEAD
# Optional relative path inside each worktree; enables sparse-checkout and sets working directory there.
AGENT_WORKTREE_TARGET_PATH=
AGENT_PORT_BASE=36000
AGENT_PORT_BLOCK_SIZE=32
AGENT_PORT_BLOCK_COUNT=512

View File

@@ -32,7 +32,6 @@
- Provisioning/resource controls:
- `AGENT_WORKTREE_ROOT`
- `AGENT_WORKTREE_BASE_REF`
- `AGENT_WORKTREE_TARGET_PATH`
- `AGENT_PORT_BASE`
- `AGENT_PORT_BLOCK_SIZE`
- `AGENT_PORT_BLOCK_COUNT`

View File

@@ -25,12 +25,6 @@ TypeScript runtime for deterministic multi-agent execution with:
- `artifactPointers`
- `taskQueue`
## Deep Dives
- Session walkthrough with concrete artifacts from a successful provider run: `docs/session-walkthrough.md`
- Orchestration engine internals: `docs/orchestration-engine.md`
- Runtime event model and sinks: `docs/runtime-events.md`
## Repository Layout
- `src/agents`
@@ -97,13 +91,13 @@ The UI provides:
- provider selector: `codex` or `claude`
- run history from `AGENT_STATE_ROOT`
- forms for runtime Discord webhook settings, security policy, and manager/resource limits
- hover help on form labels with short intent guidance for each field
- manifest editor/validator/saver for schema `"1"` manifests
Provider mode notes:
- `provider=codex` uses existing OpenAI/Codex auth settings (`OPENAI_AUTH_MODE`, `CODEX_API_KEY`, `OPENAI_API_KEY`).
- `provider=claude` uses Claude auth resolution (`CLAUDE_CODE_OAUTH_TOKEN` preferred, otherwise `ANTHROPIC_API_KEY`, or existing Claude Code login state).
- `CLAUDE_MODEL` should be a Claude model id/alias recognized by Claude Code (for example `claude-sonnet-4-6`); `anthropic/...` prefixes are normalized automatically.
## Manifest Semantics
@@ -274,7 +268,6 @@ jq -c 'select(.severity=="critical")' .ai_ops/events/runtime-events.ndjson
- `AGENT_WORKTREE_ROOT`
- `AGENT_WORKTREE_BASE_REF`
- `AGENT_WORKTREE_TARGET_PATH` (optional relative path; enables sparse checkout and sets session working directory to that subfolder)
- `AGENT_PORT_BASE`
- `AGENT_PORT_BLOCK_SIZE`
- `AGENT_PORT_BLOCK_COUNT`

View File

@@ -13,7 +13,6 @@ The orchestration runtime introduces explicit schema validation and deterministi
- Project context store (`src/agents/project-context.ts`): project-scoped global flags, artifact pointers, and task queue persisted across sessions.
- Orchestration facade (`src/agents/orchestration.ts`): wires manifest + registry + pipeline + state manager + project context with env-driven limits.
- Hierarchical resource suballocation (`src/agents/provisioning.ts`): builds child `git-worktree` and child `port-range` requests from parent allocation data.
- Optional `AGENT_WORKTREE_TARGET_PATH` enables sparse-checkout for a subdirectory and sets per-session working directory to that target path.
- Recursive manager runtime (`src/agents/manager.ts`): utility invoked by the pipeline engine for fan-out/retry-unrolled execution.
## Constraint model

View File

@@ -1,160 +0,0 @@
# Session Walkthrough (Concrete Example)
This document walks through one successful provider run end-to-end using:
- session id: `ui-session-mlzw94bv-cb753677`
- run id: `9287775f-a507-492a-9afa-347ed3f3a6b3`
- execution mode: `provider`
- provider: `claude`
- manifest: `.ai_ops/manifests/test.json`
Use this as a mental model and as a debugging template for future sessions.
## 1) What happened in this run
The manifest defines two sequential nodes:
1. `write-node` (persona: writer)
2. `copy-node` (persona: copy-editor)
Edge routing is `write-node -> copy-node` on `success`.
In this run:
1. `write-node` succeeded on attempt 1 and emitted `validation_passed` and `tasks_planned`.
2. `copy-node` succeeded on attempt 1 and emitted `validation_passed`.
3. Session aggregate status was `success`.
## 2) Timeline from runtime events
From `.ai_ops/events/runtime-events.ndjson`:
1. `2026-02-24T00:55:28.632Z` `session.started`
2. `2026-02-24T00:55:48.705Z` `node.attempt.completed` for `write-node` with `status=success`
3. `2026-02-24T00:55:48.706Z` `domain.validation_passed` for `write-node`
4. `2026-02-24T00:55:48.706Z` `domain.tasks_planned` for `write-node`
5. `2026-02-24T00:56:14.237Z` `node.attempt.completed` for `copy-node` with `status=success`
6. `2026-02-24T00:56:14.238Z` `domain.validation_passed` for `copy-node`
7. `2026-02-24T00:56:14.242Z` `session.completed` with `status=success`
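A timeline like the one above can be rebuilt for any session by filtering the NDJSON file. The following is a minimal sketch, not the project's own tooling: the field names `timestamp`, `type`, and `sessionId` are assumptions about the event schema, and `eventsForSession` is a hypothetical helper.

```typescript
// Assumed event shape: `timestamp`, `type`, and `sessionId` are illustrative
// field names and may not match the real runtime-events schema.
interface RuntimeEvent {
  timestamp: string;
  type: string;
  sessionId: string;
  [key: string]: unknown;
}

// Parses NDJSON text and returns one session's events in chronological order.
// ISO-8601 UTC timestamps of equal precision sort correctly as plain strings.
function eventsForSession(ndjson: string, sessionId: string): RuntimeEvent[] {
  return ndjson
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as RuntimeEvent)
    .filter((event) => event.sessionId === sessionId)
    .sort((a, b) => a.timestamp.localeCompare(b.timestamp));
}
```

In practice the input would come from reading `.ai_ops/events/runtime-events.ndjson`; the helper takes a string so it stays easy to test.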
## 3) How artifacts map to runtime behavior
### Run metadata (UI-level)
`state/<session>/ui-run-meta.json` stores run summary fields:
- run/provider/mode
- status (`running`, `success`, `failure`, `cancelled`)
- start/end timestamps
For this run:
```json
{
"sessionId": "ui-session-mlzw94bv-cb753677",
"status": "success",
"executionMode": "provider",
"provider": "claude"
}
```
### Handoffs (node input payloads)
`state/<session>/handoffs/*.json` stores payload handoffs per node.
`write-node.json`:
```json
{
"nodeId": "write-node",
"payload": { "prompt": "be yourself" }
}
```
`copy-node.json` includes `fromNodeId: "write-node"` and carries the story generated by the writer node.
Important: this file is the payload transferred along the pipeline edge. If a downstream node's output looks strange, inspect it first.
### Session state (flags + metadata + history)
`state/<session>/state.json` is cumulative session state:
- `flags`: merged boolean flags from node results
- `metadata`: merged metadata from node results/behavior patches
- `history`: domain-event history entries
For this run, state includes:
- flags: `story_written=true`, `copy_edited=true`
- history events:
- `write-node: validation_passed`
- `write-node: tasks_planned`
- `copy-node: validation_passed`
### Project context pointer
`.ai_ops/project-context.json` tracks cross-session pointers like:
- `sessions/<session>/last_completed_node`
- `sessions/<session>/last_attempt`
- `sessions/<session>/final_state`
This lets operators and tooling locate the final state file for any completed session.
## 4) Code path (from button click to persisted state)
1. UI starts run via `UiRunService.startRun(...)`.
2. Service loads config, parses manifest, creates engine, writes initial run meta.
3. Engine `runSession(...)` initializes state and writes entry handoff.
4. Pipeline executes ready nodes:
- builds fresh node context (`handoff + state`)
- renders persona system prompt
- invokes provider executor
- receives actor result
5. Lifecycle observer persists:
- state flags/metadata/history
- runtime events (`node.attempt.completed`, `domain.*`)
- project context pointers (`last_completed_node`, `last_attempt`)
6. Pipeline evaluates edges and writes downstream handoffs.
7. Pipeline computes aggregate status and emits `session.completed`.
8. UI run service writes final `ui-run-meta.json` status from pipeline summary.
Primary entrypoints:
- `src/ui/run-service.ts`
- `src/agents/orchestration.ts`
- `src/agents/pipeline.ts`
- `src/agents/lifecycle-observer.ts`
- `src/agents/state-context.ts`
- `src/ui/provider-executor.ts`
## 5) Mental model that keeps this manageable
Think of one session as five stores and one loop:
1. Manifest (static plan): node graph + routing rules.
2. Handoffs (per-node input payload snapshots).
3. State (session memory): flags + metadata + domain history.
4. Runtime events (timeline/audit side channel).
5. Project context (cross-session pointers and shared context).
6. Loop: dequeue ready node -> execute -> persist result/events -> enqueue next nodes.
If you track those six things, behavior becomes deterministic and explainable.
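The loop in item 6 can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions: `Plan`, `PlanEdge`, and `runLoop` are hypothetical names, the `execute` callback stands in for the provider call, and persistence is reduced to appending to an in-memory log.

```typescript
type NodeStatus = "success" | "failure";

interface PlanEdge {
  from: string;
  to: string;
  on: NodeStatus;
}

interface Plan {
  entry: string;
  edges: PlanEdge[];
}

// Dequeue a ready node, execute it, record the result, enqueue successors
// whose edge condition matches the node's status.
function runLoop(plan: Plan, execute: (nodeId: string) => NodeStatus): string[] {
  const queue: string[] = [plan.entry];
  const visited = new Set<string>();
  const log: string[] = [];
  while (queue.length > 0) {
    const nodeId = queue.shift() as string;
    if (visited.has(nodeId)) continue; // each node runs at most once
    visited.add(nodeId);
    const status = execute(nodeId);
    log.push(`${nodeId}:${status}`); // stands in for persisting state and events
    for (const edge of plan.edges) {
      if (edge.from === nodeId && edge.on === status) queue.push(edge.to);
    }
  }
  return log;
}
```

For the two-node manifest in this walkthrough, the plan would be `{ entry: "write-node", edges: [{ from: "write-node", to: "copy-node", on: "success" }] }`.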
## 6) Debug checklist for any future session id
Given `<sid>`, inspect in this order:
1. `state/<sid>/ui-run-meta.json`
2. `.ai_ops/events/runtime-events.ndjson` filtered by `<sid>`
3. `state/<sid>/handoffs/*.json`
4. `state/<sid>/state.json`
5. `.ai_ops/project-context.json` pointer entries for `<sid>`
Interpretation:
1. No `session.started`: run failed before pipeline began.
2. `node.attempt.completed` with `failureCode=provider_*`: provider/runtime issue.
3. Missing downstream handoff file: edge condition did not pass.
4. `history` has `validation_failed`: retry/unrolled path or remediation branch likely triggered.
5. `ui-run-meta` disagrees with runtime events: check the run-service status mapping, and restart the server so it picks up new code.

View File

@@ -5,8 +5,147 @@
- can use any openai/anthropic models
- can use multiple sets of creds
# in progress
# Scheduled
# other scheduled
- persona definitions
- product
- task
- coder
- tester
- git
- handle basic git validation/maintenance
- edit + merge when conflict is low
- pass to dev when conflict is big
- task management flow outline
- what is hard coded?
- anything that isn't 100% reliant on an llm
- complete task, next task, etc
- task dependency graph aka the next task to be assigned is x
- giga do not ever let the agent call something like this, ban it if you can
- task assignment
- task init
-
- what is sent to llm?
- the minimum possible relevant data
- task prioritization
- subtask explosion
- "clarification needed" process
- init
- planning
- prioritization
- dependency graph
- subtasks
- task/subtask status updates (pending, in progress, done, failed)
- remove todoist mcp
# Considering
- adding a similar testing methodology to the python script with playwright and visual automation + banning them from hacking it with curl and whatnot
- add instruction to tell the agent under which circumstances it should consider using context7 and if it decides to use it how it should write code to interact with it rather than calling it directly
- pretty colors on terminal uwu
- agent names
- consider adding google gemini 3.1, even though it costs money it is the best prd drafter by far. might be good at tasks too
- list/select models
- selection per task/session/agent
- git orchestration
- merging
- symlinks
# consider adding these libs
1. The Control Flow: Keep Yours, But Upgrade the Math
Right now, your PipelineExecutor (Feedback #5) is handling scheduling and topological fan-out manually. This is where homegrown DAGs usually start breaking down as relationships get complex.
What the big frameworks use: They rely on established graph theory libraries to handle the execution order.
What you should adopt: Do not write your own DAG traversal logic. Bring in a lightweight library like graphlib (or a modern TypeScript equivalent) to handle the topological sorting.
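For reference, this is roughly what such a library computes. The sketch below is Kahn's algorithm written out by hand purely to illustrate the ordering and cycle detection; it is not a suggestion to skip the library, and `topologicalOrder` is a hypothetical name.

```typescript
// Returns node ids in an order where every edge [from, to] has `from` first.
// Throws if an edge names an unknown node or if the graph contains a cycle.
function topologicalOrder(nodes: string[], edges: Array<[string, string]>): string[] {
  const inDegree = new Map<string, number>();
  const children = new Map<string, string[]>();
  for (const node of nodes) {
    inDegree.set(node, 0);
    children.set(node, []);
  }
  for (const [from, to] of edges) {
    if (!children.has(from) || !children.has(to)) {
      throw new Error(`edge references unknown node: ${from} -> ${to}`);
    }
    children.get(from)!.push(to);
    inDegree.set(to, inDegree.get(to)! + 1);
  }
  // Start with nodes that have no unmet dependencies.
  const ready = nodes.filter((node) => inDegree.get(node) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const node = ready.shift() as string;
    order.push(node);
    for (const child of children.get(node)!) {
      const remaining = inDegree.get(child)! - 1;
      inDegree.set(child, remaining);
      if (remaining === 0) ready.push(child);
    }
  }
  if (order.length !== nodes.length) throw new Error("cycle detected");
  return order;
}
```

Nodes that become ready in the same pass have no dependency on each other, which is the set a scheduler could fan out concurrently.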
2. Tooling and Transport: The MCP SDK
You already have a src/mcp directory, which puts you ahead of the curve. But managing the low-level JSON-RPC protocol over stdio or Server-Sent Events (SSE) is notoriously fragile.
What the big frameworks use: The official @modelcontextprotocol/sdk packages provided by Anthropic.
What you should adopt: If you aren't already, replace your custom src/mcp/converters.ts logic with the official SDK. Relying on the official standard ensures your orchestrator isn't permanently hard-coupled to your current Anthropic and OpenAI subscriptions. If you decide to point this engine at your local Ollama instance running behind Traefik, a standardized MCP transport layer guarantees your tools and context will work seamlessly across both your cloud models and your local open-weight ones.
3. Process Execution: Safer Shells
When your agents execute shell commands in that AGENT_WORKTREE_ROOT, using Node's raw child_process.exec is messy. It buffers stdout/stderr poorly and makes escaping arguments dangerous.
What the big frameworks use: zx (by Google) or execa.
What you should adopt: execa is fantastic for this. It handles process timeouts, cleans up orphaned child processes automatically (crucial for your retry-unrolled DAGs), and streams stdout natively so you can pipe it directly into your domain-events bus without memory bloat.
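To make the concerns concrete (argument arrays instead of shell strings, a timeout, streamed output), here is a dependency-free sketch using Node's built-in `child_process.spawn` rather than execa. `runCommand` and `RunResult` are hypothetical names, and the sketch only kills the direct child, so it does not replace the orphan cleanup attributed to execa above.

```typescript
import { spawn } from "node:child_process";

interface RunResult {
  exitCode: number | null;
  timedOut: boolean;
}

// Runs a binary with an argument array (no shell), forwards output chunks to
// `onChunk` as they arrive, and kills the child if it exceeds `timeoutMs`.
function runCommand(
  binary: string,
  args: string[],
  cwd: string,
  timeoutMs: number,
  onChunk: (stream: "stdout" | "stderr", chunk: string) => void
): Promise<RunResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(binary, args, { cwd, shell: false, stdio: ["ignore", "pipe", "pipe"] });
    let timedOut = false;
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGKILL");
    }, timeoutMs);
    child.stdout?.on("data", (data: Buffer) => onChunk("stdout", data.toString("utf8")));
    child.stderr?.on("data", (data: Buffer) => onChunk("stderr", data.toString("utf8")));
    child.on("error", (error) => {
      clearTimeout(timer);
      reject(error);
    });
    child.on("close", (exitCode) => {
      clearTimeout(timer);
      resolve({ exitCode, timedOut });
    });
  });
}
```

The `onChunk` callback is where output could be forwarded to the domain-events bus without buffering the whole stream in memory.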
# Completed
1. boilerplate typescript project for claude
- mcp server support
- generic mcp handlers
- specific mcp handlers for
- context7
- claude task manager
- concurrency, configurable max agent and max depth
- Extensible Resource Provisioning
- hard constraints
- soft constraints
- basic hygiene run
# epic
- agent orchestration system improvements
# module 1
- schema driven execution engine
- specific definitions handled in AgentManifest schema
- persona registry
- templated system prompts injected with runtime context
- tool clearances (stub this for now, add TODO for security implementation)
- allowlist
- banlist
- behavioral event handlers
- define how personas react to specific events ie. onTaskComplete, onValidationFail
# module 2
- actor oriented pipeline constrained by a strict directed acyclic graph
- relationship + pipeline graphs
- multi level topology
- hierarchical ie parent spawns 3 coder children
- unrolled retry pipelines ie coder1 > QA1 > Coder2 > QA2
- sequential ie product > task > coder > QA > git
- support for constraint definition for each concept (relationship, pipeline, topology)
- ie max depth, max retries
- state dependent routings
- support branching logic based on project history or repository state ie. project init requires product agent to generate prd, then task agent needs to create roadmap, once those exist future sessions skip those agents and go straight to coder agents
# module 3
- state/context manager
- stateless handoffs
- state and context are passed forwards through payloads via worktree/storage, not conversational memory
- fresh context per node execution
# module 4
- resource provisioning
- hierarchical resource suballocation
- when a parent agent spawns children, handle local resource management
- branch/sub-worktree provisioning
- suballocating deterministic port range provisioning
- extensibility to support future resource types
# epic
implementation of AgentManager.runRecursiveAgent
@@ -68,109 +207,352 @@ implementation of AgentManager.runRecursiveAgent
- The Abort Test: Start a parent with a 5-second sleep task, cancel the session at 1 second. Assert that the underlying LLM SDK handles were aborted and resources were released.
- The Isolation Test: Spawn two children concurrently. Assert they are assigned non-overlapping port ranges and isolated worktree paths.
# Scheduled
- security implementation
- persona definitions
- product
- task
- coder
- tester
- git
- handle basic git validation/maintenance
- edit + merge when conflict is low
- pass to dev when conflict is big
# epic
- need to untangle
- what goes where in terms of DAG definition vs app logic vs agent behavior
- events
- what events do we have
- what personas care about what events
- how should a persona respond to an event
- where is this defined
- success/failure/retry policy definitions
- where does this go?
- what are they?
# connecting pipeline engine and recursive agent management
- The Pipeline Engine must be the single source of truth. The Recursive Manager should not be a separate way to run agents; it should be a Utility that the Pipeline Engine calls when it hits a node that requires a "Fan-out" (hierarchical or unrolled-retry topology).
- deprecate the standalone CLI examples for runRecursiveAgent and instead wire the Manager directly inside SchemaDrivenExecutionEngine.runSession
# execution driven topologies
- currently, the manifest validates that a topology is, for example, "hierarchical", but at runtime the code ignores that label and runs everything sequentially
- The execution loop needs a true DAG Runner
- When the Orchestrator evaluates the next nodes to run, if it sees a "hierarchical" or "parallel" topology block, it must dispatch those nodes to the AgentManager concurrently using Promise.all(), rather than waiting for one to finish before starting the next
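A minimal sketch of that dispatch rule, assuming a hypothetical `NodeTask` shape and `dispatch` helper (the real Orchestrator and AgentManager interfaces are not shown in these notes):

```typescript
type Topology = "sequential" | "parallel" | "hierarchical";

interface NodeTask {
  id: string;
  run: () => Promise<string>;
}

// Sequential blocks await each task in turn; parallel and hierarchical blocks
// start every task first and then wait for all of them with Promise.all.
async function dispatch(topology: Topology, tasks: NodeTask[]): Promise<string[]> {
  if (topology === "sequential") {
    const results: string[] = [];
    for (const task of tasks) {
      results.push(await task.run());
    }
    return results;
  }
  return Promise.all(tasks.map((task) => task.run()));
}
```

Note that `Promise.all` rejects as soon as one task rejects but does not stop the others, so concurrent dispatch still needs a cancellation mechanism such as an `AbortSignal` to stop sibling tasks.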
# project scoped data store
- implement a ProjectContext store. Sessions should read from the global Project State on initialization, and write their metadata updates back to the Project State upon successful termination
- store should contain these domains
- global flags
- artifact pointers
- task queue
- dag orchestrator reads file at init and writes to it upon node completion
# typed domain event bus
- Implement a strongly-typed Event Bus or a Domain Event schema.
- Create a standard payload shape for events
- The Pipeline should allow edges to trigger based on specific domain events, not just basic success/fail strings
- planning events - These events occur when the project state is empty or a new major feature is requested. They transition the system from "idea" to "actionable work."
- requirements defined
- product agent triggers upon prd completion > task agent consumes
- tasks planned
- task agent triggers upon completion of dedicated claude-task-manager process > coder agent consumes
- execution events - These are the most common events. They handle the messy reality of writing code and the cyclical (but unrolled) retry pipelines.
- code committed
- task blocked (needs clarification, impossible task, max retry etc)
- validation events - These events dictate whether the DAG moves forward to integration or branches sideways into a retry pipeline.
- validation passed
- validation failed
- integration events - This event closes the loop and updates the global state.
- branch merged (also tasks updated etc)
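One possible shape for the standard payload and event-triggered edges is sketched below. The event names follow the list above in the snake_case style already used for `validation_passed` and `tasks_planned`; the remaining names, the `details` field, and the `EventEdge` and `nextNodes` helpers are assumptions for illustration.

```typescript
type DomainEventType =
  | "requirements_defined"
  | "tasks_planned"
  | "code_committed"
  | "task_blocked"
  | "validation_passed"
  | "validation_failed"
  | "branch_merged";

// Hypothetical standard payload shared by every domain event.
interface DomainEvent {
  type: DomainEventType;
  nodeId: string;
  timestamp: string;
  details?: Record<string, unknown>;
}

// An edge fires when its source node emits the named domain event.
interface EventEdge {
  from: string;
  to: string;
  onEvent: DomainEventType;
}

function nextNodes(edges: EventEdge[], event: DomainEvent): string[] {
  return edges
    .filter((edge) => edge.from === event.nodeId && edge.onEvent === event.type)
    .map((edge) => edge.to);
}
```

Because `DomainEventType` is a closed union, a misspelled event name in an edge definition becomes a compile-time error instead of an edge that silently never fires.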
# retry matrix and cancellation
- Implement a Status Retry Matrix and enforce AbortSignal everywhere
- Validation_Fail: Trigger the unrolled retry pipeline (send the error back to a new agent instance)
- Hard_Failure (>=2 sequential API timeouts, network drops, 403, etc): Fail fast, do not burn tokens retrying. Bubble the error up to the user
- Pass standard AbortSignal objects down into the ActorExecutionInput so the pipeline can instantly kill rogue processes.
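The matrix and the cancellation check could look roughly like this. It is a sketch under assumptions: `AttemptFailure`, `classify`, and `nextAction` are hypothetical, and the hard-failure rule covers only the 403 and two-consecutive-timeout cases named above.

```typescript
type FailureKind = "validation_fail" | "hard_failure";
type RetryAction = "retry_unrolled" | "fail_fast";

const RETRY_MATRIX: Record<FailureKind, RetryAction> = {
  validation_fail: "retry_unrolled",
  hard_failure: "fail_fast"
};

// Hypothetical record of one failed attempt, oldest first in the history.
interface AttemptFailure {
  validationFailed?: boolean;
  timedOut?: boolean;
  httpStatus?: number;
}

// Classifies the most recent failure; returns null when no rule applies.
function classify(failures: AttemptFailure[]): FailureKind | null {
  const last = failures[failures.length - 1];
  if (!last) return null;
  if (last.httpStatus === 403) return "hard_failure";
  const previous = failures[failures.length - 2];
  if (last.timedOut && previous?.timedOut) return "hard_failure";
  if (last.validationFailed) return "validation_fail";
  return null;
}

// Cancellation always wins over the retry matrix.
function nextAction(
  failures: AttemptFailure[],
  signal: AbortSignal
): RetryAction | "cancelled" | null {
  if (signal.aborted) return "cancelled";
  const kind = classify(failures);
  return kind === null ? null : RETRY_MATRIX[kind];
}
```

The same `signal` would be passed down into `ActorExecutionInput` so that provider calls and child processes can observe the abort, not just the scheduler.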
# code review epic
- Header alias inconsistency can break Claude MCP auth/config
- Normalize the config object immediately upon parsing in src/mcp/converters.ts, mapping both headers and http_headers to a single internal representation before either the Codex or Claude handlers touch them
- Update src/agents/pipeline.ts to compute an aggregate status. You should traverse the execution records and ensure all terminal nodes (leaves) in your DAG have a status of "success". If any node in the critical path fails, the whole session should be marked as a failure
- file persistence is not atomic, project-context serialization is process-local only
- Implement atomic writes
- Direct writes in state/context: src/agents/state-context.ts:203, src/agents/state-context.ts:251, src/agents/project-context.ts:171, src/agents/project-context.ts:205.
- Queue in FileSystemProjectContextStore (src/agents/project-context.ts:145) protects only within one process.
- pipeline executor owns too many responsibilities
- Start extracting distinct policies.
- Move failure classification (hard vs soft fails) into a dedicated FailurePolicy class.
- Move persistence and event emissions into a LifecycleObserver or event bus listener rather than keeping them hardcoded in the execution loop
- Global mutable MCP handler registry limits extensibility/test isolation
- Refactor the registry into an instantiable class (e.g., McpRegistry)
- Pass this instance into your SchemaDrivenExecutionEngine and PipelineExecutor via dependency injection instead of relying on auto-installing imports
- Provider example entrypoints duplicate orchestration pattern
- Create a unified helper like createSessionContext(provider, config) that handles the provisioning, probing, and prompting loop, keeping the provider-specific code strictly limited to model initialization
- Config/env parsing is duplicated
- Create a single src/config.ts (or dedicated config service) that parses process.env, validates it, applies defaults, and freezes the object.
- Inject this single source of truth throughout the app
- Project context parsing is strict
- Update src/agents/project-context.ts:106 to merge parsed files with a set of default root keys
- Add a schemaVersion field to the JSON structure to allow for safe migrations later
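For the atomic-write item in this list, a common approach is to write to a temporary file in the same directory and then rename it over the target. The sketch below assumes a hypothetical `writeJsonAtomic` helper; it is not the code at the cited line numbers.

```typescript
import { closeSync, fsyncSync, mkdtempSync, openSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { randomBytes } from "node:crypto";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Writes JSON to a temp file beside the target, flushes it, then renames it
// into place. On POSIX filesystems a same-directory rename is atomic, so
// readers see either the old file or the new one, never a partial write.
function writeJsonAtomic(targetPath: string, value: unknown): void {
  const tempPath = join(dirname(targetPath), `.${randomBytes(6).toString("hex")}.tmp`);
  const fd = openSync(tempPath, "w");
  try {
    writeFileSync(fd, JSON.stringify(value, null, 2));
    fsyncSync(fd);
  } finally {
    closeSync(fd);
  }
  renameSync(tempPath, targetPath);
}

// Usage: write into a throwaway directory and read the value back.
const demoDir = mkdtempSync(join(tmpdir(), "project-context-"));
const demoPath = join(demoDir, "project-context.json");
writeJsonAtomic(demoPath, { schemaVersion: 1, globalFlags: {} });
const roundTrip = JSON.parse(readFileSync(demoPath, "utf8"));
```

This prevents torn files but not lost updates: two processes doing read-modify-write can still overwrite each other, so the cross-process concern noted above would additionally need a lock file or similar coordination.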
# security middleware
- rely on an established AST (Abstract Syntax Tree) parser for shell scripts like bash-parser to handle tokenization
- Use an off-the-shelf parser to break commands down into executable binaries, flags, arguments, and environment variable assignments. We can scrub or inject specific environment variables securely at this layer
- focus specifically on extracting Command and Word nodes from the bash-parser output
- gives us a head start on exactly what part of the syntax tree matters for the allowlist
- AI agents frequently chain commands (&&, ||, |, >) to save turns. If your parser struggles with complex pipelines or subshells, it will artificially cripple the agents' ability to work efficiently
- rules engine
- For the simplest iteration, defining your allowlists and tool clearance schema via strictly typed Zod schemas is the most lightweight approach. You validate the AST output against the schema before passing it to the execution layer
- Implement strict binary allowlists (e.g., git, npm, node, cat) and enforce directory-bound execution (ensuring the cwd stays within AGENT_WORKTREE_ROOT)
- block path traversal attempts (e.g., ../). Even if the cwd starts in the worktree, an agent might try to read or write outside of it using relative paths in its arguments
- method for logging and profiling exactly what commands Codex and Claude are currently emitting to build a baseline allowlist for longer term best practices
- make clear todos around the need to replace/improve this
- sandbox/execution layer
- Execute commands using Node's child_process with explicitly dropped privileges (running as a non-root user via uid/gid), enforce timeouts, and stream stdout/stderr to your existing event bus for auditing.
- By default, Node child processes inherit the parent's environment variables. ensure that our env management policy is consistent and secure given this behavior
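- A very modern pattern is to use your Node orchestrator to spawn a deno run child process. You can pass explicit flags like --allow-read=/target/worktree and --allow-run=git,npm. If the LLM tries to read an env file outside that directory, the Deno runtime denies the access with a permission error (pass --no-prompt so it fails instead of prompting). Note that this is enforced by the Deno runtime rather than by an OS-level sandbox, and binaries permitted via --allow-run execute outside Deno's restrictions.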
- agents need to modify files in AGENT_WORKTREE_ROOT, but they must absolutely not have write access to AGENT_STATE_ROOT or AGENT_PROJECT_CONTEXT_PATH. The security middleware must strictly enforce this boundary.
- Your PipelineExecutor currently routes validation_fail into a retry-unrolled execution. You will need to define a new error class (e.g., SecurityViolationError). Should a security violation trigger a retry (telling the LLM "You can't do that, try another way"), or should it instantly hard-abort the pipeline?
- Your MCP tools currently have auto-installed builtins. The rules engine needs to apply not just to shell commands, but also to MCP tool calls. The schema for tool clearance (currently a TODO at src/agents/persona-registry.ts:79) needs to be unified with this new rules engine
- schema differences between claude and codex - no clue if we are doing anything for this
- MCP config boundary only verifies that the config is an "object" before casting it, risking late-stage crashes. Furthermore, the shared MCP type is missing the sdk (in-process) transport type supported by Claude
- Create a strict Zod schema for MCP configuration
- Define McpConfigSchema using Zod to strictly validate field shapes, ranges, and enums before handoff. Update src/mcp/types.ts to include sdk alongside stdio, http, and sse in your shared transport union
- Provider-specific MCP fields like enabled_tools and timeouts are used by Codex but silently dropped during conversion for Claude. This violates user expectations
- Fail fast or warn loudly: If the provider is set to Claude and these asymmetric fields are present in the parsed config, emit a clear warning log (e.g., [WARN] MCP field 'timeouts' is not supported by the Claude adapter and will be ignored)
- The SDK adapters are under-tested. The test suite covers converters and registries but misses the actual execution wiring, stream handling, and result parsing for Codex and Claude
- Implement integration/unit tests for the adapter boundaries
- add support for CLAUDE_CODE_OAUTH_TOKEN instead of api key
- ensure your configuration schema can accept the new OAuth token. To keep it backward-compatible with standard API keys (in case you ever need to switch back), you can check for the OAuth token first, then fall back to the standard API key.
- runClaudePrompt drops the parsed config and lets the SDK auto-discover process.env.ANTHROPIC_API_KEY.
- task management flow
- init
- planning
- prioritization
- dependency graph
- subtasks
- task/subtask status updates (pending, in progress, done, failed)
- You need to explicitly pass the anthropicToken from your configuration into the underlying Anthropic client constructor. Depending on how you are instantiating the Agent SDK, you will pass it via the authToken or apiKey property
- found evidence of this drift in src/agents/provisioning.ts#L94. You will need to apply the exact same explicit wiring pattern there. Any time you instantiate the Anthropic client or the Claude Agent in a worker node, pass { apiKey: config.provider.anthropicToken }.
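For the directory-bound execution and path traversal items in the security middleware notes above, a containment check might look like the sketch below. `isInsideRoot` is a hypothetical helper; it works on path strings only and does not resolve symlinks, which a real implementation would need to handle (for example with `fs.realpath`).

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// Returns true when `candidate`, resolved against `cwd`, stays inside `root`.
// Purely lexical: symlinks are NOT followed, so this is only a first gate.
function isInsideRoot(root: string, cwd: string, candidate: string): boolean {
  const absoluteRoot = resolve(root);
  const absoluteTarget = resolve(cwd, candidate);
  const rel = relative(absoluteRoot, absoluteTarget);
  if (rel === "") return true; // the root itself
  if (rel === ".." || rel.startsWith(".." + sep)) return false; // escapes upward
  return !isAbsolute(rel); // e.g. a different drive on Windows
}
```

Checking the resolved path instead of searching for `../` in the argument lets harmless relative paths through, such as `../sibling` from a subdirectory of the worktree, while still rejecting anything that lands outside the root.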
# Considering
- model selection per task/session/agent
- agent "notebook"
- agent run log
- agent persona support
- ping pong support - ie. product agent > dev agent, dev agent needs clarification = ping pong back to product. same with tester > dev.
- resume session aspect of this
- max ping pong length ie. tester can only pass back once otherwise mark as failed
- max ping pong length per relationship ie dev:git can ping pong 4 times, dev:product only once, etc
- git orchestration
- merging
- symlinks
- security
- whatever existing thing has
- banned commands (look up a git repo for this)
- front end
- list available models
- specific workflows
- ui
- ci/cd
- review
- testing
# Defer
# Won't Do
- rip out legacy/deprecated interfaces ie legacy status triggers, deprecated subagent method, etc
- recursive agent deprecation thing
# Completed
1. boilerplate typescript project for claude
- mcp server support
- generic mcp handlers
- specific mcp handlers for
- context7
- claude task manager
- concurrency, configurable max agent and max depth
- Extensible Resource Provisioning
- hard constraints
- soft constraints
- basic hygiene run
# epic
- agent orchestration system improvements
# module 1
- schema driven execution engine
- specific definitions handled in AgentManifest schema
- persona registry
- templated system prompts injected with runtime context
- tool clearances (stub this for now, add TODO for security implementation)
- allowlist
- banlist
- behavioral event handlers
- define how personas react to specific events ie. onTaskComplete, onValidationFail
# module 2
- actor oriented pipeline constrained by a strict directed acyclic graph
- relationship + pipeline graphs
- multi level topology
- hierarchical ie parent spawns 3 coder children
- unrolled retry pipelines ie coder1 > QA1 > Coder2 > QA2
- sequential ie product > task > coder > QA > git
- support for constraint definition for each concept (relationship, pipeline, topology)
- ie max depth, max retries
- state dependent routings
- support branching logic based on project history or repository state ie. project init requires product agent to generate prd, then task agent needs to create roadmap, once those exist future sessions skip those agents and go straight to coder agents
# module 3
- state/context manager
- stateless handoffs
- state and context are passed forwards through payloads via worktree/storage, not conversational memory
- fresh context per node execution
# module 4
- resource provisioning
- hierarchical resource suballocation
- when a parent agent spawns children, handle local resource management
- branch/sub-worktree provisioning
- suballocating deterministic port range provisioning
- extensibility to support future resource types
legacy/deprecated interfaces
# Phase 1: Safe & Focused Cleanups (Low/Medium Risk)
These can be bundled into a single PR or tackled as quick, independent tasks. They have minimal blast radius and clear mitigation paths.
Legacy status history duplication:
Action: Remove the historyEvent singular path in favor of the domain-event history. Migrate conditions from validation_fail to validation_failed.
Impact: Requires updating orchestration tests (tests/orchestration-engine.test.ts) and history semantics documentation.
Remove internal Claude token (anthropicToken):
Action: Simplify the resolver in src/config.ts to oauth/api only and drop the property.
Impact: Update config tests. Low risk, but constitutes an API shape change.
Remove MCP legacy header alias (http_headers):
Action: Drop from shared schema/type (src/mcp/types.ts) and converter merge logic (src/mcp/converters.ts).
Impact: Medium risk due to external config compatibility. Crucial: You must add a migration note for users utilizing external MCP configs.
Remove legacy edge trigger aliases (Alias-only):
Action: Remove the onTaskComplete and onValidationFail aliases only.
Impact: Safe to do now, as it shrinks the legacy surface area without breaking the core edge.on functionality currently heavily relied upon in tests.
# Phase 2: Staged Removals (Requires Care)
This item is deeply integrated and needs a multi-step replacement strategy rather than a direct deletion.
Deprecate runRecursiveAgent API:
Action: First, update the Pipeline (src/agents/pipeline.ts) to use the new private replacement call path for recursive execution. Only after the pipeline is successfully rerouted and the manager tests are updated should you remove the public deprecated wrapper.
README Impact: You will need to remove or update the note in the "Notes" section of your README that currently advertises AgentManager.runRecursiveAgent(...) for low-level testing.
# Phase 3 legacy/deprecated interfaces BIG AND SCARY
Switching to un-ts/sh-syntax is exactly the right move. It is a WebAssembly (WASM) wrapper around Go's highly respected mvdan/sh parser. It provides rigorous POSIX/Bash compliance and, crucially, ships with strict, native TypeScript definitions for its entire AST.
Here is the updated implementation guide tailored specifically to integrating un-ts/sh-syntax into your security middleware.
Phase 1: Dependency Migration
Remove the Legacy Code:
npm uninstall bash-parser
rm src/types/bash-parser.d.ts
Install the Replacement:
npm install sh-syntax
Phase 2: Rewrite the AST Adapter (src/security/shell-parser.ts)
The most significant architectural shift here is that sh-syntax is WASM-backed, making the parsing operation asynchronous. Your adapter and the calling security middleware must be updated to handle Promises.
You will also map your traversal logic to the mvdan/sh AST structures (e.g., File, Stmt, CallExpr, Word).
TypeScript
import { parse } from 'sh-syntax';
// Note: Depending on your runtime environment, you may need to configure the WASM loader
// via `import { getProcessor } from 'sh-syntax'` if standard Node resolution isn't sufficient.
export interface CommandTarget {
binary: string;
args: string[];
}
export async function extractExecutionTargets(shellInput: string): Promise<CommandTarget[]> {
const targets: CommandTarget[] = [];
// sh-syntax parsing is async due to WASM initialization
const ast = await parse(shellInput);
// Walk the AST. sh-syntax types closely mirror mvdan/sh Go types.
// The root is typically a 'File' containing a list of 'Stmt' (Statements).
if (!ast || !ast.StmtList || !ast.StmtList.Stmts) return targets;
for (const stmt of ast.StmtList.Stmts) {
const cmd = stmt.Cmd;
// Check if the command is a standard function/binary call
if (cmd && cmd.type === 'CallExpr') {
const args = cmd.Args;
if (args && args.length > 0) {
// The first argument in a CallExpr is the binary name
// You must strictly check that the binary is a literal (Word) and not computed
const binaryWord = args[0];
const binaryName = extractLiteralWord(binaryWord);
if (!binaryName) {
throw new SecurityViolationError("Dynamic or computed binary names are blocked.");
}
targets.push({
binary: binaryName,
args: args.slice(1).map(extractLiteralWord).filter(Boolean) as string[]
});
}
}
// Important: Explicitly reject subshells or commands your engine doesn't support
if (cmd && cmd.type === 'Subshell') {
throw new SecurityViolationError("Subshell execution is not permitted by security policy.");
}
// Analyze redirects to ensure they don't overwrite protected files (like state roots)
if (stmt.Redirs) {
enforceRedirectPolicy(stmt.Redirs);
}
}
return targets;
}
// Helper to safely extract string literals from Word nodes
function extractLiteralWord(wordNode: any): string | null {
// In sh-syntax, a Word contains Parts (Lit, SglQuoted, DblQuoted, ParamExp, etc.)
// You must enforce that the parts only consist of safe literals, rejecting ParamExp ($VAR).
// ...
}
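The `extractLiteralWord` helper above is left elided. One way it could look is sketched here, assuming each Word part carries a `type` tag in the same way the adapter above checks `cmd.type`. The `WordPart` and `WordNode` shapes are hypothetical stand-ins written for this example, not the library's exported types, so the names should be checked against the real sh-syntax definitions before use.

```typescript
// Hypothetical, simplified node shapes for illustration only.
// Verify names and fields against the real sh-syntax type definitions.
type WordPart =
  | { type: "Lit"; Value: string }
  | { type: "SglQuoted"; Value: string }
  | { type: "DblQuoted"; Parts: WordPart[] }
  | { type: "ParamExp" }
  | { type: "CmdSubst" };

interface WordNode {
  Parts: WordPart[];
}

// Returns the concatenated literal text, or null if any part is computed at runtime.
function literalOf(parts: WordPart[]): string | null {
  let out = "";
  for (const part of parts) {
    switch (part.type) {
      case "Lit":
      case "SglQuoted":
        out += part.Value;
        break;
      case "DblQuoted": {
        const inner = literalOf(part.Parts);
        if (inner === null) return null;
        out += inner;
        break;
      }
      default:
        // ParamExp ($VAR), CmdSubst ($(...)), and any other kind fail closed.
        return null;
    }
  }
  return out;
}

function extractLiteralWord(word: WordNode): string | null {
  if (word.Parts.length === 0) return null;
  return literalOf(word.Parts);
}
```

Failing closed in the `default` branch means any part kind the sketch does not list is blocked by default. Backslash escapes inside literals are not decoded here. Note also that the `filter(Boolean)` call in the adapter drops computed arguments silently; if the policy is to reject them, throw on the first `null` instead.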
Phase 3: Synchronize Orchestration & Middleware
Because extractExecutionTargets is now async:
Update SecureCommandExecutor: The constructor or initialization hook where you validate the command against your allowlists must await the parsing step.
Actor Execution Boundary: Ensure that wherever the LLM outputs a shell command during the DAG execution, the pipeline waits for the AST security validation before proceeding.
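The ordering requirement in Phase 3 can be sketched as a small gate that awaits parsing before anything executes. `runIfAllowed` and its parameters are hypothetical names for this illustration; the real `SecureCommandExecutor` wiring may differ.

```typescript
interface CommandTarget {
  binary: string;
  args: string[];
}

// Hypothetical gate: parse first, check the allowlist, and only then execute.
async function runIfAllowed<T>(
  shellInput: string,
  parseTargets: (input: string) => Promise<CommandTarget[]>,
  allowedBinaries: ReadonlySet<string>,
  execute: (input: string) => Promise<T>,
): Promise<T> {
  // If parsing rejects (syntax error or policy violation), execute is never reached.
  const targets = await parseTargets(shellInput);
  for (const target of targets) {
    if (!allowedBinaries.has(target.binary)) {
      throw new Error(`Binary "${target.binary}" is not allowlisted.`);
    }
  }
  return execute(shellInput);
}
```

Passing the parser and executor in as functions keeps the gate testable without loading WASM; in production the parser argument would be `extractExecutionTargets`.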
Phase 4: Strict Schema Alignment (src/security/schemas.ts)
Your Zod schemas do not need to validate the sh-syntax AST directly (since it is already strictly typed by the library). Instead, use Zod to validate the output array (CommandTarget[]) to ensure nothing slipped past the parser.
Update the execution schema: Ensure it enforces .strict() so that if extractExecutionTargets accidentally returns extraneous fields, the engine panics and fails closed.
Unified Allowlist validation: Ensure the extracted binary string is strictly validated against your AGENT_SECURITY_ALLOWED_BINARIES Zod array.
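To illustrate what the strict output check in Phase 4 is meant to catch, here is a dependency-free stand-in for a Zod `.strict()` object schema. It is not Zod itself; `assertStrictTargets` is a hypothetical helper, and in the real schema file the same rules would be expressed with Zod.

```typescript
interface CommandTarget {
  binary: string;
  args: string[];
}

// Rejects non-arrays, unknown keys, non-allowlisted binaries, and non-string args.
function assertStrictTargets(
  value: unknown,
  allowedBinaries: ReadonlySet<string>,
): CommandTarget[] {
  if (!Array.isArray(value)) {
    throw new Error("Targets must be an array.");
  }
  return value.map((entry: unknown, index: number) => {
    if (typeof entry !== "object" || entry === null || Array.isArray(entry)) {
      throw new Error(`Target ${index} must be an object.`);
    }
    const record = entry as Record<string, unknown>;
    // Same effect as a strict schema: any extra field fails the whole payload.
    const extra = Object.keys(record).filter((key) => key !== "binary" && key !== "args");
    if (extra.length > 0) {
      throw new Error(`Target ${index} has unexpected fields: ${extra.join(", ")}`);
    }
    const binary = record.binary;
    const args = record.args;
    if (typeof binary !== "string" || !allowedBinaries.has(binary)) {
      throw new Error(`Target ${index} uses a binary that is not allowlisted.`);
    }
    if (!Array.isArray(args) || !args.every((arg) => typeof arg === "string")) {
      throw new Error(`Target ${index} args must be an array of strings.`);
    }
    return { binary, args: args as string[] };
  });
}
```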
Phase 5: Revalidate Security Parity (tests/security-middleware.test.ts)
This is the final gate. The new AST structure handles complex bash semantics differently than the old untyped parser.
WASM Test Environment: Ensure your test runner (Jest, Vitest) is configured to load WebAssembly properly, or the parse function will throw an initialization error in CI.
Regression Threat Matrix:
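echo $(unauthorized_bin) -> Must throw SecurityViolationError. Note that $(...) is a command substitution (CmdSubst) nested inside a Word argument, not a top-level Subshell, so the argument walk must reject it rather than silently filter it out.
allowed_bin && unauthorized_bin -> Must be blocked because of the second binary. In the mvdan/sh AST, && yields a single Stmt whose Cmd is a BinaryCmd with X and Y sub-statements, so the walker must recurse into both sides; a flat loop over top-level statements never sees them.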
allowed_bin > /path/to/protected/file -> The Redirs property on the Stmt must trigger your boundary violation logic.
- review/update of readme, docs, and conf files where needed
- mvp for analytics + user notification logging
# giga model specific behavior and strict task agent control stuff
i am too dumb to understand it, but gemini 3.1 makes it sound like a really good idea
Architecture Brief: Deterministic Agent Execution & Policy Enforcement
Context & Goal
We are refactoring the execution layer to ensure low-level control over task agents (e.g., task_sync, task_plan_llm). The goal is to move away from open-ended, non-deterministic agent behavior and enforce a strict 4-layer control model where the LLM acts only as a bounded step within a hard-coded state machine.
The Problem Statement
We currently have a critical gap in our enforcement boundary that allows policy bypasses, compounded by provider-specific SDK quirks:
The Context Drop (The MCP Gap): The mcpRegistry (which defines our tool policies) is resolved globally at the orchestration layer, but it is not passed down into pipeline.ts or the ActorExecutor. As a result, the low-level execution nodes operate without awareness of the active tool clearance policies.
The Claude SDK Leakage: The Anthropic Claude SDK currently ignores the shared enabled_tools configuration in the MCP payload. If we rely solely on the shared MCP config, Claude can hallucinate and execute unauthorized tool calls.
The Anti-Pattern Risk: The initial proposal to fix this was to pass the entire mcpRegistry down into the ActorExecutor so it could self-regulate. This is a severe anti-pattern. It tightly couples our low-level execution sandbox with our high-level orchestration logic, forcing the "dumb" executor to parse topologies, phases, and complex registry configurations.
The Solution: The ResolvedExecutionContext Pattern
To close the enforcement gap without violating the Inversion of Control principle, we will implement a strict separation of concerns. The Orchestrator will handle the logic; the Executor will handle the enforcement.
Instead of passing down the full registry, the orchestration layer will pre-compute a flat, immutable policy payload for that specific node attempt and inject it into the executor.
Implementation Directives
Introduce ResolvedExecutionContext:
Create an interface that represents the fully resolved, un-negotiable constraints for a single execution step.
TypeScript
export interface ResolvedExecutionContext {
phase: string;
modelConstraint: string; // e.g., 'claude-3-haiku'
allowedTools: string[]; // Flat array of resolved tool names
security: {
dropUid: boolean;
worktreePath: string;
// ... other hard constraints
}
}
Update Orchestration (pipeline.ts):
Before invoking an actor, the pipeline must read the AgentManifest for the current node, cross-reference its toolClearance with the mcpRegistry, and generate the ResolvedExecutionContext.
Lock Down the Executor (executor.ts):
The ActorExecutor must accept this context and enforce it blindly:
Model Enforcement: Force the SDK initialization to strictly use context.modelConstraint.
Tool Enforcement (Claude Fix): Explicitly filter the tools passed into the provider SDK using context.allowedTools, physically preventing the Claude SDK from seeing tools outside its clearance.
Security Middleware: Pass context.allowedTools into the SecurityRulesEngine so any runtime attempt to bypass the SDK constraints triggers an immediate hard abort (AGENT_SECURITY_VIOLATION_MODE=hard_abort).
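The resolve-then-enforce split described in these directives might look like the sketch below. `NodePolicyInput`, `resolveExecutionContext`, and `filterToolsForProvider` are hypothetical names, and the resolution rule (allowlist, minus banlist, restricted to tools the registry knows) is an assumption about how `toolClearance` and the `mcpRegistry` combine. The interface is a readonly variant of the one above.

```typescript
interface ResolvedExecutionContext {
  readonly phase: string;
  readonly modelConstraint: string;
  readonly allowedTools: readonly string[];
  readonly security: { readonly dropUid: boolean; readonly worktreePath: string };
}

// Hypothetical per-node input assembled by the pipeline from the manifest.
interface NodePolicyInput {
  phase: string;
  modelConstraint: string;
  toolClearance: { allowlist: string[]; banlist: string[] };
  security: { dropUid: boolean; worktreePath: string };
}

// Orchestrator side: flatten policy into an immutable payload for one node attempt.
function resolveExecutionContext(
  node: NodePolicyInput,
  registryToolNames: readonly string[],
): ResolvedExecutionContext {
  const known = new Set(registryToolNames);
  const banned = new Set(node.toolClearance.banlist);
  const allowedTools = node.toolClearance.allowlist.filter(
    (tool, index, all) => known.has(tool) && !banned.has(tool) && all.indexOf(tool) === index,
  );
  return Object.freeze({
    phase: node.phase,
    modelConstraint: node.modelConstraint,
    allowedTools: Object.freeze(allowedTools),
    security: Object.freeze({ ...node.security }),
  });
}

// Executor side: the provider SDK only ever receives tools inside the clearance.
function filterToolsForProvider<T extends { name: string }>(
  tools: readonly T[],
  context: ResolvedExecutionContext,
): T[] {
  const allowed = new Set(context.allowedTools);
  return tools.filter((tool) => allowed.has(tool.name));
}
```

The executor never sees the registry or the manifest, only the frozen result, which is the decoupling the brief is after.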
Expected Outcome
The execution nodes remain entirely decoupled from the orchestration state. The LLM cannot escalate its model tier or access unauthorized tools, and the provider SDK quirks are mitigated at the execution boundary.
# epic
# front end ui requirements
1. Graph Visualizer
Your initial thoughts on coloring by stage/agent and showing metadata (subtasks, tool calls, security violations) are spot-on. Because your backend relies heavily on DAG execution and a retry matrix, the visualizer will be the most critical piece of the UI.
What else is worth visualizing?
Based on your README, here are specific concepts you should expose on the graph:
Topology & Control Flow: Visually distinguish between sequential, parallel, hierarchical, and retry-unrolled branches. For example, a retry-unrolled node should visually indicate that it spawned a new child manager session to remediate a validation_fail.
Domain Event Edges: Since your pipeline edges route via typed events (requirements_defined, validation_failed), labeling the edges of the graph with the specific domain event that triggered the transition will make debugging orchestration loops much easier.
Economics & Performance (from Runtime Events): Your NDJSON events log tokenInput, tokenOutput, durationMs, and costUsd. Surfacing the "cost" or "time" of a specific DAG node directly on the graph helps identify inefficient prompts or agents.
The "Sandbox Payload": When a user clicks or hovers over a specific node (e.g., task_plan_llm), the UI must display the ResolvedExecutionContext payload that was injected into it.
Critical Path & Abort Status: If a session fails due to two consecutive hard failures, visually highlighting the exact "critical path" that led to the AbortSignal cascading through the system will save hours of log-diving.
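For the economics overlay, per-node totals could be derived from the NDJSON log roughly as follows. The four metric fields are the ones named above; `nodeId` is a hypothetical grouping key, since the notes do not say which field identifies the DAG node.

```typescript
interface NodeTotals {
  tokenInput: number;
  tokenOutput: number;
  durationMs: number;
  costUsd: number;
}

// Treats missing or non-numeric metrics as zero.
function metric(value: unknown): number {
  return typeof value === "number" && Number.isFinite(value) ? value : 0;
}

// Sums metrics per node; events without a string `nodeId` (assumed key) are skipped.
function aggregateNodeTotals(ndjson: string): Map<string, NodeTotals> {
  const totals = new Map<string, NodeTotals>();
  for (const line of ndjson.split("\n")) {
    if (line.trim().length === 0) continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip malformed or partially written lines
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const event = parsed as Record<string, unknown>;
    const nodeId = event.nodeId;
    if (typeof nodeId !== "string") continue;
    const current = totals.get(nodeId) ?? { tokenInput: 0, tokenOutput: 0, durationMs: 0, costUsd: 0 };
    current.tokenInput += metric(event.tokenInput);
    current.tokenOutput += metric(event.tokenOutput);
    current.durationMs += metric(event.durationMs);
    current.costUsd += metric(event.costUsd);
    totals.set(nodeId, current);
  }
  return totals;
}
```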
2. Notification / Webhook Interface
Your backend already has an elegant fan-out system (NDJSON analytics log + Discord webhook). The UI should act as a control panel and an in-app inbox for this.
Configuration: A form to manage AGENT_RUNTIME_DISCORD_WEBHOOK_URL, AGENT_RUNTIME_DISCORD_MIN_SEVERITY, and the ALWAYS_NOTIFY_TYPES CSV.
Live Event Feed: A real-time drawer or panel that tails the .ai_ops/events/runtime-events.ndjson file. You can parse the severity field to color-code the feed (e.g., flashing red for critical security mirror events like security.shell.command_blocked).
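A rough sketch of how the feed could decide notification and row styling from the settings above. The severity names and their ordering are assumptions made for this example, as are the CSS class names; only the minimum-severity and always-notify concepts come from the configuration described here.

```typescript
// Assumed severity scale, lowest to highest.
const SEVERITY_ORDER = ["info", "warning", "critical"] as const;
type Severity = (typeof SEVERITY_ORDER)[number];

interface FeedEvent {
  type: string;
  severity: Severity;
}

// Parses a CSV setting into a set, ignoring blanks and surrounding whitespace.
function parseAlwaysNotifyCsv(csv: string): Set<string> {
  return new Set(
    csv
      .split(",")
      .map((entry) => entry.trim())
      .filter((entry) => entry.length > 0),
  );
}

// Always-notify types bypass the severity threshold.
function shouldNotify(
  event: FeedEvent,
  minSeverity: Severity,
  alwaysNotifyTypes: ReadonlySet<string>,
): boolean {
  if (alwaysNotifyTypes.has(event.type)) return true;
  return SEVERITY_ORDER.indexOf(event.severity) >= SEVERITY_ORDER.indexOf(minSeverity);
}

// Hypothetical class names for color-coding feed rows.
function feedRowClass(event: FeedEvent): string {
  return `feed-row feed-row--${event.severity}`;
}
```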
3. Job Trigger Interface
This is your execution entrypoint (SchemaDrivenExecutionEngine.runSession).
Inputs: A clean interface to provide the initial prompt/task, select the manifest or topology to run, and override global flags.
The "Kill Switch": Since every actor execution respects an AbortSignal, your UI needs a prominent, highly responsive "Cancel Run" button that immediately aborts child recursive work.
Run History: A table view summarizing aggregate session status from AGENT_STATE_ROOT, allowing users to click into past runs to view their graph state.
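The kill switch maps naturally onto one `AbortController` per active run. The `RunRegistry` class and `runSteps` helper below are hypothetical and only illustrate the pattern; they are not the existing run service.

```typescript
// Tracks one AbortController per active run so the UI can cancel by run ID.
class RunRegistry {
  private readonly active = new Map<string, AbortController>();

  start(runId: string): AbortSignal {
    if (this.active.has(runId)) {
      throw new Error(`Run "${runId}" is already active.`);
    }
    const controller = new AbortController();
    this.active.set(runId, controller);
    return controller.signal;
  }

  // Returns false when the run is unknown or already finished.
  cancel(runId: string): boolean {
    const controller = this.active.get(runId);
    if (!controller) return false;
    controller.abort();
    this.active.delete(runId);
    return true;
  }

  finish(runId: string): void {
    this.active.delete(runId);
  }
}

// Work loops check the shared signal between steps, so child work that
// receives the same signal stops as soon as the run is cancelled.
function runSteps(steps: ReadonlyArray<() => void>, signal: AbortSignal): number {
  let completed = 0;
  for (const step of steps) {
    if (signal.aborted) break;
    step();
    completed += 1;
  }
  return completed;
}
```

A "Cancel Run" button would then call `cancel(runId)` through an API route and can report back immediately whether there was anything to cancel.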
4. Definition Interface (Manifest, Config, Security)
You noted that anything secure stays on the backend. The frontend here should strictly be a client that reads/writes validated JSON or environment schemas.
Manifest Builder: A UI to visually build or edit the AgentManifest (Schema "1"), defining personas, tool-clearance policies, modelConstraint (or allowedModel), and setting maxDepth/maxRetries.
Security Policy Management: An interface mapped to src/security/schemas.ts. This allows admins to define AGENT_SECURITY_ALLOWED_BINARIES, toggle AGENT_SECURITY_VIOLATION_MODE (hard_abort vs validation_fail), and manage MCP tool allowlists/banlists.
Environment & Resource Limits: Simple forms to configure agent manager limits (AGENT_MAX_CONCURRENT) and port block sizing without manually editing the .env file.


@@ -1,6 +1,6 @@
import { execFile } from "node:child_process";
import { createHash } from "node:crypto";
import { mkdir, open, stat, unlink, writeFile } from "node:fs/promises";
import { mkdir, open, unlink, writeFile } from "node:fs/promises";
import { dirname, isAbsolute, resolve } from "node:path";
import { promisify } from "node:util";
@@ -272,7 +272,6 @@ export class ResourceProvisioningOrchestrator {
export type GitWorktreeProviderConfig = {
rootDirectory: string;
baseRef: string;
targetPath?: string;
};
export type PortRangeProviderConfig = {
@@ -314,10 +313,6 @@ export function createGitWorktreeProvider(
provision: async ({ sessionId, workspaceRoot, options }) => {
const rootDirectory = readOptionalString(options, "rootDirectory", config.rootDirectory);
const baseRef = readOptionalString(options, "baseRef", config.baseRef);
const targetPath = normalizeWorktreeTargetPath(
readOptionalStringOrUndefined(options, "targetPath") ?? config.targetPath,
"targetPath",
);
const repoRoot = await runGit(["-C", workspaceRoot, "rev-parse", "--show-toplevel"]);
const worktreeRoot = resolvePath(repoRoot, rootDirectory);
@@ -326,18 +321,6 @@ export function createGitWorktreeProvider(
const worktreeName = buildScopedName(sessionId);
const worktreePath = resolve(worktreeRoot, worktreeName);
await runGit(["-C", repoRoot, "worktree", "add", "--detach", worktreePath, baseRef]);
if (targetPath) {
await runGit(["-C", worktreePath, "sparse-checkout", "init", "--cone"]);
await runGit(["-C", worktreePath, "sparse-checkout", "set", targetPath]);
}
const preferredWorkingDirectory = targetPath ? resolve(worktreePath, targetPath) : worktreePath;
await assertDirectoryExists(
preferredWorkingDirectory,
targetPath
? `Configured worktree target path "${targetPath}" is not a directory in ref "${baseRef}".`
: `Provisioned worktree path "${preferredWorkingDirectory}" does not exist.`,
);
return {
kind: "git-worktree",
@@ -346,7 +329,6 @@ export function createGitWorktreeProvider(
worktreeRoot,
worktreePath,
baseRef,
...(targetPath ? { targetPath } : {}),
},
soft: {
env: {
@@ -357,14 +339,12 @@ export function createGitWorktreeProvider(
promptSections: [
`Git worktree: ${worktreePath}`,
`Worktree base ref: ${baseRef}`,
...(targetPath ? [`Worktree target path: ${targetPath} (sparse-checkout enabled)`] : []),
],
metadata: {
git_worktree_path: worktreePath,
git_worktree_base_ref: baseRef,
...(targetPath ? { git_worktree_target_path: targetPath } : {}),
},
preferredWorkingDirectory,
preferredWorkingDirectory: worktreePath,
},
release: async () => {
await runGit(["-C", repoRoot, "worktree", "remove", "--force", worktreePath]);
@@ -596,21 +576,6 @@ function readOptionalString(
return value.trim();
}
function readOptionalStringOrUndefined(
options: Record<string, unknown>,
key: string,
): string | undefined {
const value = options[key];
if (value === undefined) {
return undefined;
}
if (typeof value !== "string") {
throw new Error(`Option "${key}" must be a string when provided.`);
}
const trimmed = value.trim();
return trimmed.length > 0 ? trimmed : undefined;
}
function readOptionalInteger(
options: Record<string, unknown>,
key: string,
@@ -630,46 +595,6 @@ function readOptionalInteger(
return value;
}
function normalizeWorktreeTargetPath(value: string | undefined, key: string): string | undefined {
if (!value) {
return undefined;
}
const slashNormalized = value.replace(/\\/g, "/");
if (isAbsolute(slashNormalized) || /^[a-zA-Z]:\//.test(slashNormalized)) {
throw new Error(`Option "${key}" must be a relative path within the repository worktree.`);
}
const normalizedSegments = slashNormalized
.split("/")
.map((segment) => segment.trim())
.filter((segment) => segment.length > 0 && segment !== ".");
if (normalizedSegments.some((segment) => segment === "..")) {
throw new Error(`Option "${key}" must not contain ".." path segments.`);
}
if (normalizedSegments.length === 0) {
return undefined;
}
return normalizedSegments.join("/");
}
async function assertDirectoryExists(path: string, errorMessage: string): Promise<void> {
try {
const stats = await stat(path);
if (!stats.isDirectory()) {
throw new Error(errorMessage);
}
} catch (error) {
if ((error as NodeJS.ErrnoException).code === "ENOENT") {
throw new Error(errorMessage);
}
throw error;
}
}
function readNumberFromAllocation(allocation: Record<string, JsonValue>, key: string): number {
const value = allocation[key];
if (typeof value !== "number" || !Number.isInteger(value)) {
@@ -717,8 +642,6 @@ export function buildChildResourceRequests(input: ChildResourceSuballocationInpu
const parentWorktreePath = readStringFromAllocation(parentGit, "worktreePath");
const baseRefRaw = parentGit.baseRef;
const baseRef = typeof baseRefRaw === "string" && baseRefRaw.trim().length > 0 ? baseRefRaw : "HEAD";
const targetPathRaw = parentGit.targetPath;
const targetPath = typeof targetPathRaw === "string" ? targetPathRaw.trim() : "";
requests.push({
kind: "git-worktree",
@@ -729,7 +652,6 @@ export function buildChildResourceRequests(input: ChildResourceSuballocationInpu
buildScopedName(input.parentSnapshot.sessionId),
),
baseRef,
...(targetPath ? { targetPath } : {}),
},
});
}


@@ -11,7 +11,6 @@ function toProvisioningConfig(input: Readonly<AppConfig>): BuiltInProvisioningCo
gitWorktree: {
rootDirectory: input.provisioning.gitWorktree.rootDirectory,
baseRef: input.provisioning.gitWorktree.baseRef,
targetPath: input.provisioning.gitWorktree.targetPath,
},
portRange: {
basePort: input.provisioning.portRange.basePort,


@@ -124,50 +124,6 @@ function readOptionalString(
return value;
}
function readOptionalRelativePath(
env: NodeJS.ProcessEnv,
key: string,
): string | undefined {
const value = readOptionalString(env, key);
if (!value) {
return undefined;
}
const slashNormalized = value.replace(/\\/g, "/");
if (slashNormalized.startsWith("/") || /^[a-zA-Z]:\//.test(slashNormalized)) {
throw new Error(`Environment variable ${key} must be a relative path.`);
}
const normalizedSegments = slashNormalized
.split("/")
.map((segment) => segment.trim())
.filter((segment) => segment.length > 0 && segment !== ".");
if (normalizedSegments.some((segment) => segment === "..")) {
throw new Error(`Environment variable ${key} must not contain ".." path segments.`);
}
if (normalizedSegments.length === 0) {
return undefined;
}
return normalizedSegments.join("/");
}
function normalizeClaudeModel(value: string | undefined): string | undefined {
if (!value) {
return undefined;
}
const anthropicPrefix = "anthropic/";
if (!value.startsWith(anthropicPrefix)) {
return value;
}
const normalized = value.slice(anthropicPrefix.length).trim();
return normalized || undefined;
}
function readStringWithFallback(
env: NodeJS.ProcessEnv,
key: string,
@@ -356,7 +312,7 @@ export function loadConfig(env: NodeJS.ProcessEnv = process.env): Readonly<AppCo
codexSkipGitCheck: readBooleanWithFallback(env, "CODEX_SKIP_GIT_CHECK", true),
anthropicOauthToken,
anthropicApiKey,
claudeModel: normalizeClaudeModel(readOptionalString(env, "CLAUDE_MODEL")),
claudeModel: readOptionalString(env, "CLAUDE_MODEL"),
claudeCodePath: readOptionalString(env, "CLAUDE_CODE_PATH"),
},
mcp: {
@@ -424,7 +380,6 @@ export function loadConfig(env: NodeJS.ProcessEnv = process.env): Readonly<AppCo
"AGENT_WORKTREE_BASE_REF",
DEFAULT_PROVISIONING.gitWorktree.baseRef,
),
targetPath: readOptionalRelativePath(env, "AGENT_WORKTREE_TARGET_PATH"),
},
portRange: {
basePort: readIntegerWithBounds(


@@ -72,8 +72,6 @@ const CLAUDE_OUTPUT_FORMAT = {
schema: ACTOR_RESPONSE_SCHEMA,
} as const;
const CLAUDE_PROVIDER_MAX_TURNS = 2;
function toErrorMessage(error: unknown): string {
if (error instanceof Error) {
return error.message;
@@ -435,7 +433,7 @@ function buildClaudeOptions(input: {
};
return {
maxTurns: CLAUDE_PROVIDER_MAX_TURNS,
maxTurns: 1,
...(runtime.config.provider.claudeModel
? { model: runtime.config.provider.claudeModel }
: {}),


@@ -120,6 +120,101 @@ const MANIFEST_EVENT_TRIGGERS = [
const RUN_MANIFEST_EDITOR_VALUE = "__editor__";
const RUN_MANIFEST_EDITOR_LABEL = "[Use Manifest Editor JSON]";
const LABEL_HELP_BY_CONTROL = Object.freeze({
"session-select": "Select which session the graph and feed should focus on.",
"graph-manifest-select": "Choose the manifest context used when rendering the selected session graph.",
"run-prompt": "Describe the task objective you want the run to complete.",
"run-manifest-select": "Choose a saved manifest or use the JSON currently in the editor.",
"run-execution-mode": "Use provider for live model execution or mock for simulated execution.",
"run-provider": "Choose which model provider backend handles provider-mode runs.",
"run-topology-hint": "Optional hint that nudges orchestration toward a topology strategy.",
"run-flags": "Optional JSON object passed in as initial run flags.",
"run-validation-nodes": "Optional comma-separated node IDs to simulate validation outcomes for.",
"events-limit": "Set how many recent runtime events are loaded per refresh.",
"cfg-webhook-url": "Webhook endpoint that receives runtime event notifications.",
"cfg-webhook-severity": "Minimum severity level that triggers webhook notifications.",
"cfg-webhook-always": "Event types that should always notify, regardless of severity.",
"cfg-security-mode": "Policy behavior used when a command violates security rules.",
"cfg-security-binaries": "Comma-separated command binaries permitted by policy.",
"cfg-security-timeout": "Maximum command execution time before forced timeout.",
"cfg-security-inherit": "Environment variable names to pass through to subprocesses.",
"cfg-security-scrub": "Environment variable names to strip before command execution.",
"cfg-limit-concurrent": "Maximum number of agents that can run concurrently across sessions.",
"cfg-limit-session": "Maximum number of agents that can run concurrently within a single session.",
"cfg-limit-depth": "Maximum recursive spawn depth allowed for agent tasks.",
"cfg-topology-depth": "Maximum orchestration graph depth permitted by topology rules.",
"cfg-topology-retries": "Maximum retry expansions allowed by topology orchestration.",
"cfg-relationship-children": "Maximum children each persona relationship can spawn.",
"cfg-port-base": "Starting port number for provisioning port allocations.",
"cfg-port-block-size": "Number of ports reserved per allocated block.",
"cfg-port-block-count": "Number of port blocks available for allocation.",
"cfg-port-primary-offset": "Offset within each block used for the primary service port.",
"manifest-path": "Workspace-relative manifest file path to load, validate, or save.",
"helper-topology-sequential": "Allow sequential execution topology in this manifest.",
"helper-topology-parallel": "Allow parallel execution topology in this manifest.",
"helper-topology-hierarchical": "Allow hierarchical parent-child execution topology.",
"helper-topology-retry-unrolled": "Allow retry-unrolled topology for explicit retry paths.",
"helper-topology-max-depth": "Top-level cap on orchestration depth in this manifest.",
"helper-topology-max-retries": "Top-level cap on retry attempts in this manifest.",
"helper-entry-node-id": "Node ID used as the pipeline entry point.",
"helper-persona-id": "Stable persona identifier referenced by nodes and relationships.",
"helper-persona-display-name": "Human-readable persona name shown in summaries and tooling.",
"helper-persona-model-constraint": "Optional model restriction for this persona only.",
"helper-persona-system-prompt": "Base prompt template that defines persona behavior.",
"helper-persona-allowlist": "Comma-separated tool names this persona may use.",
"helper-persona-banlist": "Comma-separated tool names this persona must not use.",
"helper-relationship-parent": "Parent persona ID that can spawn or delegate to the child.",
"helper-relationship-child": "Child persona ID allowed under the selected parent.",
"helper-relationship-max-depth": "Optional override limiting recursion depth for this relationship.",
"helper-relationship-max-children": "Optional override limiting child fan-out for this relationship.",
"helper-node-id": "Unique pipeline node identifier used by entry node and edges.",
"helper-node-actor-id": "Runtime actor identifier assigned to this node.",
"helper-node-persona-id": "Persona applied when this node executes.",
"helper-node-topology-kind": "Optional node-level topology override.",
"helper-node-block-id": "Optional topology block identifier for grouped scheduling logic.",
"helper-node-max-retries": "Optional node-level retry limit override.",
"helper-edge-from": "Source node where this edge starts.",
"helper-edge-to": "Target node activated when this edge condition matches.",
"helper-edge-trigger-kind": "Choose whether edge activation is status-based or event-based.",
"helper-edge-on": "Node status value that triggers this edge when using status mode.",
"helper-edge-event": "Domain event name that triggers this edge when using event mode.",
"helper-edge-when": "Optional JSON array of additional conditions required to follow the edge.",
});
function extractLabelText(label) {
const clone = label.cloneNode(true);
for (const field of clone.querySelectorAll("input, select, textarea")) {
field.remove();
}
return clone.textContent?.replace(/\s+/g, " ").trim() || "this field";
}
function applyLabelTooltips(root = document) {
for (const label of root.querySelectorAll("label")) {
const control = label.querySelector("input,select,textarea");
if (!control) {
continue;
}
let help = LABEL_HELP_BY_CONTROL[control.id] || "";
if (!help) {
for (const className of control.classList) {
help = LABEL_HELP_BY_CONTROL[className] || "";
if (help) {
break;
}
}
}
if (!help) {
const labelText = extractLabelText(label).toLowerCase();
help = `Set ${labelText} for this configuration.`;
}
label.title = help;
control.title = help;
}
}
function fmtMoney(value) {
return `$${Number(value || 0).toFixed(4)}`;
}
@@ -575,6 +670,8 @@ function renderManifestHelper() {
`;
})
.join("");
applyLabelTooltips(dom.manifestForm);
}
function readManifestDraftFromHelper() {
@@ -1737,6 +1834,7 @@ async function refreshAll() {
async function initialize() {
bindUiEvents();
applyLabelTooltips();
try {
await apiRequest("/api/health");


@@ -3,11 +3,7 @@ import { mkdir, readFile, writeFile } from "node:fs/promises";
import { resolve } from "node:path";
import { SchemaDrivenExecutionEngine } from "../agents/orchestration.js";
import { parseAgentManifest, type AgentManifest } from "../agents/manifest.js";
import type {
ActorExecutionResult,
ActorExecutor,
PipelineAggregateStatus,
} from "../agents/pipeline.js";
import type { ActorExecutionResult, ActorExecutor } from "../agents/pipeline.js";
import { loadConfig, type AppConfig } from "../config.js";
import { parseEnvFile } from "./env-store.js";
import {
@@ -47,10 +43,6 @@ export type RunRecord = {
error?: string;
};
function toRunStatus(status: PipelineAggregateStatus): Extract<RunStatus, "success" | "failure"> {
return status === "success" ? "success" : "failure";
}
type ActiveRun = {
controller: AbortController;
record: RunRecord;
@@ -393,7 +385,7 @@ export class UiRunService {
run: record,
});
const summary = await engine.runSession({
await engine.runSession({
sessionId,
initialPayload: {
prompt: input.prompt,
@@ -413,7 +405,7 @@ export class UiRunService {
const next: RunRecord = {
...completedRecord,
status: toRunStatus(summary.status),
status: "success",
endedAt: new Date().toISOString(),
};
this.runHistory.set(runId, next);

49
test_run.md Normal file

@@ -0,0 +1,49 @@
For this repo, a “test run” can mean 3 different things. Configure based on which one you want.
1. npm run verify (typecheck + tests + build)
- Required:
- Node + npm
- Dependencies installed: npm install
- Not required:
- API keys
- Command:
- npm run verify
2. UI dry run (no external model calls, safest first run)
- Required:
- npm install
- topologies, personas, relationships, topologyConstraints
- pipeline with entryNodeId, nodes, edges
3. Provider-backed run (Codex/Claude via CLI or UI executionMode=provider)
- Required:
- Everything in #2
- Auth for chosen provider:
- Codex/OpenAI: OPENAI_AUTH_MODE + (CODEX_API_KEY or OPENAI_API_KEY) or existing Codex login
- Claude: CLAUDE_CODE_OAUTH_TOKEN (preferred) or ANTHROPIC_API_KEY or existing Claude login
- git available and workspace is a valid git repo (runtime provisions git worktrees)
- Optional:
- mcp.config.json (default missing file is allowed if path is default)
- Important:
- If you set custom MCP_CONFIG_PATH, that file must exist.
Environment groups you can tune (defaults already exist)
- Provider/auth: keys, auth mode, base URL, model, MCP path
- Limits: AGENT_MAX_*, AGENT_TOPOLOGY_*, AGENT_RELATIONSHIP_MAX_CHILDREN
- Provisioning: worktree root/base ref, port range/locks, discovery relative path
- Security: violation mode, allowlisted binaries, timeout, audit path, env inherit/scrub
- Telemetry/UI: runtime event log + Discord settings, AGENT_UI_HOST/PORT
- Do not set runtime-injected vars manually (AGENT_WORKTREE_PATH, AGENT_PORT_RANGE_START, etc.).
Practical first-run sequence
1. npm install
2. cp .env.example .env
3. npm run verify
4. npm run ui
5. Start a run in mock mode with a valid manifest
6. Switch to provider mode after auth is confirmed


@@ -104,26 +104,3 @@ test("resolveOpenAiApiKey prefers CODEX_API_KEY in auto mode", () => {
assert.equal(resolveOpenAiApiKey(config.provider), "codex-key");
});
test("normalizes anthropic-prefixed CLAUDE_MODEL values", () => {
const config = loadConfig({
CLAUDE_MODEL: "anthropic/claude-sonnet-4-6",
});
assert.equal(config.provider.claudeModel, "claude-sonnet-4-6");
});
test("normalizes AGENT_WORKTREE_TARGET_PATH", () => {
const config = loadConfig({
AGENT_WORKTREE_TARGET_PATH: "./src/agents/",
});
assert.equal(config.provisioning.gitWorktree.targetPath, "src/agents");
});
test("validates AGENT_WORKTREE_TARGET_PATH against parent traversal", () => {
assert.throws(
() => loadConfig({ AGENT_WORKTREE_TARGET_PATH: "../secrets" }),
/must not contain "\.\." path segments/,
);
});


@@ -18,7 +18,6 @@ function parentSnapshot(): DiscoverySnapshot {
worktreeRoot: "/repo/.ai_ops/worktrees",
worktreePath: "/repo/.ai_ops/worktrees/parent",
baseRef: "HEAD",
targetPath: "src/agents",
},
},
{
@@ -56,7 +55,6 @@ test("builds deterministic child suballocation requests", () => {
const gitRequest = requests.find((entry) => entry.kind === "git-worktree");
assert.ok(gitRequest);
assert.equal(typeof gitRequest.options?.rootDirectory, "string");
assert.equal(gitRequest.options?.targetPath, "src/agents");
const portRequest = requests.find((entry) => entry.kind === "port-range");
assert.ok(portRequest);


@@ -1,96 +0,0 @@
import test from "node:test";
import assert from "node:assert/strict";
import { mkdtemp, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { resolve } from "node:path";
import { UiRunService, readRunMetaBySession } from "../src/ui/run-service.js";
async function waitForTerminalRun(
runService: UiRunService,
runId: string,
): Promise<"success" | "failure" | "cancelled"> {
const maxPolls = 100;
for (let index = 0; index < maxPolls; index += 1) {
const run = runService.getRun(runId);
if (run && run.status !== "running") {
return run.status;
}
await new Promise((resolveWait) => setTimeout(resolveWait, 20));
}
throw new Error("Run did not reach a terminal status within polling window.");
}
test("run service persists failure when pipeline summary is failure", async () => {
const workspaceRoot = await mkdtemp(resolve(tmpdir(), "ai-ops-run-service-"));
const stateRoot = resolve(workspaceRoot, "state");
const projectContextPath = resolve(workspaceRoot, "project-context.json");
const envPath = resolve(workspaceRoot, ".env");
await writeFile(
envPath,
[
`AGENT_STATE_ROOT=${stateRoot}`,
`AGENT_PROJECT_CONTEXT_PATH=${projectContextPath}`,
].join("\n"),
"utf8",
);
const runService = new UiRunService({
workspaceRoot,
envFilePath: ".env",
});
const manifest = {
schemaVersion: "1",
topologies: ["sequential"],
personas: [
{
id: "writer",
displayName: "Writer",
systemPromptTemplate: "Write the draft",
toolClearance: {
allowlist: ["read_file", "write_file"],
banlist: [],
},
},
],
relationships: [],
topologyConstraints: {
maxDepth: 1,
maxRetries: 0,
},
pipeline: {
entryNodeId: "write-node",
nodes: [
{
id: "write-node",
actorId: "writer-actor",
personaId: "writer",
topology: {
kind: "sequential",
},
constraints: {
maxRetries: 0,
},
},
],
edges: [],
},
};
const started = await runService.startRun({
prompt: "force validation failure on first attempt",
manifest,
executionMode: "mock",
simulateValidationNodeIds: ["write-node"],
});
const terminalStatus = await waitForTerminalRun(runService, started.runId);
assert.equal(terminalStatus, "failure");
const persisted = await readRunMetaBySession({
stateRoot,
sessionId: started.sessionId,
});
assert.equal(persisted?.status, "failure");
});
