Add configurable worktree target path and session run diagnostics

2026-02-23 20:38:05 -05:00
parent e7dbc9870f
commit 83bbf1a9ce
13 changed files with 434 additions and 7 deletions
--- a/docs/session-walkthrough.md
+++ b/docs/session-walkthrough.md
@@ -0,0 +1,160 @@
+# Session Walkthrough (Concrete Example)
+
+This document walks through one successful provider run end-to-end using:
+
+- session id: `ui-session-mlzw94bv-cb753677`
+- run id: `9287775f-a507-492a-9afa-347ed3f3a6b3`
+- execution mode: `provider`
+- provider: `claude`
+- manifest: `.ai_ops/manifests/test.json`
+
+Use this as a mental model and as a debugging template for future sessions.
+
+## 1) What happened in this run
+
+The manifest defines two sequential nodes:
+
+1. `write-node` (persona: writer)
+2. `copy-node` (persona: copy-editor)
+
+Edge routing is `write-node -> copy-node` on `success`.
+
+In this run:
+
+1. `write-node` succeeded on attempt 1 and emitted `validation_passed` and `tasks_planned`.
+2. `copy-node` succeeded on attempt 1 and emitted `validation_passed`.
+3. Session aggregate status was `success`.
+
+## 2) Timeline from runtime events
+
+From `.ai_ops/events/runtime-events.ndjson`:
+
+1. `2026-02-24T00:55:28.632Z` `session.started`
+2. `2026-02-24T00:55:48.705Z` `node.attempt.completed` for `write-node` with `status=success`
+3. `2026-02-24T00:55:48.706Z` `domain.validation_passed` for `write-node`
+4. `2026-02-24T00:55:48.706Z` `domain.tasks_planned` for `write-node`
+5. `2026-02-24T00:56:14.237Z` `node.attempt.completed` for `copy-node` with `status=success`
+6. `2026-02-24T00:56:14.238Z` `domain.validation_passed` for `copy-node`
+7. `2026-02-24T00:56:14.242Z` `session.completed` with `status=success`
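+
+To rebuild a timeline like this for another session, a minimal sketch follows. The event field names (`timestamp`, `type`, `sessionId`, `nodeId`, `status`) and both helper functions are assumptions for illustration; check them against the actual NDJSON lines before relying on them.
+
+```typescript
+// Hypothetical event shape: these field names are assumptions, not the real schema.
+interface RuntimeEvent {
+  timestamp: string; // ISO-8601 UTC, e.g. "2026-02-24T00:55:28.632Z"
+  type: string;      // e.g. "session.started", "node.attempt.completed"
+  sessionId: string;
+  nodeId?: string;
+  status?: string;
+}
+
+// Parse NDJSON text (one JSON object per line), skipping blank lines.
+function parseNdjson(text: string): RuntimeEvent[] {
+  return text
+    .split("\n")
+    .map((line) => line.trim())
+    .filter((line) => line.length > 0)
+    .map((line) => JSON.parse(line) as RuntimeEvent);
+}
+
+// Keep one session's events, oldest first.
+// Same-format ISO-8601 UTC strings sort correctly as plain strings.
+function sessionTimeline(events: RuntimeEvent[], sessionId: string): RuntimeEvent[] {
+  return events
+    .filter((e) => e.sessionId === sessionId)
+    .sort((a, b) => (a.timestamp < b.timestamp ? -1 : a.timestamp > b.timestamp ? 1 : 0));
+}
+```
+
+Feed `parseNdjson` the contents of `.ai_ops/events/runtime-events.ndjson`, then pass the result and the session id to `sessionTimeline`.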
+
+## 3) How artifacts map to runtime behavior
+
+### Run metadata (UI-level)
+
+`state/<session>/ui-run-meta.json` stores run summary fields:
+
+- run identity, provider, and execution mode
+- status (`running`, `success`, `failure`, `cancelled`)
+- start/end timestamps
+
+For this run:
+
+```json
+{
+  "sessionId": "ui-session-mlzw94bv-cb753677",
+  "status": "success",
+  "executionMode": "provider",
+  "provider": "claude"
+}
+```
+
+### Handoffs (node input payloads)
+
+`state/<session>/handoffs/*.json` stores one handoff payload file per node, named after the receiving node.
+
+`write-node.json`:
+
+```json
+{
+  "nodeId": "write-node",
+  "payload": { "prompt": "be yourself" }
+}
+```
+
+`copy-node.json` includes `fromNodeId: "write-node"` and carries the story generated by the writer node.
+
+Important: these files are the payloads transferred along pipeline edges. If a downstream node's output looks strange, inspect the relevant handoff file first.
+
+### Session state (flags + metadata + history)
+
+`state/<session>/state.json` is cumulative session state:
+
+- `flags`: merged boolean flags from node results
+- `metadata`: merged metadata from node results/behavior patches
+- `history`: domain-event history entries
+
+For this run, state includes:
+
+- flags: `story_written=true`, `copy_edited=true`
+- history events:
+  - `write-node: validation_passed`
+  - `write-node: tasks_planned`
+  - `copy-node: validation_passed`
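+
+The merge behavior described above can be sketched as follows. The `SessionState`, `NodeResult`, and `applyNodeResult` names are hypothetical, and the "later result wins on key conflicts" rule is an assumption; the real logic lives in the state and lifecycle code.
+
+```typescript
+// Hypothetical shapes; the real state.json schema may differ.
+interface HistoryEntry {
+  nodeId: string;
+  event: string;
+}
+
+interface SessionState {
+  flags: Record<string, boolean>;
+  metadata: Record<string, unknown>;
+  history: HistoryEntry[];
+}
+
+interface NodeResult {
+  nodeId: string;
+  flags?: Record<string, boolean>;
+  metadata?: Record<string, unknown>;
+  events?: string[]; // domain events emitted by the node
+}
+
+// Return a new state with one node result folded in.
+// Assumption: on key conflicts, the later result overwrites the earlier value.
+function applyNodeResult(state: SessionState, result: NodeResult): SessionState {
+  return {
+    flags: { ...state.flags, ...result.flags },
+    metadata: { ...state.metadata, ...result.metadata },
+    history: [
+      ...state.history,
+      ...(result.events ?? []).map((event) => ({ nodeId: result.nodeId, event })),
+    ],
+  };
+}
+```
+
+Applying the two node results from this run in order would yield the flags and the three history entries listed above.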
+
+### Project context pointer
+
+`.ai_ops/project-context.json` tracks cross-session pointers like:
+
+- `sessions/<session>/last_completed_node`
+- `sessions/<session>/last_attempt`
+- `sessions/<session>/final_state`
+
+This lets operators and tooling locate the final state file for any completed session.
+
+## 4) Code path (from button click to persisted state)
+
+1. UI starts run via `UiRunService.startRun(...)`.
+2. Service loads config, parses manifest, creates engine, writes initial run meta.
+3. Engine `runSession(...)` initializes state and writes entry handoff.
+4. Pipeline executes ready nodes:
+   - builds fresh node context (`handoff + state`)
+   - renders persona system prompt
+   - invokes provider executor
+   - receives actor result
+5. Lifecycle observer persists:
+   - state flags/metadata/history
+   - runtime events (`node.attempt.completed`, `domain.*`)
+   - project context pointers (`last_completed_node`, `last_attempt`)
+6. Pipeline evaluates edges and writes downstream handoffs.
+7. Pipeline computes aggregate status and emits `session.completed`.
+8. UI run service writes final `ui-run-meta.json` status from pipeline summary.
+
+Primary entrypoints:
+
+- `src/ui/run-service.ts`
+- `src/agents/orchestration.ts`
+- `src/agents/pipeline.ts`
+- `src/agents/lifecycle-observer.ts`
+- `src/agents/state-context.ts`
+- `src/ui/provider-executor.ts`
+
+## 5) Mental model that keeps this manageable
+
+Think of one session as five stores and one loop:
+
+1. Manifest (static plan): node graph + routing rules.
+2. Handoffs (per-node input payload snapshots).
+3. State (session memory): flags + metadata + domain history.
+4. Runtime events (timeline/audit side channel).
+5. Project context (cross-session pointers and shared context).
+6. Loop: dequeue ready node -> execute -> persist result/events -> enqueue next nodes.
+
+If you track those six things, a session's behavior becomes traceable and explainable.
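+
+The loop in item 6 can be sketched as below. This is a simplified, synchronous illustration under stated assumptions: the `Manifest` and `Edge` shapes are hypothetical, `execute` stands in for prompt rendering plus the provider call, persistence is reduced to an in-memory log, and the graph is assumed to be acyclic.
+
+```typescript
+// Hypothetical manifest shapes; real manifests likely carry more fields.
+type Outcome = "success" | "failure";
+
+interface Edge {
+  from: string;
+  to: string;
+  on: Outcome; // edge fires only when the source node ends with this outcome
+}
+
+interface Manifest {
+  entry: string;
+  edges: Edge[];
+}
+
+// Run nodes from the entry point, following edges whose condition matches.
+// Assumes an acyclic graph; a cycle would make this loop run forever.
+function runLoop(manifest: Manifest, execute: (nodeId: string) => Outcome): string[] {
+  const queue: string[] = [manifest.entry];
+  const log: string[] = [];
+  while (queue.length > 0) {
+    const nodeId = queue.shift() as string; // dequeue ready node
+    const outcome = execute(nodeId);        // execute
+    log.push(`${nodeId}:${outcome}`);       // persist result/events (in-memory here)
+    for (const edge of manifest.edges) {    // enqueue next nodes
+      if (edge.from === nodeId && edge.on === outcome) {
+        queue.push(edge.to);
+      }
+    }
+  }
+  return log;
+}
+```
+
+With the two-node manifest from this walkthrough and an `execute` that always succeeds, the log would contain `write-node:success` followed by `copy-node:success`. If `write-node` fails, `copy-node` is never enqueued.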
+
+## 6) Debug checklist for any future session id
+
+Given `<sid>`, inspect in this order:
+
+1. `state/<sid>/ui-run-meta.json`
+2. `.ai_ops/events/runtime-events.ndjson` filtered by `<sid>`
+3. `state/<sid>/handoffs/*.json`
+4. `state/<sid>/state.json`
+5. `.ai_ops/project-context.json` pointer entries for `<sid>`
+
+Interpretation:
+
+1. No `session.started`: run failed before pipeline began.
+2. `node.attempt.completed` with `failureCode=provider_*`: provider/runtime issue.
+3. Missing downstream handoff file: edge condition did not pass.
+4. `history` has `validation_failed`: a retry (unrolled) path or remediation branch was likely triggered.
+5. `ui-run-meta` disagrees with runtime events: check the run-service status mapping, and restart the server if it may still be running old code.
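+
+The interpretation rules can also be encoded as a small checker. This is a sketch only: the `Evidence` shape and the `diagnose` function are hypothetical, and you would populate the fields yourself from the five artifacts in the checklist.
+
+```typescript
+// Hypothetical evidence bundle gathered from the five artifacts; all field names are assumptions.
+interface Evidence {
+  eventTypes: string[];       // event types logged for the session
+  failureCodes: string[];     // failureCode values on node.attempt.completed events
+  expectedHandoffs: string[]; // node ids expected to have a handoff file
+  presentHandoffs: string[];  // node ids that actually have a handoff file
+  historyEvents: string[];    // domain events from state.json history
+  uiStatus: string;           // status in ui-run-meta.json
+  completedStatus?: string;   // status on session.completed, if that event exists
+}
+
+// Apply the five interpretation rules in order and collect every finding that matches.
+function diagnose(e: Evidence): string[] {
+  const findings: string[] = [];
+  if (e.eventTypes.indexOf("session.started") === -1) {
+    findings.push("run failed before pipeline began");
+  }
+  if (e.failureCodes.some((code) => code.indexOf("provider_") === 0)) {
+    findings.push("provider/runtime issue");
+  }
+  for (const id of e.expectedHandoffs) {
+    if (e.presentHandoffs.indexOf(id) === -1) {
+      findings.push(`edge condition into ${id} did not pass`);
+    }
+  }
+  if (e.historyEvents.indexOf("validation_failed") !== -1) {
+    findings.push("retry or remediation branch likely triggered");
+  }
+  if (e.completedStatus !== undefined && e.completedStatus !== e.uiStatus) {
+    findings.push("ui-run-meta disagrees with runtime events");
+  }
+  return findings;
+}
+```
+
+For the successful run in this walkthrough, every check passes and the function returns an empty list.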