ai_ops/docs/session-walkthrough.md

# Session Walkthrough (Concrete Example)

This document walks through one successful provider run end-to-end using:

- session id: `ui-session-mlzw94bv-cb753677`
- run id: `9287775f-a507-492a-9afa-347ed3f3a6b3`
- execution mode: `provider`
- provider: `claude`
- manifest: `.ai_ops/manifests/test.json`

Use this as a mental model and as a debugging template for future sessions.

## 1) What happened in this run

The manifest defines two sequential nodes:

1. `write-node` (persona: writer)
2. `copy-node` (persona: copy-editor)

Edge routing is `write-node -> copy-node` on `success`.

In this run:

1. `write-node` succeeded on attempt 1 and emitted `validation_passed` and `tasks_planned`.
2. `copy-node` succeeded on attempt 1 and emitted `validation_passed`.
3. Session aggregate status was `success`.

## 2) Timeline from runtime events

From `.ai_ops/events/runtime-events.ndjson`:

1. `2026-02-24T00:55:28.632Z` `session.started`
2. `2026-02-24T00:55:48.705Z` `node.attempt.completed` for `write-node` with `status=success`
3. `2026-02-24T00:55:48.706Z` `domain.validation_passed` for `write-node`
4. `2026-02-24T00:55:48.706Z` `domain.tasks_planned` for `write-node`
5. `2026-02-24T00:56:14.237Z` `node.attempt.completed` for `copy-node` with `status=success`
6. `2026-02-24T00:56:14.238Z` `domain.validation_passed` for `copy-node`
7. `2026-02-24T00:56:14.242Z` `session.completed` with `status=success`
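To reproduce a timeline like this for another session, one option is to filter the NDJSON file by session id and sort by timestamp. The sketch below is only an illustration: the field names (`timestamp`, `type`, `sessionId`, `nodeId`, `status`) are assumptions about the event shape, not the confirmed schema, and it works on the file contents as a string so it stays independent of any file API.

```typescript
// Hypothetical event shape: the real field names in runtime-events.ndjson may differ.
interface RuntimeEventSketch {
  timestamp: string; // ISO-8601, as in the timeline above
  type: string; // e.g. "session.started", "node.attempt.completed"
  sessionId: string;
  nodeId?: string;
  status?: string;
}

// Parse NDJSON text (one JSON object per line) and keep one session's events, oldest first.
function eventsForSession(ndjson: string, sessionId: string): RuntimeEventSketch[] {
  return ndjson
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // skip blank lines
    .map((line) => JSON.parse(line) as RuntimeEventSketch)
    .filter((event) => event.sessionId === sessionId)
    .sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
}

// Milliseconds elapsed between the first and last event of a session (0 if there are none).
function sessionDurationMs(events: RuntimeEventSketch[]): number {
  const first = events[0];
  const last = events[events.length - 1];
  if (!first || !last) return 0;
  return Date.parse(last.timestamp) - Date.parse(first.timestamp);
}
```

Applied to the first and last timestamps above (`00:55:28.632Z` and `00:56:14.242Z`), the duration works out to 45610 ms.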

## 3) How artifacts map to runtime behavior

### Run metadata (UI-level)

`state/<session>/ui-run-meta.json` stores run summary fields:

- run/provider/mode
- status (`running`, `success`, `failure`, `cancelled`)
- start/end timestamps

For this run:

```json
{
  "sessionId": "ui-session-mlzw94bv-cb753677",
  "status": "success",
  "executionMode": "provider",
  "provider": "claude"
}
```

### Handoffs (node input payloads)

`state/<session>/handoffs/*.json` stores payload handoffs per node.

`write-node.json`:

```json
{
  "nodeId": "write-node",
  "payload": { "prompt": "be yourself" }
}
```

`copy-node.json` includes `fromNodeId: "write-node"` and carries the story generated by the writer node.

Important: this file is the payload transferred across the pipeline edge. If a downstream node's output looks strange, inspect this file first.

### Session state (flags + metadata + history)

`state/<session>/state.json` is cumulative session state:

- `flags`: merged boolean flags from node results
- `metadata`: merged metadata from node results/behavior patches
- `history`: domain-event history entries

For this run, state includes:

- flags: `story_written=true`, `copy_edited=true`
- history events:
  - `write-node: validation_passed`
  - `write-node: tasks_planned`
  - `copy-node: validation_passed`
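As a rough illustration of how cumulative state could end up with these values, the sketch below merges each node result into the previous state. The types and the `applyNodeResult` helper are hypothetical and only mirror the three fields described above; the actual merge logic in the codebase may differ (for example in how conflicting metadata keys are resolved).

```typescript
// Hypothetical shapes mirroring the three fields of state.json described above.
interface SessionStateSketch {
  flags: Record<string, boolean>;
  metadata: Record<string, unknown>;
  history: { nodeId: string; event: string }[];
}

interface NodeResultSketch {
  nodeId: string;
  flags?: Record<string, boolean>;
  metadata?: Record<string, unknown>;
  events?: string[]; // domain events such as "validation_passed"
}

// Return a new state with the node result merged in; later values win on key conflicts.
function applyNodeResult(state: SessionStateSketch, result: NodeResultSketch): SessionStateSketch {
  const newEntries = (result.events ?? []).map((event) => ({ nodeId: result.nodeId, event }));
  return {
    flags: { ...state.flags, ...(result.flags ?? {}) },
    metadata: { ...state.metadata, ...(result.metadata ?? {}) },
    history: [...state.history, ...newEntries], // history is append-only
  };
}

const emptyState: SessionStateSketch = { flags: {}, metadata: {}, history: [] };

// Replaying this run's two node results in order.
const afterWrite = applyNodeResult(emptyState, {
  nodeId: "write-node",
  flags: { story_written: true },
  events: ["validation_passed", "tasks_planned"],
});
const afterCopy = applyNodeResult(afterWrite, {
  nodeId: "copy-node",
  flags: { copy_edited: true },
  events: ["validation_passed"],
});
```

After both calls, `afterCopy` holds both flags and the three history entries listed above.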

### Project context pointer

`.ai_ops/project-context.json` tracks cross-session pointers like:

- `sessions/<session>/last_completed_node`
- `sessions/<session>/last_attempt`
- `sessions/<session>/final_state`

This lets operators and tooling locate the final state file for any completed session.

## 4) Code path (from button click to persisted state)

1. UI starts run via `UiRunService.startRun(...)`.
2. Service loads config, parses manifest, creates engine, writes initial run meta.
3. Engine `runSession(...)` initializes state and writes entry handoff.
4. Pipeline executes ready nodes:
   - builds fresh node context (`handoff + state`)
   - renders persona system prompt
   - invokes provider executor
   - receives actor result
5. Lifecycle observer persists:
   - state flags/metadata/history
   - runtime events (`node.attempt.completed`, `domain.*`)
   - project context pointers (`last_completed_node`, `last_attempt`)
6. Pipeline evaluates edges and writes downstream handoffs.
7. Pipeline computes aggregate status and emits `session.completed`.
8. UI run service writes final `ui-run-meta.json` status from pipeline summary.

Primary entrypoints:

- `src/ui/run-service.ts`
- `src/agents/orchestration.ts`
- `src/agents/pipeline.ts`
- `src/agents/lifecycle-observer.ts`
- `src/agents/state-context.ts`
- `src/ui/provider-executor.ts`

## 5) Mental model that keeps this manageable

Think of one session as five stores and one loop:

1. Manifest (static plan): node graph + routing rules.
2. Handoffs (per-node input payload snapshots).
3. State (session memory): flags + metadata + domain history.
4. Runtime events (timeline/audit side channel).
5. Project context (cross-session pointers and shared context).
6. Loop: dequeue ready node -> execute -> persist result/events -> enqueue next nodes.

If you track those six things, a session's behavior becomes traceable and explainable.
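The loop in item 6 can be sketched in a few lines. This is a deliberately simplified model, not the code in `src/agents/pipeline.ts`: the `ManifestSketch`, `Executor`, and `runSessionSketch` names are hypothetical, execution is synchronous, and retries, state merging, and cycle protection are left out.

```typescript
type NodeStatus = "success" | "failure";

// Hypothetical, simplified manifest: an entry node plus status-conditioned edges.
interface ManifestSketch {
  entry: string;
  edges: { from: string; to: string; on: NodeStatus }[];
}

// Stand-in for the provider executor: takes a node id and its handoff payload.
type Executor = (nodeId: string, payload: unknown) => { status: NodeStatus; output: unknown };

interface RunLogSketch {
  events: string[];
  handoffs: Record<string, unknown>;
  status: NodeStatus;
}

function runSessionSketch(manifest: ManifestSketch, entryPayload: unknown, execute: Executor): RunLogSketch {
  const handoffs: Record<string, unknown> = { [manifest.entry]: entryPayload };
  const events: string[] = ["session.started"];
  const queue: string[] = [manifest.entry];
  let status: NodeStatus = "success";

  while (queue.length > 0) {
    const nodeId = queue.shift() as string; // dequeue ready node
    const result = execute(nodeId, handoffs[nodeId]); // execute
    events.push(`node.attempt.completed:${nodeId}:${result.status}`); // persist result/events
    if (result.status === "failure") status = "failure";

    // Enqueue next nodes: an edge fires only when its condition matches the node status.
    for (const edge of manifest.edges) {
      if (edge.from === nodeId && edge.on === result.status) {
        handoffs[edge.to] = result.output; // downstream handoff
        queue.push(edge.to);
      }
    }
  }

  events.push(`session.completed:${status}`);
  return { events, handoffs, status };
}
```

In this model a missing downstream handoff simply means no edge matched the upstream node's status.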

## 6) Debug checklist for any future session id

Given `<sid>`, inspect in this order:

1. `state/<sid>/ui-run-meta.json`
2. `.ai_ops/events/runtime-events.ndjson` filtered by `<sid>`
3. `state/<sid>/handoffs/*.json`
4. `state/<sid>/state.json`
5. `.ai_ops/project-context.json` pointer entries for `<sid>`
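For scripting, the same inspection order can be generated from a session id. The `debugChecklist` helper below is hypothetical; it only substitutes `<sid>` into the paths listed above and assumes they are relative to wherever the project keeps them.

```typescript
// One checklist entry: where to look and what to do there.
interface ChecklistStep {
  path: string;
  note: string;
}

// Build the inspection order above for a given session id (paths copied from this document).
function debugChecklist(sid: string): ChecklistStep[] {
  return [
    { path: `state/${sid}/ui-run-meta.json`, note: "run summary and final status" },
    { path: ".ai_ops/events/runtime-events.ndjson", note: `filter by ${sid}` },
    { path: `state/${sid}/handoffs/*.json`, note: "per-node input payloads" },
    { path: `state/${sid}/state.json`, note: "flags, metadata, history" },
    { path: ".ai_ops/project-context.json", note: `pointer entries for ${sid}` },
  ];
}
```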

Interpretation:

1. No `session.started`: run failed before pipeline began.
2. `node.attempt.completed` with `failureCode=provider_*`: provider/runtime issue.
3. Missing downstream handoff file: edge condition did not pass.
4. `history` has `validation_failed`: a retry (unrolled) path or remediation branch was likely triggered.
5. `ui-run-meta` disagrees with runtime events: check the run-service status mapping, and restart the server if it may still be running old code.
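
The interpretation rules above can also be written as a small triage function. This is a sketch under assumptions: the `SessionObservations` shape is hypothetical and expects you to have already collected the relevant facts from the files in the checklist.

```typescript
// Hypothetical summary of what was found while walking the checklist.
interface SessionObservations {
  sawSessionStarted: boolean;
  failureCodes: string[]; // failureCode values from node.attempt.completed events
  expectedHandoffs: string[]; // node ids that should have received a handoff
  presentHandoffs: string[]; // node ids that have a handoff file
  history: string[]; // e.g. "write-node: validation_passed"
  uiStatus: string; // status in ui-run-meta.json
  eventStatus: string | null; // status on session.completed, if that event exists
}

// Map observations to the interpretations listed above, in the same order.
function triage(obs: SessionObservations): string[] {
  const findings: string[] = [];
  if (!obs.sawSessionStarted) {
    findings.push("run failed before pipeline began");
  }
  if (obs.failureCodes.some((code) => code.indexOf("provider_") === 0)) {
    findings.push("provider/runtime issue");
  }
  const missing = obs.expectedHandoffs.filter((id) => obs.presentHandoffs.indexOf(id) === -1);
  if (missing.length > 0) {
    findings.push(`edge condition did not pass for: ${missing.join(", ")}`);
  }
  if (obs.history.some((entry) => entry.indexOf("validation_failed") !== -1)) {
    findings.push("retry or remediation branch likely triggered");
  }
  if (obs.eventStatus !== null && obs.eventStatus !== obs.uiStatus) {
    findings.push("ui-run-meta disagrees with runtime events");
  }
  return findings;
}
```

For the run documented here every check passes, so `triage` would return an empty list.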