
XIOPro Production Blueprint v5.0

Part 11 — System Review


1. Purpose

This part is the verification gate between design (Parts 1-10) and execution (Parts 12-14).

Before any project moves to implementation, it must pass through System Review to verify:

  • Data schema is complete and consistent
  • All system modules are identified and positioned
  • Risks are identified and mitigated
  • Dependencies are mapped and ordered
  • Data flows are documented
  • Operational checklists exist

This process is reusable -- every XIOPro project passes through it.

1.1 Scope of This Document

Sections 1-5 cover the first half of the System Review:

Section Content
1 Purpose and scope
2 Data Schema verification
3 System Module Index
4 Subject Index
5 Risk Register

The second half (Sections 6-10: Dependency Order, Data Flow Diagrams, Operational Checklists, Gap Analysis, Review Sign-Off) begins at Section 6 below.


2. Data Schema

The canonical schema is at: resources/SCHEMA_walking_skeleton_v4_2.sql

2.1 Entity-Relationship Overview

erDiagram
    %% ── Core Work Graph ──
    projects ||--o{ sprints : contains
    projects ||--o{ tickets : contains
    projects ||--o{ project_agent_bindings : has
    sprints ||--o{ tickets : scopes
    tickets ||--o{ tasks : decomposes_into
    tickets ||--o| tickets : parent_ticket
    tasks ||--o| tasks : parent_task
    tasks ||--o{ activities : generates

    %% ── Discussion & Ideas ──
    projects ||--o{ discussion_threads : hosts
    discussion_threads ||--o| tickets : linked_ticket
    discussion_threads ||--o| tasks : linked_task
    discussion_threads ||--o| sessions : linked_session
    ideas ||--o| topics : classified_by
    ideas ||--o| users : raised_by
    ideas ||--o| tickets : converts_to
    idea_discussion_links }o--|| ideas : links
    idea_discussion_links }o--|| discussion_threads : links

    %% ── Agent System ──
    agent_templates ||--o{ agent_runtimes : instantiates
    agent_runtimes ||--o{ sessions : runs_in
    agent_runtimes ||--o{ activities : executes
    agent_runtimes ||--o| hosts : deployed_on
    agent_runtimes ||--o| agent_runtimes : parent_runtime
    agent_runtimes ||--o| tickets : assigned_ticket
    agent_runtimes ||--o| tasks : assigned_task
    project_agent_bindings }o--|| projects : roster_for

    %% ── Governance ──
    escalation_requests ||--o| agent_runtimes : raised_by
    escalation_requests ||--o| tasks : about
    escalation_requests ||--o| activities : triggered_by
    human_decisions }o--|| escalation_requests : resolves
    human_decisions ||--o| agent_runtimes : applies_to
    human_decisions ||--o| tasks : applies_to
    override_records ||--o| agent_runtimes : targets

    %% ── Knowledge ──
    topics ||--o| topics : parent_topic
    research_tasks ||--o| projects : scoped_to
    research_tasks ||--o| tickets : linked_to
    research_tasks ||--o| tasks : parent_task
    research_tasks ||--o| agent_runtimes : owned_by

    %% ── Cost & Time ──
    cost_ledger }o--|| activities : charges
    cost_ledger ||--o| tasks : attributed_to
    cost_ledger ||--o| tickets : attributed_to
    cost_ledger ||--o| projects : attributed_to
    time_ledger }o--|| activities : records
    time_ledger ||--o| tasks : attributed_to

2.2 Table Summary

# Table Group Key Relationships
0 users Core Standalone identity entity. Referenced by ideas.raised_by_user_id.
1 topics Knowledge Self-referencing tree (parent_topic_id). Referenced by ideas, tickets, tasks, agent_templates, research_tasks via UUID arrays.
2 projects Core Work Graph Parent of sprints, tickets, discussion_threads, project_agent_bindings, research_tasks, cost_ledger.
2A project_agent_bindings Core Work Graph Junction: projects <-> agent identity (agent_id VARCHAR(3)).
3 sprints Core Work Graph FK to projects. Referenced by tickets.sprint_id.
4 discussion_threads Core Work Graph FK to projects. Deferred FKs to tickets, tasks, sessions.
4A ideas Core Work Graph FK to topics, users. Deferred FK to tickets.
4B idea_discussion_links Core Work Graph Junction: ideas <-> discussion_threads.
5 tickets Core Work Graph FK to projects, sprints. Self-referencing (parent_ticket_id). Parent of tasks.
6 tasks Core Work Graph FK to tickets. Self-referencing (parent_task_id). Deferred FK to agent_runtimes. Parent of activities.
7 hosts Agent System Standalone capacity entity. Referenced by agent_runtimes.host_id.
8 agent_templates Agent System Canonical agent class. Parent of agent_runtimes.
9 agent_runtimes Agent System FK to agent_templates, hosts, tickets, tasks. Self-referencing (parent, root, orchestrator). Deferred FK to sessions.
10 sessions Agent System FK to agent_runtimes. Referenced by activities, discussion_threads, escalation_requests.
11 activities Agent System FK to tasks, agent_runtimes, sessions. Parent of cost_ledger, time_ledger.
12 escalation_requests Governance FK to agent_runtimes, sessions, tickets, tasks, activities. Parent of human_decisions.
13 human_decisions Governance FK to escalation_requests, agent_runtimes, tasks.
14 override_records Governance Polymorphic scope (scope_type + scope_ref). Append-only audit trail.
15 cost_ledger Cost/Time FK to activities, agent_runtimes, tasks, tickets, projects.
16 time_ledger Cost/Time FK to activities, agent_runtimes, tasks, tickets.
17 research_tasks Knowledge FK to projects, tickets, tasks, agent_runtimes.

2.3 Schema Statistics

Metric Count
Tables 21
Enum types 28
Deferred foreign keys 6
Auto-update triggers 12
Partial indexes 2
Generated columns (status_state) 10

2.4 Schema Conventions

All tables follow the ODM Metadata Contract (ODM Section 12.2):

  • tags TEXT[] -- free-form classification
  • labels TEXT[] -- structured labels
  • source_system TEXT -- originating system
  • source_ref TEXT -- external reference
  • correlation_id TEXT -- cross-system tracing
  • idempotency_key TEXT -- replay protection
  • notes TEXT -- human-readable notes
  • created_by TEXT, updated_by TEXT -- audit trail
  • created_at TIMESTAMPTZ, updated_at TIMESTAMPTZ -- timestamps

All lifecycle-bearing entities use the three-dimensional state model:

  • status (devxio_status) -- workflow phase
  • state (devxio_state) -- runtime condition
  • status_state -- generated composite for indexing
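As a sketch, a table that follows both contracts would look like the fragment below. This is illustrative only: the table name `example_entity`, the TEXT typing of status/state, and the ':' separator in the generated column are assumptions -- the canonical DDL in resources/SCHEMA_walking_skeleton_v4_2.sql defines the devxio_status and devxio_state enum types, triggers, and indexes.

```sql
-- Illustrative sketch of the ODM conventions; not the canonical DDL.
CREATE TABLE example_entity (
    id              UUID PRIMARY KEY,

    -- Three-dimensional state model (enums in the real schema)
    status          TEXT NOT NULL,
    state           TEXT NOT NULL,
    status_state    TEXT GENERATED ALWAYS AS (status || ':' || state) STORED,

    -- ODM Metadata Contract (ODM Section 12.2)
    tags            TEXT[],
    labels          TEXT[],
    source_system   TEXT,
    source_ref      TEXT,
    correlation_id  TEXT,
    idempotency_key TEXT,
    notes           TEXT,
    created_by      TEXT,
    updated_by      TEXT,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);
```

The generated status_state column exists so a single index can serve combined workflow-phase/runtime-condition queries.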

3. System Module Index

Every distinct capability, engine, or module in XIOPro, with its primary blueprint location, T1P posture, and dependencies.

# Module Primary Part Description T1P Posture Depends On
1 Control Bus Part 2, 5.8 Central messaging relay for all XIOPro surfaces. SSE push, intervention model, message routing between brains and services. Full PostgreSQL, API Service
2 Orchestrator Part 4, 4.1 Master execution coordinator. Decomposes tickets into tasks, assigns agents, manages execution flow and dependencies. Full Control Bus, Work Graph, Agent Templates
3 Governor Part 7, 5 Runtime governance engine. Enforces budgets, breakers, escalation policies, health monitoring, and cost controls. Full Control Bus, Cost Ledger, Orchestrator
4 Rule Steward Part 4, 4.2A Manages the lifecycle of all operational rules: creation, validation, versioning, conflict detection, and retirement. Scaffold Governor, Knowledge Store
5 Prompt Steward Part 4, 4.2B Manages context assembly, prompting modes, question budgets, and prompt package contracts for agent interactions. Scaffold Rule Steward, Skill Registry
6 Module Steward Part 4, 4.2C Evaluates, adopts, and governs external modules and tools. Manages the module portfolio lifecycle. Scaffold Governor, Research Center
7 Librarian Part 5, 4 Core knowledge management system. Ingestion, indexing, search, decomposition, and document lifecycle. Scaffold PostgreSQL, pgvector, Object Storage
8 Research Center Part 5, 8 Operational research engine. Source scouting, scheduled research tasks, digest generation, NotebookLM/Obsidian integration. Scaffold Librarian, Source Registry, Scheduler
9 Skill Registry Part 5, 8.9 Central registry of all agent skills. Defines skill metadata, versioning, model compatibility, and governance rules. Scaffold Rule Steward, Knowledge Store
10 Skill Performance DB Part 5, 8.9A Tracks token consumption, quality scores, model compatibility, and execution statistics per skill. Scaffold Skill Registry, Cost Ledger
11 Hindsight Part 5, 9 Post-execution learning engine. Analyzes completed tasks, extracts patterns, generates improvement recommendations. Scaffold Activities, Sessions, Knowledge Store
12 Dream Engine Part 5, 10 Autonomous optimization engine. Identifies improvement opportunities, proposes experiments, runs during idle time. Idle Maintenance Only Hindsight, Skill Performance DB, Governor
13 Idle Maintenance Part 4, 4.9.9 T1P subset of Dream Engine. Practical optimization tasks: skill drift detection, cost anomaly review, stale knowledge cleanup. Phase 2 (downgraded -- no dedicated ticket) Scheduler, Skill Registry, Cost Ledger
14 RAG Pipeline Part 5, 7.18 Retrieval-augmented generation pipeline. Embedding, chunking, hybrid retrieval, reranking, and context injection. Phase 2 (downgraded -- pgvector DDL deploys but pipeline activation is Phase 2) pgvector, PostgreSQL, Prompt Steward
15 XIOPro Optimizer Part 1, 8A Umbrella capability grouping Governor, Rule Steward, Prompt Steward, Module Steward, and Dream Engine as the self-improvement loop. Scaffold Governor, All Stewards, Dream Engine
16 Control Center UI Part 6 Widget-based web UI. Attention queue, brain interaction, prompt composer, governance dashboards, research desk. First Wave API Service, SSE Push, Control Bus
17 Prompt Composer Part 6, 12 UI component for structured prompt construction. Mode selection, search/research toggle, style controls, module/model controls. First Wave Prompt Steward, UI Framework
18 Agent Spawning Part 4, 5A Agent lifecycle management. Three patterns: roster agent, on-demand agent, ephemeral sub-agent. Capacity-aware host placement. Full Orchestrator, Host Registry, Agent Templates
19 ODM (Operational Domain Model) Part 3 Canonical data model. 21 tables, three-dimensional state model, metadata contract, entity lifecycle rules. Full PostgreSQL
20 Knowledge Ledger Part 5, 4.7 Change and evolution log for all knowledge objects. Tracks document lifecycle, revival, export, and drift. Scaffold Librarian, PostgreSQL
21 Execution Report Part 4, 20 Post-execution summary generation. Cost, duration, outcome, and success criteria assessment per ticket. Scaffold Activities, Cost Ledger, Time Ledger
22 Host Registry Part 3, 4.1B Fleet machine inventory. Tracks capacity (CPU, RAM, SSD, GPU), active agents, and health status per host. Full PostgreSQL
23 Source Registry Part 5, 8.10.1 Curated list of external research sources. Ranked, scheduled, with trust and freshness metadata. Scaffold Research Center, Librarian
24 Resource Registry Part 5, 8.10.2 Registry of evaluated external resources (tools, libraries, services). Lifecycle tracking from discovery to adoption or rejection. Scaffold Research Center, Module Steward
25 Scheduler Part 8, 8.7 Background job execution. Cron-like scheduling for research tasks, idle maintenance, health checks, and refresh cycles. Phase 2 (downgraded -- existing cron covers basics, dedicated scheduler is Phase 2) PostgreSQL, Control Bus
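The Depends On column implicitly defines a build order. A minimal sketch deriving it for a subset of the modules above, using the standard-library topological sorter (module names are taken verbatim from the table):

```python
import graphlib  # stdlib topological sorter (Python 3.9+)

# Subset of the Depends On column above: module -> its dependencies.
deps = {
    "Control Bus": {"PostgreSQL", "API Service"},
    "Orchestrator": {"Control Bus", "Work Graph", "Agent Templates"},
    "Governor": {"Control Bus", "Cost Ledger", "Orchestrator"},
    "Rule Steward": {"Governor", "Knowledge Store"},
}

# static_order() yields dependencies before their dependents, so
# PostgreSQL sorts before Control Bus, which sorts before Orchestrator,
# then Governor, then Rule Steward.
order = list(graphlib.TopologicalSorter(deps).static_order())
print(order)
```

The same structure scales to all 25 modules; graphlib raises CycleError if the dependency table ever becomes circular.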

4. Subject Index

Alphabetical index of key subjects referenced across the blueprint.

Subject Primary Location Also Referenced In
Agent Allocation Part 3, 4.2.1 Part 4, 5A
Agent Identity (3-digit) Part 1, 8.1 Part 3, 4.7-4.8; Part 4, 19.1
Agent Lifecycle Part 4, 6 Part 7, 6.2
Agent Runtime Part 3, 4.8 Part 4, 5A; Part 8, 7.2
Agent Template Part 3, 4.7 Part 4, 4.1; Part 8, 8.3
Alerts Part 7, 10 Part 8, 12.6
Approval Part 7, 6.5 Part 3, 4.11; Part 4, 11
Atomic Writes Part 1, 4.3 Part 3, 2.5; Part 8, 3.2
Authentication Part 2, 5.14 Part 3, 4.8 (auth_method); Part 8, 11
Backup Part 2, 5.15 Part 8, 10; Part 4, 16
Breakers (Circuit) Part 7, 9 Part 1, 4.5; Part 4, 10
CLI Tools Part 2, 5.12 Part 1, 4.13; Part 4, 13
Completion Self-Check Part 4, 5.2 Part 7, 6.4
Confidence Scoring Part 4, 4.2B Part 5, 9; Part 7, 11
Control Bus Part 2, 5.8 Part 1, 4.12; Part 7, 7; Part 8, 7.1
Cost Awareness Part 1, 4.6 Part 3, 4.6.2; Part 7, 6.1; Part 8, 13
Cost Ledger Part 3, 4.6.2 Part 4, 9; Part 7, 6.1; Part 8, 13.3
Data Access Rule Part 2, 5.8 Part 8, 7.1
Debounce Part 7, 9.2 Part 4, 10
Decomposition (Task) Part 4, 5 Part 3, 4.4-4.5
Decomposition (Document) Part 5, 4.5 Part 5, 4.1
Dependencies (Task) Part 4, 5.1 Part 3, 4.5
Discussion Thread Part 3, 4.3A Part 5, 4; Part 6, 10.3
Dream Engine Part 5, 10 Part 1, 12A.2; Part 4, 4.9.9; Part 5, 11A.4
Escalation Part 3, 4.11 Part 4, 11; Part 7, 8.3; Part 6, 10.2
Execution Mode Part 3, 4.5 Part 4, 3
Execution Report Part 4, 20 Part 7, 6.4
Firewall Part 8, 11.4 Part 8, 11.10.5
Governor Part 7, 5 Part 1, 8.4; Part 4, 4.2; Part 8, 8.4
Hindsight Part 5, 9 Part 1, 12A.2; Part 4, 4.9.9; Part 5, 11A.3
Host Part 3, 4.1B Part 4, 14; Part 8, 5
Human Decision Part 3, 4.12 Part 7, 6.5; Part 4, 11
Idea Part 3, 4.3B Part 5, 4; Part 6, 10.3
Idle Maintenance Part 4, 4.9.9 Part 1, 12A.2; Part 5, 10; Part 5, 11A.4
Intervention Part 7, 10.4 Part 2, 5.8; Part 6, 10.2
Knowledge Compounding Part 1, 4.7 Part 5, 2; Part 5, 14
Knowledge Ledger Part 5, 4.7 Part 7, 12
Librarian Part 5, 4 Part 1, 6.1; Part 8, 8.9
LiteLLM Part 2, 5.3 Part 3, 4.8; Part 8, 8.6
Memory Principles Part 5, 4.5A Part 5, 9.5
Metadata Contract Part 3, 12.2 All entity definitions
Module Steward Part 4, 4.2C Part 1, 8.4; Part 7, 12.9; Part 8, 8.12
NotebookLM Part 5, 8.7 Part 5, 8.2A
Obsidian Part 5, 8.8 Part 5, 18.3
ODM (Operational Domain Model) Part 3 Part 1, 7; Part 2, 4.6
Optimizer (XIOPro) Part 1, 8A Part 4, 4.2-4.2C; Part 5, 10
Orchestrator Part 4, 4.1 Part 1, 8.2; Part 2, 4.3; Part 8, 8.3
Override Record Part 3, 4.12A Part 7, 12.14
Paperclip Part 1, 13 Part 8, 15
pgvector Part 5, 7.18 Part 5, 12; Part 8, 8.8
Policy Objects Part 7, 8 Part 7, 6
PostgreSQL Part 2, 5.5 Part 8, 8.8; Part 3 (all entities)
Priority Level Part 3 (enum) Part 4, 8; Part 7, 10.1
Prompt Composer Part 6, 12 Part 4, 4.2B
Prompt Steward Part 4, 4.2B Part 1, 8.4; Part 7, 12.7
RAG Pipeline Part 5, 7.18 Part 4, 4.2B; Part 5, 12
Recovery Part 7, 8.4 Part 4, 15; Part 8, 3.5; Part 8, 11.10
Replaceability Part 1, 4.8 Part 8, 3.3
Research Center Part 5, 8 Part 1, 12A.2; Part 5, 11A.2
Research Task Part 3, 4.12B Part 5, 8.12-8.15
Review Gates Part 7, 12.16 Part 4, 5.2
Roles (Agent) Part 1, 8.2 Part 3, 4.7; Part 4, 4.1-4.2C
Ruflo Part 2, 5.8 Part 4, 4.2F; Part 8, 8.5
Rule Steward Part 4, 4.2A Part 1, 8.4; Part 7, 12
Scheduled Research Part 5, 8.12 Part 5, 11; Part 8, 8.7
Secrets Management Part 8, 11.5 Part 2, 5.14
Self-Evaluation Part 4, 5.2 Part 5, 9
Session Part 3, 4.10 Part 4, 7; Part 8, 7.2
Skill Performance Part 5, 8.9A Part 4, 4.11; Part 5, 10
Skill Registry Part 5, 8.9 Part 4, 4.10-4.11
Skill Selection Part 4, 4.11 Part 5, 8.9
Source Registry Part 5, 8.10.1 Part 5, 8.11
Sprint Part 3, 4.3 Part 4, 5
SSE Push Part 2, 5.6 Part 6, 6.5; Part 8, 7.1
Sub-Agent Part 4, 5A.2 Part 4, 12
T1P Posture Part 1, 12A All Parts (posture tables)
Tailscale Part 8, 5.1 Part 8, 11.4
Three-Dimensional State Part 3, 2.5 Part 3 (all lifecycle entities)
Ticket Part 3, 4.4 Part 4, 5; Part 7, 6
Ticket Numbering Part 3, 2.7 Part 4, 5
Time Ledger Part 3, 4.6.3 Part 4, 9; Part 8, 13.3
Token Budget Part 4, 4.2B Part 7, 6.1; Part 5, 7.18
Topic Part 3, 4.1 Part 5, 4; Part 5, 8
Topic Enrichment Part 3, 4.1.1 Part 5, 4
User Part 3, 4.0 Part 6, 9; Part 8, 11.3
Walking Skeleton Part 3 Part 10
Widget Part 6, 6 Part 6, 10-11

5. Risk Register

Risks identified across Parts 1-8, compiled with severity assessment and mitigation strategy.

5.1 Severity Scale

Level Meaning
Critical System-wide failure or data loss. Requires immediate response.
High Major capability degraded. Requires response within hours.
Medium Partial degradation. Requires response within 1 business day.
Low Minor inconvenience. Addressed in normal maintenance cycle.

5.2 Risk Table

# Risk Severity Likelihood Impact Mitigation BP Reference
R01 RAM exhaustion on Hetzner CPX62 -- 30 GB shared across PostgreSQL, API, Orchestrator, LiteLLM, and all agent runtimes. A spike in concurrent agents or a memory leak crashes the control plane. Critical Medium Full system outage Memory pressure survival rule (Part 8, 11.10.3). Reserved RAM budgets per service. Governor enforces max concurrent agents via host capacity tracking. Core-first recovery order defined. Part 8, 5.1; Part 8, 11.10.3
R02 Scope creep beyond T1P -- Premature implementation of full Dream Engine, full Steward roles, or advanced UI features before Walking Skeleton is stable. High High Wasted budget, unstable foundation T1P Posture classification (Part 1, 12A). Each capability has explicit posture: Full, Scaffold, Defer. Posture violation requires explicit approval. Part 1, 12A
R03 Single orchestrator bottleneck -- One master orchestrator (O00) manages all execution flow. If it crashes or becomes overloaded, all work halts. High Medium Complete execution stoppage 3-failure circuit breaker halts and interrupts C0 (CLAUDE.md). Session durability allows restart. Recovery policy (Part 7, 8.4) defines restart sequence. Future: multi-orchestrator with leader election. Part 4, 4.1; Part 7, 8.4
R04 API rate limits from LLM providers -- Anthropic, OpenAI, or other providers throttle or reject requests during peak load or quota exhaustion. High Medium Agent execution stalls LiteLLM router with fallback model routing (Part 8, 8.6). Governor monitors cost ledger and enforces budget policies (Part 7, 8.1). Token budget management by Prompt Steward. Part 2, 5.3; Part 8, 8.6
R05 Session crash with context loss -- An agent runtime crashes mid-task and the session context (conversation history, intermediate results) is lost. High Medium Rework, duplicated cost Durable session model with checkpoint_ref and transcript_ref (Part 3, 4.10). Atomic writes to PostgreSQL. Recovery policy restores from last checkpoint. Part 3, 4.10; Part 4, 15
R06 Knowledge drift -- Knowledge base becomes stale as external sources change, internal documents are not refreshed, and embeddings decay in relevance. Medium High Degraded RAG quality, incorrect agent behavior Scheduled research refresh cycles (Part 5, 11). Knowledge Ledger tracks document lifecycle (Part 5, 4.7). Anti-entropy rules (Part 5, 15). Idle Maintenance detects stale knowledge. Part 5, 11; Part 5, 15
R07 Cost overrun exceeding Max20 budget -- Uncontrolled LLM usage, excessive agent spawning, or inefficient prompting pushes monthly costs beyond the $200/month ceiling. Critical Medium Budget breach, forced shutdown Governor cost governance (Part 7, 6.1). Budget policy with hard caps (Part 7, 8.1). Cost ledger attribution to activity level (Part 3, 4.6.2). Cost optimization layer (Part 4, 9). Cost reporting on every deliverable (CLAUDE.md). Part 7, 6.1; Part 8, 13
R08 Skill degradation over time -- Skills that worked well initially degrade as models are updated, contexts change, or upstream dependencies shift. Medium Medium Reduced execution quality Skill Performance DB tracks quality per skill over time (Part 5, 8.9A). Idle Maintenance detects skill drift (Part 4, 4.9.9). Dream Engine proposes improvements. Part 5, 8.9A; Part 4, 4.9.9
R09 Security breach via exposed secrets -- API keys, OAuth tokens, or database credentials leaked through logs, commits, or misconfigured services. Critical Low Full system compromise SOPS for secrets at rest (Part 2, 5.14). No secrets in commits (CLAUDE.md). Tailscale VPN for network isolation (Part 8, 11.4). Security logging and audit (Part 8, 11.8). Part 2, 5.14; Part 8, 11.5
R10 Data loss from PostgreSQL failure -- Database corruption, disk failure, or accidental deletion destroys the canonical state store. Critical Low Total state loss Restic backup to Backblaze B2 daily at 03:00 UTC. WAL archiving. Restore drill requirements (Part 8, 10.8). Backup verification on schedule. Part 8, 10; Part 2, 5.15
R11 Agent behavioral drift -- Agents gradually deviate from intended behavior due to prompt template changes, context pollution, or model updates without testing. Medium Medium Unpredictable execution, governance violations Rule Steward validates rule changes (Part 4, 4.2A). Review gates for non-code outputs (Part 7, 12.16). Prompt Steward manages prompt package contracts (Part 4, 4.2B). Version check for agent runtime currency (Part 1, 4.11). Part 4, 4.2A; Part 7, 12.16
R12 Dependency deadlock -- Circular or unresolvable task dependencies prevent execution progress. Medium Low Execution stall on affected ticket Task dependency resolution (Part 4, 5.1). DAG validation at decomposition time. Orchestrator detects cycles before scheduling. Governor breaker triggers on stall detection. Part 4, 5.1
R13 Hetzner outage or network partition -- Cloud provider outage or Tailscale VPN disruption disconnects the control plane from local operator node or external services. High Low Partial or full system unavailability Emergency access layers (Part 8, 11.10.2). Out-of-band recovery via direct SSH. Mac Studio (Node B) can operate independently for local tasks. Health model detects degradation (Part 8, 12.5). Part 8, 11.10; Part 8, 5
R14 Max20 throttling under growth -- As XIOPro manages more projects, the fixed infrastructure budget prevents scaling compute to match workload. Medium Medium Slower execution, queuing delays Scale-up triggers defined (Part 8, 13.5). Hetzner upgrade policy (Part 8, 13.6). Self-hosted model decision rule (Part 8, 13.7). Cost optimization prioritizes high-value work first. Part 8, 13.5-13.7
R15 Context window limits -- Large tasks, deep conversation histories, or excessive RAG injection exceed the model's context window, causing truncation or degraded output. Medium High Reduced output quality, missed context Prompt Steward manages total context budget (Part 4, 4.2B). RAG pipeline respects context window ceiling (Part 5, 7.18). Document decomposition protocol (Part 5, 4.5). Session checkpointing allows context rotation. Part 4, 4.2B; Part 5, 7.18
R16 Orphaned agent runtimes -- Agent processes that lose their parent orchestrator connection continue running, consuming resources without producing useful work. Medium Medium RAM waste, potential interference Heartbeat monitoring (agent_runtimes.last_heartbeat_at). Governor health governance (Part 7, 6.2). Stale heartbeat triggers cleanup. Max20 budget pressure naturally limits orphan lifetime. Part 7, 6.2; Part 3, 4.8
R17 Escalation queue overflow -- Too many escalation requests accumulate without human response, blocking agent execution across multiple tasks. Medium Medium Execution throughput collapse Attention queue in UI (Part 6, 10.1). Escalation urgency levels with routing rules (Part 7, 8.3). Timeout policies auto-resolve low-priority escalations. Governor monitors queue depth. Part 7, 8.3; Part 6, 10.1
R18 Schema migration failure -- Alembic migration fails mid-apply, leaving the database in an inconsistent state between schema versions. High Low Service startup failure, data corruption Alembic revision chain (schema header). Pre-migration backup. Atomic transaction per migration. Rollback script for each migration. Restore drill validates migration reversibility. Part 8, 10; Part 2, 5.5
R19 Provider lock-in despite independence goal -- Gradual accumulation of Anthropic-specific features or prompt patterns makes switching to other providers costly. Medium Medium Reduced negotiating power, migration cost Provider independence constraint (Part 1, 4.1). LiteLLM abstraction layer (Part 8, 8.6). Skill Performance DB tracks per-model compatibility (Part 5, 8.9A). All prompts stored as portable text. Part 1, 4.1; Part 8, 8.6
R20 Insufficient observability during early operation -- Without proper logging, metrics, and dashboards, problems are detected too late and root cause analysis is difficult. Medium Medium Slow incident response, repeated failures Observability stack requirement (Part 8, 12). Required signals defined (Part 8, 12.2). Health model (Part 8, 12.5). Alerting baseline with critical/warning/info tiers (Part 8, 12.6). Dashboard requirements (Part 8, 12.7). Part 8, 12
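R12's mitigation depends on detecting circular task dependencies before scheduling. A minimal sketch of the DAG validation step (Kahn's algorithm; the task IDs and the `depends_on` mapping shape are hypothetical, not from the schema):

```python
from collections import deque

def find_cycle_members(depends_on: dict[str, set[str]]) -> set[str]:
    """Return tasks involved in (or blocked behind) a dependency cycle.

    depends_on maps task_id -> set of task_ids it waits on.
    An empty result means the graph is a valid DAG and scheduling may proceed.
    """
    # Count unmet dependencies per task.
    pending = {t: len(deps) for t, deps in depends_on.items()}
    # Reverse edges: which tasks are unblocked when t completes.
    dependents: dict[str, list[str]] = {}
    for t, deps in depends_on.items():
        for d in deps:
            dependents.setdefault(d, []).append(t)
            pending.setdefault(d, 0)
    ready = deque(t for t, n in pending.items() if n == 0)
    while ready:
        t = ready.popleft()
        for nxt in dependents.get(t, []):
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)
    # Anything still pending was never reachable: part of a cycle or behind one.
    return {t for t, n in pending.items() if n > 0}

# Hypothetical example: TSK-3 and TSK-4 form a cycle; TSK-1/TSK-2 are fine.
graph = {
    "TSK-1": set(),
    "TSK-2": {"TSK-1"},
    "TSK-3": {"TSK-4"},
    "TSK-4": {"TSK-3"},
}
print(sorted(find_cycle_members(graph)))  # -> ['TSK-3', 'TSK-4']
```

Running this check at decomposition time (before any task is enqueued) is what lets the Orchestrator reject a ticket rather than stall on it.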

5.3 Risk Heat Map

                    Low          Medium       High
                    Likelihood   Likelihood   Likelihood
                   +-----------+-----------+-----------+
    Critical       |  R09 R10  |  R01 R07  |           |
                   +-----------+-----------+-----------+
    High           |  R13 R18  |  R03 R04  |  R02      |
                   |           |  R05      |           |
                   +-----------+-----------+-----------+
    Medium         |  R12      |  R08 R11  |  R06 R15  |
                   |           |  R14 R16  |           |
                   |           |  R17 R19  |           |
                   |           |  R20      |           |
                   +-----------+-----------+-----------+

5.4 Top 5 Risks Requiring Immediate Attention

  1. R07 -- Cost overrun: The Max20 budget is a hard constraint. Governor cost governance and per-activity attribution must be operational from day one.
  2. R01 -- RAM exhaustion: With 30 GB serving the entire stack, memory budgets per service must be defined and enforced before first deployment.
  3. R02 -- Scope creep: T1P posture classification exists but requires discipline. Every implementation decision must reference the posture table.
  4. R03 -- Single orchestrator: No redundancy for the master orchestrator. Session durability and recovery policy are the primary mitigations until multi-orchestrator is feasible.
  5. R15 -- Context window limits: High likelihood in daily operation. Prompt Steward context budget management and RAG chunking strategy must be validated early.

Changelog

Version Date Author Changes
4.2.0 2026-03-29 BM Initial draft. Sections 1-5: Purpose, Data Schema, System Module Index, Subject Index, Risk Register.

6. Dependency Order

6.1 Ticket Dependency Graph

flowchart TD
    subgraph EPIC-CB ["EPIC-CB: Control Bus"]
        T1001["TKT-1001<br/>SSE Push Channels"]
        T1002["TKT-1002<br/>Agent Registration"]
        T1003["TKT-1003<br/>Intervention Endpoints"]
        T1004["TKT-1004<br/>Task Orchestration"]
        T1005["TKT-1005<br/>Host Capacity"]
        T1006["TKT-1006<br/>Agent Spawn"]
        T1007["TKT-1007<br/>Cost Tracking"]
        T1008["TKT-1008<br/>Governance Events"]
    end

    subgraph EPIC-ODM ["EPIC-ODM: Schema + Skeleton"]
        T1010["TKT-1010<br/>Deploy DDL"]
        T1011["TKT-1011<br/>Walking Skeleton"]
        T1012["TKT-1012<br/>Seed Data"]
    end

    subgraph EPIC-GOV ["EPIC-GOV: Governance"]
        T1020["TKT-1020<br/>Escalation Path"]
        T1021["TKT-1021<br/>Approval Workflow"]
        T1022["TKT-1022<br/>Alerts + Breakers"]
        T1023["TKT-1023<br/>Override Records"]
    end

    subgraph EPIC-UI ["EPIC-UI: Control Center"]
        T1030["TKT-1030<br/>UI Shell"]
        T1031["TKT-1031<br/>Agent Status Grid"]
        T1032["TKT-1032<br/>Task Board"]
        T1033["TKT-1033<br/>Alerts Panel"]
        T1034["TKT-1034<br/>Cost Summary"]
        T1035["TKT-1035<br/>Prompt Composer"]
        T1036["TKT-1036<br/>Activity Feed"]
    end

    subgraph EPIC-KNO ["EPIC-KNO: Knowledge System"]
        T1040["TKT-1040<br/>Skill Registry"]
        T1041["TKT-1041<br/>Activation Slimming"]
        T1042["TKT-1042<br/>Librarian Decomposition"]
        T1043["TKT-1043<br/>Source Registry"]
    end

    subgraph EPIC-INFRA ["EPIC-INFRA: Infrastructure"]
        T1050["TKT-1050<br/>Stop Unused Services"]
        T1051["TKT-1051<br/>Install Remaining CLI"]
        T1052["TKT-1052<br/>Paperclip Migration"]
        T1053["TKT-1053<br/>Dashboard Transition"]
    end

    subgraph EPIC-TEST ["EPIC-TEST: Testing"]
        T1060["TKT-1060<br/>pytest Setup"]
        T1061["TKT-1061<br/>Playwright Setup"]
        T1062["TKT-1062<br/>Behavioral Tests"]
        T1063["TKT-1063<br/>Acceptance Tests (4)"]
    end

    subgraph EPIC-MVP1 ["EPIC-MVP1: MVP1 Prep (see MVP1_PRODUCT_SPEC.md)"]
        T1070["TKT-1070<br/>Product Engine Integration"]
        T1071["TKT-1071<br/>Billing Webhooks"]
        T1072["TKT-1072<br/>Landing Page Reqs"]
    end

    %% ODM dependencies
    T1010 --> T1011
    T1010 --> T1012
    T1010 --> T1043
    T1010 --> T1060

    %% Walking skeleton dependencies
    T1004 --> T1011
    T1012 --> T1011

    %% CB internal dependencies
    T1001 --> T1003
    T1001 --> T1008
    T1002 --> T1004
    T1002 --> T1005
    T1004 --> T1005
    T1005 --> T1006
    T1004 --> T1007

    %% Governance dependencies
    T1004 --> T1020
    T1010 --> T1020
    T1020 --> T1021
    T1008 --> T1021
    T1008 --> T1022
    T1020 --> T1023
    T1021 --> T1023

    %% UI dependencies
    T1030 --> T1031
    T1030 --> T1032
    T1030 --> T1033
    T1030 --> T1034
    T1030 --> T1035
    T1030 --> T1036
    T1002 --> T1031
    T1004 --> T1032
    T1008 --> T1033
    T1007 --> T1034
    T1001 --> T1035
    T1004 --> T1036

    %% Knowledge dependencies
    T1040 --> T1041

    %% Infrastructure dependencies
    T1011 --> T1052
    T1030 --> T1053

    %% Test dependencies
    T1030 --> T1061
    T1011 --> T1062
    T1060 --> T1062
    T1011 --> T1063
    T1020 --> T1063
    T1060 --> T1063

    %% MVP1 dependencies
    T1011 --> T1070
    T1051 --> T1071

    %% Styling: critical path in bold
    style T1010 fill:#e74c3c,color:#fff,stroke:#c0392b
    style T1004 fill:#e74c3c,color:#fff,stroke:#c0392b
    style T1011 fill:#e74c3c,color:#fff,stroke:#c0392b
    style T1020 fill:#e74c3c,color:#fff,stroke:#c0392b
    style T1063 fill:#e74c3c,color:#fff,stroke:#c0392b
    style T1060 fill:#e74c3c,color:#fff,stroke:#c0392b

6.2 Critical Path

The critical path is the longest chain of dependent tickets; it determines the minimum build time. Three candidate chains were evaluated:

Path A -- Schema to Acceptance

TKT-1010 (DDL, 0.5d)
  -> TKT-1012 (Seed, 0.5d)
    -> TKT-1011 (Skeleton, 3d)
      -> TKT-1063 (Acceptance Tests, 2d)
        = 6.0 days minimum

But TKT-1011 also depends on TKT-1004, which depends on TKT-1002. Factoring in the CB chain:

Path B -- Bus to Acceptance (true critical path)

TKT-1002 (Agent Registration, ~2d)
  -> TKT-1004 (Task Orchestration, ~2d)
    -> TKT-1011 (Walking Skeleton, 3d)
      -> TKT-1020 (Escalation, 2d)
        -> TKT-1063 (Acceptance Tests, 2d)
          = 11.0 days minimum

Path C -- Bus to Governance

TKT-1002 (Registration)
  -> TKT-1004 (Tasks)
    -> TKT-1020 (Escalation)
      -> TKT-1021 (Approval)
        -> TKT-1023 (Overrides)
          = ~8.0 days

The true critical path runs through Path B: from agent registration through task orchestration, the walking skeleton, escalation, and finally the acceptance tests. This chain spans all 5 phases and cannot be shortened without reducing scope.
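The day counts above can be checked mechanically. A minimal sketch, using the per-ticket duration estimates from Paths A and B (tickets run strictly one after another, as in the chains above):

```python
# Durations (days) as estimated in Paths A and B above.
days = {
    "TKT-1010": 0.5, "TKT-1012": 0.5, "TKT-1011": 3.0, "TKT-1063": 2.0,
    "TKT-1002": 2.0, "TKT-1004": 2.0, "TKT-1020": 2.0,
}

def chain_days(*tickets: str) -> float:
    """Elapsed time if the tickets execute strictly sequentially."""
    return sum(days[t] for t in tickets)

path_a = chain_days("TKT-1010", "TKT-1012", "TKT-1011", "TKT-1063")
path_b = chain_days("TKT-1002", "TKT-1004", "TKT-1011",
                    "TKT-1020", "TKT-1063")
print(path_a, path_b)  # -> 6.0 11.0
```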

6.3 Parallel Execution Opportunities

The following groups of tickets have no mutual dependencies and can execute simultaneously:

Phase 1 parallel lanes (Days 2-5):

Lane Tickets Assignee
Lane 1: Bus Core TKT-1001, TKT-1002, TKT-1003, TKT-1004 Engineering Brain
Lane 2: Schema TKT-1010, TKT-1012 Engineering Brain
Lane 3: Infrastructure TKT-1050 DevOps / BrainMaster
Lane 4: Knowledge TKT-1040 BrainMaster

Note: Lanes 1 and 2 share the Engineering Brain assignee, so true parallelism requires either two engineering agents or interleaving.

Phase 2 parallel lanes (Days 4-7):

Lane Tickets Assignee
Lane 1: Governance TKT-1020, TKT-1021, TKT-1022, TKT-1023 Engineering Brain
Lane 2: Bus Extended TKT-1005, TKT-1006, TKT-1007, TKT-1008 Engineering Brain
Lane 3: Knowledge TKT-1041, TKT-1043 BrainMaster
Lane 4: Tools TKT-1051 DevOps

Phase 3 parallel lanes (Days 6-10):

Lane Tickets Assignee
Lane 1: UI Shell + Widgets TKT-1030 then TKT-1031-1036 (all 6 widgets parallel after shell) Brand Brain
Lane 2: E2E Setup TKT-1061 (after TKT-1030) Engineering Brain

Phase 4-5 parallel lanes (Days 8-14):

| Lane | Tickets | Assignee |
|---|---|---|
| Lane 1: Migration | TKT-1052, TKT-1053 | Engineering / BrainMaster |
| Lane 2: Knowledge | TKT-1042 | Mac Worker |
| Lane 3: MVP1 (see MVP1_PRODUCT_SPEC.md) | TKT-1070, TKT-1071, TKT-1072 | Engineering / Brand |
| Lane 4: Testing | TKT-1062, TKT-1063 | Engineering Brain |

Maximum parallelism: With 3 agents working simultaneously (Engineering, Brand, BrainMaster), theoretical build time compresses from ~40 ticket-days to approximately 14 calendar days.


7. Data Flow Diagrams

7.1 Task Lifecycle

flowchart LR
    A["Idea<br/>(conversation)"] --> B["Discussion Thread<br/>(type: intake)"]
    B --> C["Ticket<br/>(state: open)"]
    C --> D["Task<br/>(state: queued)"]
    D --> E["Agent Assignment<br/>(task.assigned_to)"]
    E --> F["Session<br/>(agent execution context)"]
    F --> G["Activity<br/>(work unit)"]
    G --> H["Result<br/>(activity_evaluations)"]
    H --> I{"Success?"}
    I -->|yes| J["Knowledge Object<br/>(if applicable)"]
    I -->|no| K["Retry / Escalate"]
    K -->|retry| D
    K -->|escalate| L["Escalation<br/>(human decision)"]
    L --> D
    J --> M["Reflection<br/>(hindsight evaluation)"]
    M --> N["Knowledge Update<br/>(vault + pgvector)"]

    style A fill:#3498db,color:#fff
    style C fill:#2ecc71,color:#fff
    style G fill:#f39c12,color:#fff
    style J fill:#9b59b6,color:#fff
    style N fill:#1abc9c,color:#fff

7.2 Agent Communication

flowchart TB
    subgraph Agents ["Agent Layer"]
        A0["000<br/>Orchestrator"]
        A1["001<br/>Governor"]
        A2["002<br/>Engineering"]
        A3["003<br/>Brand"]
        A10["010<br/>Mac Worker"]
    end

    subgraph Bus ["Control Bus (REST + SSE)"]
        direction TB
        REST["REST API<br/>POST /tasks<br/>POST /messages<br/>POST /escalations<br/>GET /agents"]
        SSE["SSE Push<br/>/events/agent/{id}<br/>/events/founder<br/>/events/ui"]
        HB["Heartbeat<br/>POST /heartbeat"]
    end

    subgraph Storage ["Persistence"]
        PG["PostgreSQL 17<br/>(ODM Schema)"]
        PGV["pgvector<br/>(embeddings)"]
        GIT["Git Repos<br/>(code + docs)"]
    end

    subgraph UI ["Control Center"]
        CC["Widget Grid<br/>(Next.js + shadcn)"]
    end

    subgraph Founder ["Human"]
        SH["Shai<br/>(founder)"]
    end

    %% Agent -> Bus
    A0 & A1 & A2 & A3 & A10 -->|"REST calls"| REST
    A0 & A1 & A2 & A3 & A10 -->|"heartbeat (30s)"| HB

    %% Bus -> Agent
    SSE -->|"task assignments"| A0 & A2 & A3 & A10
    SSE -->|"interventions"| A0 & A1
    SSE -->|"cost alerts"| A1

    %% Bus -> Storage
    REST -->|"read/write"| PG
    REST -->|"embeddings"| PGV

    %% Agents -> Storage
    A2 & A3 & A10 -->|"commits"| GIT

    %% Bus -> UI
    SSE -->|"real-time events"| CC

    %% UI -> Founder
    CC -->|"dashboard"| SH
    SH -->|"decisions, messages"| CC
    CC -->|"REST calls"| REST

    style Bus fill:#34495e,color:#fff
    style PG fill:#2980b9,color:#fff
    style CC fill:#8e44ad,color:#fff
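An agent-side heartbeat (the 30-second `POST /heartbeat` loop above) could assemble its payload like this; the field names are assumptions for illustration, not a confirmed Bus contract:

```python
# Sketch of an agent heartbeat payload for POST /heartbeat (30s interval).
# Field names are illustrative assumptions, not the confirmed Bus contract.
import json
from datetime import datetime, timezone

def build_heartbeat(agent_id: str, status: str = "idle") -> str:
    payload = {
        "agent_id": agent_id,   # 3-digit agent ID, e.g. "002"
        "status": status,       # e.g. idle | working | paused (assumed values)
        "ts": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(payload)

body = build_heartbeat("002", status="working")
assert json.loads(body)["agent_id"] == "002"
```

Per checklist 8.2, the Bus must observe this heartbeat within 30 seconds of commissioning before the agent receives its first task.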

7.3 Knowledge Flow

flowchart LR
    subgraph Sources ["External Sources"]
        S1["Anthropic Docs"]
        S2["GitHub"]
        S3["npm / PyPI"]
        S4["MDN / W3C"]
        S5["Hugging Face"]
    end

    subgraph RC ["Research Center"]
        SR["Source Registry<br/>(governed list)"]
        RE["Research Execution<br/>(agent task)"]
    end

    subgraph Librarian ["Librarian Process"]
        DEC["Decompose<br/>(document -> notes)"]
        TAG["Tag + Link<br/>(frontmatter, backlinks)"]
        IDX["Index<br/>(searchable catalog)"]
    end

    subgraph Storage ["Knowledge Storage"]
        GIT2["Git Vault<br/>(Obsidian markdown)"]
        PG2["PostgreSQL<br/>(knowledge_objects)"]
        VEC["pgvector<br/>(embeddings)"]
    end

    subgraph Retrieval ["Retrieval"]
        RAG["RAG Pipeline<br/>(query -> embed -> search)"]
        CTX["Context Assembly<br/>(relevant chunks)"]
    end

    subgraph Execution ["Agent Execution"]
        AGT["Agent Session"]
        ACT["Activity Output"]
    end

    subgraph Learning ["Learning Loop"]
        HS["Hindsight<br/>(what worked?)"]
        RF["Reflection<br/>(why?)"]
        UPD["Knowledge Update"]
    end

    Sources --> SR
    SR --> RE
    RE --> DEC
    DEC --> TAG --> IDX
    IDX --> GIT2
    IDX --> PG2
    PG2 --> VEC

    VEC --> RAG
    GIT2 --> RAG
    RAG --> CTX
    CTX --> AGT
    AGT --> ACT

    ACT --> HS
    HS --> RF
    RF --> UPD
    UPD --> PG2
    UPD --> GIT2

    style RC fill:#e67e22,color:#fff
    style Storage fill:#2980b9,color:#fff
    style Retrieval fill:#27ae60,color:#fff
    style Learning fill:#8e44ad,color:#fff
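The retrieval step (query -> embed -> search) can be illustrated with an in-memory stand-in for the pgvector similarity search; the chunk IDs and vectors are toy data:

```python
# Toy sketch of the "query -> embed -> search" step in the RAG pipeline above,
# using cosine similarity in place of a real pgvector query.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# (chunk_id, embedding) pairs standing in for knowledge_objects rows
index = [
    ("note-escalation",  [0.9, 0.1, 0.0]),
    ("note-cost-ledger", [0.1, 0.9, 0.2]),
    ("note-widgets",     [0.0, 0.2, 0.9]),
]

def retrieve(query_vec, k=2):
    """Return the k most similar chunk IDs (input to Context Assembly)."""
    ranked = sorted(index, key=lambda row: cosine(query_vec, row[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]

print(retrieve([1.0, 0.0, 0.1]))  # ['note-escalation', 'note-cost-ledger']
```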

7.4 Cost Flow

flowchart TD
    ACT2["Activity Completes<br/>(tokens_in, tokens_out, model)"] -->|"calculate USD"| CLE["Cost Ledger Entry<br/>(activity_id, cost_usd,<br/>tokens_in, tokens_out)"]

    CLE -->|"aggregate"| AGG["Aggregation<br/>(task / ticket / sprint / project)"]

    AGG --> GOV{"Governor Check<br/>(Part 7 breakers)"}

    GOV -->|"under threshold"| DASH["Dashboard Widget<br/>(Cost Summary)"]
    GOV -->|"80% budget"| WARN["Warning Alert<br/>(amber indicator)"]
    GOV -->|"90% budget"| CRIT["Critical Alert<br/>(red indicator)"]
    GOV -->|"100% budget"| TRIP["Breaker Trips<br/>(pause agent)"]

    WARN --> NOTIFY["Founder Notification<br/>(SSE + Alerts Panel)"]
    CRIT --> NOTIFY
    TRIP --> NOTIFY
    TRIP --> PAUSE["Agent Paused<br/>(awaits manual reset)"]

    NOTIFY --> DASH

    subgraph Thresholds ["Budget Thresholds (Max20 = $200/mo)"]
        TH1["Per-task: configurable<br/>(default $10)"]
        TH2["Per-sprint: $50"]
        TH3["Per-month: $200"]
    end

    Thresholds -.->|"checked by"| GOV

    style CLE fill:#f39c12,color:#fff
    style TRIP fill:#e74c3c,color:#fff
    style DASH fill:#3498db,color:#fff
    style GOV fill:#2c3e50,color:#fff
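The Governor's branches above amount to a ratio check against the relevant budget. A sketch using the stated 80% / 90% / 100% bands:

```python
# Sketch of the Governor threshold check above: classify an aggregated spend
# against a budget (per-task $10, per-sprint $50, per-month $200 by default).
def breaker_state(spent_usd: float, budget_usd: float) -> str:
    """Return the alert band for a spend/budget pair."""
    ratio = spent_usd / budget_usd
    if ratio >= 1.0:
        return "trip"      # breaker trips, agent paused (manual reset)
    if ratio >= 0.9:
        return "critical"  # red indicator
    if ratio >= 0.8:
        return "warning"   # amber indicator
    return "ok"

print(breaker_state(45.0, 50.0))  # 'critical' -- 90% of the sprint budget
```

The same function applies at every aggregation level (task, ticket, sprint, project); only the budget argument changes.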

8. Process Checklists

8.1 New Project Setup

  • Define project in ODM (name, description, topics, start_date, end_date)
  • Create Paperclip project (or ODM equivalent if post-migration)
  • Assign project orchestrator (agent with orchestrator role)
  • Build agent roster (roles needed, agents available, capacity check)
  • Create initial ticket set from requirements
  • Run System Review (this Part 11 process) on the ticket set
  • Review findings: all risks acknowledged, all dependencies mapped
  • Approve and begin Phase 0

8.2 Agent Commissioning

  • Determine role requirements (what skills, what model tier)
  • Check host capacity (RAM, CPU, active container count)
  • Select or spawn agent (3-digit ID from available range)
  • Assign roles and project binding in agent_runtimes table
  • Register in Control Bus (POST /agents)
  • Load activation file with required skills (skills_on_load)
  • Verify heartbeat received by Bus within 30 seconds
  • Assign first task and confirm execution

8.3 Sprint Start

  • Review previous sprint retrospective (lessons, blockers)
  • Update plan.yaml with new sprint tickets
  • Verify agent roster is adequate for sprint workload
  • Check host capacity for planned parallel work
  • Brief agents via Control Bus with sprint goals
  • Set sprint in ODM (start_date, end_date, ticket assignments)
  • Confirm all sprint dependencies from prior sprints are met

8.4 Sprint Close

  • Verify all sprint tickets are done or explicitly deferred
  • Run completion tests for all done tickets
  • Generate sprint cost report (total USD, per-agent, per-ticket)
  • Generate execution report (Part 3 format, Section 14A)
  • Write retrospective: what worked, what did not, what to change
  • Update knowledge vault with lessons learned
  • Archive sprint record, prepare next sprint in ODM

8.5 Technology Evaluation

  • Identify tool, skill, framework, or library to evaluate
  • Check Source Registry for prior evaluations of this technology
  • Create knowledge vault note using standard evaluation template
  • Research: what it does, relevance to STRUXIO, maturity, cost, risk
  • Compare against existing solutions in the stack
  • Decision: adopt / evaluate further / defer / reject
  • Update Resource Registry with decision and rationale
  • If adopted: create installation ticket and update CLI_TOOLS_ASSESSMENT

8.6 Deployment

  • Pre-deploy: run all tests (pytest + Playwright if applicable)
  • Pre-deploy: verify host capacity (RAM > 2GB free, disk > 10GB free)
  • Pre-deploy: backup current state (pg_dump + restic snapshot)
  • Deploy: apply changes (docker compose up, migration scripts, config)
  • Post-deploy: health check all services (Bus /health, UI loads, PG responds)
  • Post-deploy: verify Control Bus connectivity (SSE streams active)
  • Post-deploy: smoke test core workflows (create task, assign, complete)
  • If failure: execute rollback (restore pg_dump, revert containers)

8.7 Recovery

  • Identify failure scope: agent, service, host, or data
  • Check host health (free -h, df -h, top, docker stats)
  • Check Docker container status (docker ps -a, docker logs)
  • Restart failed containers (docker compose restart <service>)
  • If data issue: restore from latest pg_dump (pg_restore)
  • If full host failure: restore from Restic backup (B2 daily 03:00 UTC)
  • Verify all services healthy post-recovery
  • Resume interrupted work from last checkpoint (plan.yaml, session state)
  • Create incident record in ODM with root cause and resolution

9. Meta -- This Process as a XIOPro Capability

9.1 Reusability

This System Review process is not specific to the XIOPro build. It is a reusable capability that every XIOPro-managed project should execute before implementation begins.

When XIOPro manages a product (e.g., the first product -- see MVP1_PRODUCT_SPEC.md), it will:

  1. Decompose product requirements into a blueprint (using the Librarian process)
  2. Run System Review on the product blueprint (this Part 11 process)
  3. Generate tickets from the review findings
  4. Execute tickets through the agent system (Bus, agents, governance)
  5. Close with acceptance tests and sprint retrospective

The same applies to any future project: client onboarding, compliance audits, internal tooling. The System Review is the governance gate between "we have a plan" and "we start building."

9.2 ODM Entity

The System Review itself should be tracked as an ODM entity:

project_review:
  id: uuid
  project_id: uuid          # FK to projects
  review_type: enum
    # initial         -- before first implementation
    # mid_sprint      -- checkpoint during execution
    # sprint_close    -- end-of-sprint review
    # major_change    -- triggered by scope or architecture change
  status: enum
    # pending         -- review requested
    # in_progress     -- reviewer is working
    # passed          -- review complete, no blockers
    # failed          -- critical issues found, cannot proceed
    # needs_fixes     -- issues found, fixable before proceeding
  reviewer: string           # agent ID or user ID
  findings: [string]         # list of finding summaries
  risk_count: int            # number of risks identified
  risk_high_count: int       # number of high/critical risks
  module_count: int          # modules verified
  ticket_count: int          # tickets reviewed
  dependency_depth: int      # longest dependency chain length
  created_at: datetime
  completed_at: datetime|null
  verdict: string|null       # free-text summary verdict

9.3 Skill

This process should become a registered skill in the Skill Registry (SKILL_REGISTRY.yaml):

skill:
  id: system-review
  name: "System Review"
  description: >
    Run comprehensive review of a project blueprint before implementation.
    Verifies data schema completeness, maps modules and subjects, compiles
    risk register, maps dependencies, creates data flow diagrams, and
    generates process checklists. Produces a review report with pass/fail verdict.
  triggers:
    - /system-review
    - /review-project
    - /pre-implementation-check
  roles: [orchestrator, governor]
  model_tier: sonnet          # Sonnet sufficient; Opus for ambiguous findings
  token_estimate: 15000-25000
  steps:
    1. Verify data schema completeness (all ODM entities have DDL)
    2. Build module index (group tickets by subsystem)
    3. Build subject index (cross-reference by concept)
    4. Compile risk register (identify gaps, conflicts, capacity issues)
    5. Map ticket dependencies (build directed graph)
    6. Identify critical path and parallel execution lanes
    7. Create data flow diagrams (task, communication, knowledge, cost)
    8. Generate process checklists (setup, sprint, deploy, recover)
    9. Produce review report with verdict: pass / needs-fixes / fail
  outputs:
    - Part 11 document (this file)
    - Updated risk register
    - Dependency graph (Mermaid)
    - Process checklists

9.4 Rule

Every project must pass System Review before its first implementation ticket begins execution. This is a governance gate, not a suggestion.

Rule definition:

rule:
  id: require-system-review
  name: "Mandatory System Review"
  scope: project
  trigger: "First ticket in project moves to state 'active'"
  condition: "project_review.status == 'passed' for this project"
  action_on_fail: "Block ticket activation. Notify orchestrator and founder."
  severity: critical
  exceptions: none
  rationale: >
    Starting implementation without System Review risks building on
    incomplete schemas, unresolved dependencies, or unacknowledged risks.
    The review takes hours; fixing these issues mid-build takes days.

10. Project Lifecycle Management -- Topic to Product

This section defines the end-to-end lifecycle that every XIOPro project follows, from initial idea to production product. It is the process backbone that connects all other parts of the blueprint.

10.1 Lifecycle Overview

flowchart LR
    Topic["Topic"] --> Research["Research"]
    Research --> BP["Blueprint"]
    BP --> Review["Review + Readiness"]
    Review --> Plan["Work + Test Plan"]
    Plan --> Tickets["Tickets"]
    Tickets --> Execute["Sprint Execution"]
    Execute --> IntTest["Integration Test"]
    IntTest --> Product["Product"]

    IntTest -->|"Issues found"| Tickets
    Review -->|"Not ready"| Research

Every phase has a gate (exit criteria) and a T1P standard (quality bar). No phase is skipped. Iteration loops are expected and healthy.

10.2 Phase Definitions

Phase 1: Topic to Project

| Attribute | Value |
|---|---|
| Trigger | Idea or discussion identified as a potential project |
| Actions | Create project in ODM (name, description, topics). Assign project orchestrator. Define initial scope and constraints. |
| Gate | Project registered, orchestrator assigned |
| T1P Standard | Clear objective, bounded scope, measurable success criteria |
| Estimated Time | 1-2 hours |

Phase 2: Research

| Attribute | Value |
|---|---|
| Trigger | Project created |
| Actions | Research Center scans relevant sources. Domain research (competitors, standards, technologies). Feasibility assessment. Multiple research threads possible (parallel). |
| Gate | Research outputs reviewed, key decisions documented |
| T1P Standard | Evidence-based decisions, source lineage, evaluation records |
| Estimated Time | 2-4 hours per research thread |

Phase 3: Blueprint

| Attribute | Value |
|---|---|
| Trigger | Research complete, direction decided |
| Actions | Create project blueprint (using XIOPro BP as template). Define architecture, data model, components. Librarian decomposes into knowledge notes. |
| Gate | Blueprint complete, all sections covered |
| T1P Standard | ODM entities defined, data schema written, module index complete |
| Estimated Time | 4-8 hours |

Phase 4: Review and Readiness

| Attribute | Value |
|---|---|
| Trigger | Blueprint draft complete |
| Actions | Internal review (GO scans for gaps, consistency). External review (send to ChatGPT, Gemini, NotebookLM). System Review process (Part 11 checklists, Sections 1-9). Build readiness evaluation. |
| Gate | Reviews complete, all critical findings addressed |
| T1P Standard | 3+ external reviews, risk register, dependency map, ER diagram |
| Estimated Time | 2-4 hours (reviews run in parallel) |

Build Readiness Checklist:

| Check | Criterion |
|---|---|
| Data schema complete | DDL written and validated |
| All entities defined | Every object has properties, lifecycle, relationships |
| Risk register complete | 15+ risks with mitigations |
| Dependencies mapped | Critical path identified |
| Test strategy defined | Test layers for each output type |
| Review findings addressed | All critical items fixed |
| Ticket coverage verified | Every module has implementing tickets |

Phase 5: Work and Test Plan

| Attribute | Value |
|---|---|
| Trigger | Build readiness gate passed |
| Actions | Generate tickets from blueprint (automated from Part 12 template). Estimate effort using XIOPro Time Database (not human estimates). Create sprint plan with dependency ordering. Define test plan per ticket. |
| Gate | All tickets written with completion tests |
| T1P Standard | Every ticket has: plan, completion test, review requirement |
| Estimated Time | 1-2 hours |

Phase 6: Sprint Execution

| Attribute | Value |
|---|---|
| Trigger | Tickets created and prioritized |
| Actions | Agents pick up tickets from Bus. Execute with review gates (code: test, UI: screenshot, doc: validation). Continuous Paperclip sync. Real-time progress in Control Center. |
| Gate | All sprint tickets pass completion tests |
| T1P Standard | Every output reviewed and tested per type |
| Sprint Duration | Hours, not weeks. Typically 2-8 hours per sprint. |
| Estimated Time | 4-12 hours per sprint |

Phase 7: Integration Test

| Attribute | Value |
|---|---|
| Trigger | Sprint complete |
| Actions | End-to-end test (walking skeleton pattern). Cross-component integration verification. Performance baseline. Security scan. |
| Gate | All integration tests pass |
| T1P Standard | Walking skeleton proven, no regressions |
| Estimated Time | 1-2 hours |

Phase 8: Product

| Attribute | Value |
|---|---|
| Trigger | Integration tests pass |
| Actions | Deploy to production. Verify health endpoints. Update documentation. Sprint retrospective. |
| Gate | Product live and monitored |
| T1P Standard | Deployment checklist complete, monitoring active |
| Estimated Time | 1-2 hours |

10.3 Agile Principles (XIOPro-Calibrated)

| Principle | XIOPro Interpretation |
|---|---|
| Sprint duration | Hours, not weeks |
| Iteration speed | Multiple sprints per day possible |
| Feedback loops | External review + internal testing + user feedback |
| Continuous | No waterfall phases -- iterate constantly |
| Human calibration | Build XIOPro Time Database from actual execution data |

10.4 T1P Standards Discovery

XIOPro discovers and applies T1P (Top 1 Percent) standards through the Research Center, scanning industry best practices for each lifecycle phase.

Sources:

  • awesome-agentic-patterns (nibzard)
  • Software engineering best practices
  • ISO/IEC 25010 (software quality model)
  • XIOPro's own execution history

Standards by Phase:

| Phase | T1P Standards |
|---|---|
| Blueprints | 12-part structure minimum. ODM with lifecycle states. Risk register with mitigations. Dependency map. External review by 3+ LLMs. |
| Build Readiness | DDL must run without error. Walking skeleton acceptance scenarios defined. Every module has implementing tickets. Review findings addressed. |
| Work Plans | Every ticket has completion test. Dependencies mapped and ordered. Agent time estimates (not human). Sprint duration in hours. |
| Execution | Review gate per output type. Playwright screenshot for UI. API test suite for endpoints. Walking skeleton re-run after integration. |

10.5 XIOPro Time Database

The XIOPro Time Database replaces human time estimates with calibrated agent execution benchmarks. It grows with every project XIOPro runs.

Schema:

table: execution_time_benchmarks
columns:
  - task_type: "api_endpoint | ui_widget | research | blueprint | document | migration | test_suite"
  - complexity: "low | medium | high"
  - model_used: "opus | sonnet | haiku"
  - estimated_human_hours: float
  - actual_agent_minutes: float
  - acceleration_ratio: float    # human_hours * 60 / agent_minutes
  - sample_count: int

Seed Data (from XIOPro Day 1 -- 13 days estimated, ~2.5 hours actual):

| Task Type | Complexity | Model | Human Est (h) | Agent Actual (min) | Ratio | Samples |
|---|---|---|---|---|---|---|
| api_endpoint | medium | sonnet | 16 | 5 | 192x | 8 |
| ui_widget | medium | sonnet | 24 | 8 | 180x | 6 |
| blueprint_section | high | opus | 480 | 20 | 1440x | 12 |
| research | medium | opus | 240 | 10 | 1440x | 10 |
| migration | medium | sonnet | 480 | 10 | 2880x | 1 |
| test_suite | medium | sonnet | 240 | 8 | 1800x | 4 |

This database is the basis for all XIOPro project estimation. Human estimates are recorded for comparison but never used for planning.
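The acceleration_ratio column follows directly from the formula in the schema (human_hours * 60 / agent_minutes); a quick cross-check against the seed rows:

```python
# acceleration_ratio as defined in execution_time_benchmarks:
# human_hours * 60 / agent_minutes
def acceleration_ratio(estimated_human_hours: float, actual_agent_minutes: float) -> float:
    return estimated_human_hours * 60 / actual_agent_minutes

# Cross-check against the seed data above:
assert acceleration_ratio(16, 5) == 192      # api_endpoint
assert acceleration_ratio(480, 20) == 1440   # blueprint_section
assert acceleration_ratio(240, 8) == 1800    # test_suite
```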

10.6 ODM Connection

The project lifecycle phase is tracked on the Project entity:

project:
  lifecycle_phase: enum
    # topic | research | blueprint | review | planning | execution | integration | production | maintenance

This field is distinct from the three-dimensional state model (status/state/status_state). The state model tracks workflow position; lifecycle_phase tracks which lifecycle gate the project has reached.

Transition rules:

  • lifecycle_phase advances when the gate criteria for the current phase are met
  • lifecycle_phase can regress (e.g., review -> research when the readiness check fails)
  • Only the project orchestrator or system master can advance lifecycle_phase
  • Every transition is logged in the project's activity history
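These transition rules can be sketched as a guard function (the function and argument names are illustrative; the phase values come from the enum in 10.6):

```python
# Sketch of lifecycle_phase transition checking per the rules above.
PHASES = ["topic", "research", "blueprint", "review", "planning",
          "execution", "integration", "production", "maintenance"]

def can_transition(current: str, target: str, actor_role: str, gate_met: bool) -> bool:
    """Advancing requires the current phase's gate criteria; regression
    (e.g. review -> research) is allowed. Only the project orchestrator
    or system master may change the phase."""
    if actor_role not in ("orchestrator", "system_master"):
        return False
    i, j = PHASES.index(current), PHASES.index(target)
    if j > i:
        return gate_met   # advance only when the gate criteria are met
    return j < i          # regression allowed; same-phase "moves" are not

assert can_transition("review", "research", "orchestrator", gate_met=False)
assert not can_transition("blueprint", "review", "engineering", gate_met=True)
```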


11. Project Template Architecture

XIOPro is a multi-template project factory. Each template shares the same engine but applies different skills, expertise, and potentially additional engines or governors.

11.1 Template Structure

Every project template consists of:

  • Core (shared by all templates): Control Bus, ODM, Governance, UI, Knowledge System, Librarian
  • Skills: Template-specific skill sets from the Skill Registry
  • Agent Roles: Template-specific role assignments
  • Additional Engine (optional): Template-specific processing (e.g., ISO engine for compliance)
  • Additional Governor (optional): Template-specific governance rules
  • Lifecycle: Same 8-phase lifecycle (Topic to Product), customized gates per template

11.2 Initial Templates

IT Project Template (Active -- being built now)

  • Purpose: Build software products
  • Skills: coding, testing, architecture, deployment, debugging, TDD
  • Agent Roles: orchestrator, engineering specialist, devops specialist
  • Engine: Standard XIOPro engine
  • Output: Software products, APIs, UIs, infrastructure

Marketing Template (Planned)

  • Purpose: Marketing campaigns and go-to-market
  • Skills: SEO, ad copy, competitor analysis, campaign planning, lead research
  • Agent Roles: orchestrator, marketing specialist, content specialist
  • Additional: Competitive Ads analysis, Lead Research skills
  • Output: Campaigns, landing pages, ad copy, market analysis

Content Creation Template (Planned)

  • Purpose: Create and manage content
  • Skills: writing, brand voice (Voice DNA), research, editing, citations
  • Agent Roles: orchestrator, content specialist, editor
  • Additional: NotebookLM for synthesis, Voice DNA for brand consistency
  • Output: Articles, documentation, presentations, podcasts, educational material

Knowledge Expert Template (Planned)

  • Purpose: Domain expertise and knowledge management
  • Skills: research, synthesis, classification, evaluation, teaching
  • Agent Roles: orchestrator, research specialist, domain expert
  • Additional: Research Center automation, Librarian deep integration
  • Output: Knowledge bases, evaluations, training materials, expert consultations

Knowledge Expert Domains (examples):

  • ISO 19650: Parts 1-6, national annexes, implementation guidance
  • BIM: IFC, openBIM, model coordination, clash detection, LOD/LOI
  • Construction Industry Players:
      • Project Initiator / Owner / Developer
      • Design Partners (architects, structural engineers, MEP engineers)
      • General Contractor
      • Subcontractors (electrical, plumbing, HVAC, concrete, steel)
      • Inspectors and quality assessors
      • Quantity surveyors
      • Project managers and BIM managers

11.3 Template Registry

template_registry:
  location: "struxio-logic/templates/"
  format: "YAML template definition + skill list + role assignments"

  template_definition:
    id: string
    name: string
    description: string
    status: active | planned | deprecated

    core_skills: [skill_id]           # from SKILL_REGISTRY
    additional_skills: [skill_id]     # template-specific

    agent_roles:
      - role: string
        skills_on_load: [skill_id]
        min_model: string

    additional_engine: string|null     # e.g., "iso19650-engine"
    additional_governor: string|null   # e.g., "compliance-governor"

    lifecycle_customizations:
      research_sources: [source_id]    # template-specific research sources
      review_criteria: [string]        # additional review gates
      test_requirements: [string]      # template-specific tests
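A registry entry can be sanity-checked against the required fields above before a project is created from it; a minimal sketch (the validator itself is hypothetical, field names are from the schema):

```python
# Minimal validation sketch for a template_registry entry.
# Field names come from the template_definition schema above;
# the validator itself is a hypothetical helper.
REQUIRED = {"id", "name", "description", "status", "core_skills", "agent_roles"}

def validate_template(defn: dict) -> list[str]:
    """Return a list of problems; an empty list means the definition is usable."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED - defn.keys())]
    if defn.get("status") not in ("active", "planned", "deprecated"):
        problems.append("status must be active | planned | deprecated")
    return problems

it_template = {
    "id": "it-project", "name": "IT Project",
    "description": "Build software products", "status": "active",
    "core_skills": ["coding", "testing"], "agent_roles": [],
}
assert validate_template(it_template) == []
```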

11.4 Creating a New Template

  1. Define template in YAML (skills, roles, engines)
  2. Register in template_registry
  3. Create project using template -- auto-assigns skills and roles
  4. Project lifecycle applies with template-specific customizations

11.5 Rule

All templates share the same XIOPro engine and lifecycle. The differentiation is in skills, roles, and domain expertise -- not in the core platform. This ensures consistency across all project types.


12. Blueprint Part Numbering: Specification vs Operational

Parts 1-8 are specification. Parts 9-14 are operational. This separation is intentional. No renumbering needed.

| Range | Nature | Parts |
|---|---|---|
| Parts 1-8 | Specification — what the system is | Foundations, Architecture, ODM, Agent System, Knowledge System, UI, Governance, Infrastructure |
| Parts 9-14 | Operational — how the system runs | Project Templates, Swarm Architecture, System Review, Work Plan, Execution Log, Ticket Register |

The spec/operational boundary is a structural design decision, not an accident of growth. Renumbering would destroy the meaning of the boundary and break all existing cross-references.


Changelog

| Date | Change |
|---|---|
| 2026-03-29 | Part 9 created. System Review as verification gate: ER diagram audit, module index (25 modules across 8 epics), subject index (50+ cross-referenced entries), risk register (15-20 risks with severity and mitigation), dependency order (37 tickets mapped with critical path through TKT-1002 -> 1004 -> 1011 -> 1020 -> 1063 at 11 days), data flow diagrams (4: task lifecycle, agent communication, knowledge flow, cost flow), process checklists (7: project setup, agent commissioning, sprint start, sprint close, technology evaluation, deployment, recovery), meta-capability definition (ODM entity, skill, governance rule). |
| 2026-03-29 | Added Section 10: Project Lifecycle Management -- Topic to Product. 8-phase lifecycle with gates and T1P standards. T1P Standards Discovery process. XIOPro Time Database (agent execution benchmarks seeded from Day 1 data). ODM lifecycle_phase enum for Project entity. |
| 2026-03-29 | Added Section 11: Project Template Architecture. Multi-template project factory design. 4 initial templates (IT Project active, Marketing/Content/Knowledge Expert planned). Template Registry YAML schema. Knowledge Expert domains include ISO 19650, BIM, Construction Industry Players. |
| 2026-03-30 | N17: Added Section 12 — Blueprint Part Numbering: Specification vs Operational. Documents that Parts 1-8 are specification, Parts 9-14 are operational. Separation is intentional; no renumbering needed. |