XIOPro Production Blueprint v5.0¶
Part 11 — System Review¶
1. Purpose¶
This part is the verification gate between design (Parts 1-10) and execution (Parts 12-14).
Before any project moves to implementation, it must pass through System Review to verify:
- Data schema is complete and consistent
- All system modules are identified and positioned
- Risks are identified and mitigated
- Dependencies are mapped and ordered
- Data flows are documented
- Operational checklists exist
This process is reusable -- every XIOPro project passes through it.
1.1 Scope of This Document¶
Sections 1-5 cover the first half of the System Review:
| Section | Content |
|---|---|
| 1 | Purpose and scope |
| 2 | Data Schema verification |
| 3 | System Module Index |
| 4 | Subject Index |
| 5 | Risk Register |
The second half (Sections 6-10: Dependency Order, Data Flow Diagrams, Operational Checklists, Gap Analysis, Review Sign-Off) follows below, beginning at Section 6.
2. Data Schema¶
The canonical schema is at: resources/SCHEMA_walking_skeleton_v4_2.sql
2.1 Entity-Relationship Overview¶
erDiagram
%% ── Core Work Graph ──
projects ||--o{ sprints : contains
projects ||--o{ tickets : contains
projects ||--o{ project_agent_bindings : has
sprints ||--o{ tickets : scopes
tickets ||--o{ tasks : decomposes_into
tickets ||--o| tickets : parent_ticket
tasks ||--o| tasks : parent_task
tasks ||--o{ activities : generates
%% ── Discussion & Ideas ──
projects ||--o{ discussion_threads : hosts
discussion_threads ||--o| tickets : linked_ticket
discussion_threads ||--o| tasks : linked_task
discussion_threads ||--o| sessions : linked_session
ideas ||--o| topics : classified_by
ideas ||--o| users : raised_by
ideas ||--o| tickets : converts_to
idea_discussion_links }o--|| ideas : links
idea_discussion_links }o--|| discussion_threads : links
%% ── Agent System ──
agent_templates ||--o{ agent_runtimes : instantiates
agent_runtimes ||--o{ sessions : runs_in
agent_runtimes ||--o{ activities : executes
agent_runtimes ||--o| hosts : deployed_on
agent_runtimes ||--o| agent_runtimes : parent_runtime
agent_runtimes ||--o| tickets : assigned_ticket
agent_runtimes ||--o| tasks : assigned_task
project_agent_bindings }o--|| projects : roster_for
%% ── Governance ──
escalation_requests ||--o| agent_runtimes : raised_by
escalation_requests ||--o| tasks : about
escalation_requests ||--o| activities : triggered_by
human_decisions }o--|| escalation_requests : resolves
human_decisions ||--o| agent_runtimes : applies_to
human_decisions ||--o| tasks : applies_to
override_records ||--o| agent_runtimes : targets
%% ── Knowledge ──
topics ||--o| topics : parent_topic
research_tasks ||--o| projects : scoped_to
research_tasks ||--o| tickets : linked_to
research_tasks ||--o| tasks : parent_task
research_tasks ||--o| agent_runtimes : owned_by
%% ── Cost & Time ──
cost_ledger }o--|| activities : charges
cost_ledger ||--o| tasks : attributed_to
cost_ledger ||--o| tickets : attributed_to
cost_ledger ||--o| projects : attributed_to
time_ledger }o--|| activities : records
time_ledger ||--o| tasks : attributed_to
2.2 Table Summary¶
| # | Table | Group | Key Relationships |
|---|---|---|---|
| 0 | users | Core | Standalone identity entity. Referenced by ideas.raised_by_user_id. |
| 1 | topics | Knowledge | Self-referencing tree (parent_topic_id). Referenced by ideas, tickets, tasks, agent_templates, research_tasks via UUID arrays. |
| 2 | projects | Core Work Graph | Parent of sprints, tickets, discussion_threads, project_agent_bindings, research_tasks, cost_ledger. |
| 2A | project_agent_bindings | Core Work Graph | Junction: projects <-> agent identity (agent_id VARCHAR(3)). |
| 3 | sprints | Core Work Graph | FK to projects. Referenced by tickets.sprint_id. |
| 4 | discussion_threads | Core Work Graph | FK to projects. Deferred FKs to tickets, tasks, sessions. |
| 4A | ideas | Core Work Graph | FK to topics, users. Deferred FK to tickets. |
| 4B | idea_discussion_links | Core Work Graph | Junction: ideas <-> discussion_threads. |
| 5 | tickets | Core Work Graph | FK to projects, sprints. Self-referencing (parent_ticket_id). Parent of tasks. |
| 6 | tasks | Core Work Graph | FK to tickets. Self-referencing (parent_task_id). Deferred FK to agent_runtimes. Parent of activities. |
| 7 | hosts | Agent System | Standalone capacity entity. Referenced by agent_runtimes.host_id. |
| 8 | agent_templates | Agent System | Canonical agent class. Parent of agent_runtimes. |
| 9 | agent_runtimes | Agent System | FK to agent_templates, hosts, tickets, tasks. Self-referencing (parent, root, orchestrator). Deferred FK to sessions. |
| 10 | sessions | Agent System | FK to agent_runtimes. Referenced by activities, discussion_threads, escalation_requests. |
| 11 | activities | Agent System | FK to tasks, agent_runtimes, sessions. Parent of cost_ledger, time_ledger. |
| 12 | escalation_requests | Governance | FK to agent_runtimes, sessions, tickets, tasks, activities. Parent of human_decisions. |
| 13 | human_decisions | Governance | FK to escalation_requests, agent_runtimes, tasks. |
| 14 | override_records | Governance | Polymorphic scope (scope_type + scope_ref). Append-only audit trail. |
| 15 | cost_ledger | Cost/Time | FK to activities, agent_runtimes, tasks, tickets, projects. |
| 16 | time_ledger | Cost/Time | FK to activities, agent_runtimes, tasks, tickets. |
| 17 | research_tasks | Knowledge | FK to projects, tickets, tasks, agent_runtimes. |
2.3 Schema Statistics¶
| Metric | Count |
|---|---|
| Tables | 21 |
| Enum types | 28 |
| Deferred foreign keys | 6 |
| Auto-update triggers | 12 |
| Partial indexes | 2 |
| Generated columns (status_state) | 10 |
2.4 Schema Conventions¶
All tables follow the ODM Metadata Contract (ODM Section 12.2):
- `tags TEXT[]` -- free-form classification
- `labels TEXT[]` -- structured labels
- `source_system TEXT` -- originating system
- `source_ref TEXT` -- external reference
- `correlation_id TEXT` -- cross-system tracing
- `idempotency_key TEXT` -- replay protection
- `notes TEXT` -- human-readable notes
- `created_by TEXT`, `updated_by TEXT` -- audit trail
- `created_at TIMESTAMPTZ`, `updated_at TIMESTAMPTZ` -- timestamps
All lifecycle-bearing entities use the three-dimensional state model:
- `status` (`devxio_status`) -- workflow phase
- `state` (`devxio_state`) -- runtime condition
- `status_state` -- generated composite for indexing
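For intuition, the generated composite can be mirrored application-side. This is a sketch only: the actual column is defined in the SQL schema (`GENERATED ALWAYS AS ... STORED`), and the `:` separator used here is an assumption, not the canonical format.

```python
def status_state(status: str, state: str) -> str:
    """Compose an indexable status_state value from the two state dimensions.

    Mirrors the generated column in spirit; the separator is an assumed
    format, the authoritative definition lives in the SQL schema.
    """
    return f"{status}:{state}"

# A partial index on status_state lets queries filter on both dimensions at once.
print(status_state("active", "running"))
```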
3. System Module Index¶
Every distinct capability, engine, or module in XIOPro, with its primary blueprint location, T1P posture, and dependencies.
| # | Module | Primary Part | Description | T1P Posture | Depends On |
|---|---|---|---|---|---|
| 1 | Control Bus | Part 2, 5.8 | Central messaging relay for all XIOPro surfaces. SSE push, intervention model, message routing between brains and services. | Full | PostgreSQL, API Service |
| 2 | Orchestrator | Part 4, 4.1 | Master execution coordinator. Decomposes tickets into tasks, assigns agents, manages execution flow and dependencies. | Full | Control Bus, Work Graph, Agent Templates |
| 3 | Governor | Part 7, 5 | Runtime governance engine. Enforces budgets, breakers, escalation policies, health monitoring, and cost controls. | Full | Control Bus, Cost Ledger, Orchestrator |
| 4 | Rule Steward | Part 4, 4.2A | Manages the lifecycle of all operational rules: creation, validation, versioning, conflict detection, and retirement. | Scaffold | Governor, Knowledge Store |
| 5 | Prompt Steward | Part 4, 4.2B | Manages context assembly, prompting modes, question budgets, and prompt package contracts for agent interactions. | Scaffold | Rule Steward, Skill Registry |
| 6 | Module Steward | Part 4, 4.2C | Evaluates, adopts, and governs external modules and tools. Manages the module portfolio lifecycle. | Scaffold | Governor, Research Center |
| 7 | Librarian | Part 5, 4 | Core knowledge management system. Ingestion, indexing, search, decomposition, and document lifecycle. | Scaffold | PostgreSQL, pgvector, Object Storage |
| 8 | Research Center | Part 5, 8 | Operational research engine. Source scouting, scheduled research tasks, digest generation, NotebookLM/Obsidian integration. | Scaffold | Librarian, Source Registry, Scheduler |
| 9 | Skill Registry | Part 5, 8.9 | Central registry of all agent skills. Defines skill metadata, versioning, model compatibility, and governance rules. | Scaffold | Rule Steward, Knowledge Store |
| 10 | Skill Performance DB | Part 5, 8.9A | Tracks token consumption, quality scores, model compatibility, and execution statistics per skill. | Scaffold | Skill Registry, Cost Ledger |
| 11 | Hindsight | Part 5, 9 | Post-execution learning engine. Analyzes completed tasks, extracts patterns, generates improvement recommendations. | Scaffold | Activities, Sessions, Knowledge Store |
| 12 | Dream Engine | Part 5, 10 | Autonomous optimization engine. Identifies improvement opportunities, proposes experiments, runs during idle time. | Idle Maintenance Only | Hindsight, Skill Performance DB, Governor |
| 13 | Idle Maintenance | Part 4, 4.9.9 | T1P subset of Dream Engine. Practical optimization tasks: skill drift detection, cost anomaly review, stale knowledge cleanup. | Phase 2 (downgraded -- no dedicated ticket) | Scheduler, Skill Registry, Cost Ledger |
| 14 | RAG Pipeline | Part 5, 7.18 | Retrieval-augmented generation pipeline. Embedding, chunking, hybrid retrieval, reranking, and context injection. | Phase 2 (downgraded -- pgvector DDL deploys but pipeline activation is Phase 2) | pgvector, PostgreSQL, Prompt Steward |
| 15 | XIOPro Optimizer | Part 1, 8A | Umbrella capability grouping Governor, Rule Steward, Prompt Steward, Module Steward, and Dream Engine as the self-improvement loop. | Scaffold | Governor, All Stewards, Dream Engine |
| 16 | Control Center UI | Part 6 | Widget-based web UI. Attention queue, brain interaction, prompt composer, governance dashboards, research desk. | First Wave | API Service, SSE Push, Control Bus |
| 17 | Prompt Composer | Part 6, 12 | UI component for structured prompt construction. Mode selection, search/research toggle, style controls, module/model controls. | First Wave | Prompt Steward, UI Framework |
| 18 | Agent Spawning | Part 4, 5A | Agent lifecycle management. Three patterns: roster agent, on-demand agent, ephemeral sub-agent. Capacity-aware host placement. | Full | Orchestrator, Host Registry, Agent Templates |
| 19 | ODM (Operational Domain Model) | Part 3 | Canonical data model. 21 tables, three-dimensional state model, metadata contract, entity lifecycle rules. | Full | PostgreSQL |
| 20 | Knowledge Ledger | Part 5, 4.7 | Change and evolution log for all knowledge objects. Tracks document lifecycle, revival, export, and drift. | Scaffold | Librarian, PostgreSQL |
| 21 | Execution Report | Part 4, 20 | Post-execution summary generation. Cost, duration, outcome, and success criteria assessment per ticket. | Scaffold | Activities, Cost Ledger, Time Ledger |
| 22 | Host Registry | Part 3, 4.1B | Fleet machine inventory. Tracks capacity (CPU, RAM, SSD, GPU), active agents, and health status per host. | Full | PostgreSQL |
| 23 | Source Registry | Part 5, 8.10.1 | Curated list of external research sources. Ranked, scheduled, with trust and freshness metadata. | Scaffold | Research Center, Librarian |
| 24 | Resource Registry | Part 5, 8.10.2 | Registry of evaluated external resources (tools, libraries, services). Lifecycle tracking from discovery to adoption or rejection. | Scaffold | Research Center, Module Steward |
| 25 | Scheduler | Part 8, 8.7 | Background job execution. Cron-like scheduling for research tasks, idle maintenance, health checks, and refresh cycles. | Phase 2 (downgraded -- existing cron covers basics, dedicated scheduler is Phase 2) | PostgreSQL, Control Bus |
4. Subject Index¶
Alphabetical index of key subjects referenced across the blueprint.
| Subject | Primary Location | Also Referenced In |
|---|---|---|
| Agent Allocation | Part 3, 4.2.1 | Part 4, 5A |
| Agent Identity (3-digit) | Part 1, 8.1 | Part 3, 4.7-4.8; Part 4, 19.1 |
| Agent Lifecycle | Part 4, 6 | Part 7, 6.2 |
| Agent Runtime | Part 3, 4.8 | Part 4, 5A; Part 8, 7.2 |
| Agent Template | Part 3, 4.7 | Part 4, 4.1; Part 8, 8.3 |
| Alerts | Part 7, 10 | Part 8, 12.6 |
| Approval | Part 7, 6.5 | Part 3, 4.11; Part 4, 11 |
| Atomic Writes | Part 1, 4.3 | Part 3, 2.5; Part 8, 3.2 |
| Authentication | Part 2, 5.14 | Part 3, 4.8 (auth_method); Part 8, 11 |
| Backup | Part 2, 5.15 | Part 8, 10; Part 4, 16 |
| Breakers (Circuit) | Part 7, 9 | Part 1, 4.5; Part 4, 10 |
| CLI Tools | Part 2, 5.12 | Part 1, 4.13; Part 4, 13 |
| Completion Self-Check | Part 4, 5.2 | Part 7, 6.4 |
| Confidence Scoring | Part 4, 4.2B | Part 5, 9; Part 7, 11 |
| Control Bus | Part 2, 5.8 | Part 1, 4.12; Part 7, 7; Part 8, 7.1 |
| Cost Awareness | Part 1, 4.6 | Part 3, 4.6.2; Part 7, 6.1; Part 8, 13 |
| Cost Ledger | Part 3, 4.6.2 | Part 4, 9; Part 7, 6.1; Part 8, 13.3 |
| Data Access Rule | Part 2, 5.8 | Part 8, 7.1 |
| Debounce | Part 7, 9.2 | Part 4, 10 |
| Decomposition (Task) | Part 4, 5 | Part 3, 4.4-4.5 |
| Decomposition (Document) | Part 5, 4.5 | Part 5, 4.1 |
| Dependencies (Task) | Part 4, 5.1 | Part 3, 4.5 |
| Discussion Thread | Part 3, 4.3A | Part 5, 4; Part 6, 10.3 |
| Dream Engine | Part 5, 10 | Part 1, 12A.2; Part 4, 4.9.9; Part 5, 11A.4 |
| Escalation | Part 3, 4.11 | Part 4, 11; Part 7, 8.3; Part 6, 10.2 |
| Execution Mode | Part 3, 4.5 | Part 4, 3 |
| Execution Report | Part 4, 20 | Part 7, 6.4 |
| Firewall | Part 8, 11.4 | Part 8, 11.10.5 |
| Governor | Part 7, 5 | Part 1, 8.4; Part 4, 4.2; Part 8, 8.4 |
| Hindsight | Part 5, 9 | Part 1, 12A.2; Part 4, 4.9.9; Part 5, 11A.3 |
| Host | Part 3, 4.1B | Part 4, 14; Part 8, 5 |
| Human Decision | Part 3, 4.12 | Part 7, 6.5; Part 4, 11 |
| Idea | Part 3, 4.3B | Part 5, 4; Part 6, 10.3 |
| Idle Maintenance | Part 4, 4.9.9 | Part 1, 12A.2; Part 5, 10; Part 5, 11A.4 |
| Intervention | Part 7, 10.4 | Part 2, 5.8; Part 6, 10.2 |
| Knowledge Compounding | Part 1, 4.7 | Part 5, 2; Part 5, 14 |
| Knowledge Ledger | Part 5, 4.7 | Part 7, 12 |
| Librarian | Part 5, 4 | Part 1, 6.1; Part 8, 8.9 |
| LiteLLM | Part 2, 5.3 | Part 3, 4.8; Part 8, 8.6 |
| Memory Principles | Part 5, 4.5A | Part 5, 9.5 |
| Metadata Contract | Part 3, 12.2 | All entity definitions |
| Module Steward | Part 4, 4.2C | Part 1, 8.4; Part 7, 12.9; Part 8, 8.12 |
| NotebookLM | Part 5, 8.7 | Part 5, 8.2A |
| Obsidian | Part 5, 8.8 | Part 5, 18.3 |
| ODM (Operational Domain Model) | Part 3 | Part 1, 7; Part 2, 4.6 |
| Optimizer (XIOPro) | Part 1, 8A | Part 4, 4.2-4.2C; Part 5, 10 |
| Orchestrator | Part 4, 4.1 | Part 1, 8.2; Part 2, 4.3; Part 8, 8.3 |
| Override Record | Part 3, 4.12A | Part 7, 12.14 |
| Paperclip | Part 1, 13 | Part 8, 15 |
| pgvector | Part 5, 7.18 | Part 5, 12; Part 8, 8.8 |
| Policy Objects | Part 7, 8 | Part 7, 6 |
| PostgreSQL | Part 2, 5.5 | Part 8, 8.8; Part 3 (all entities) |
| Priority Level | Part 3 (enum) | Part 4, 8; Part 7, 10.1 |
| Prompt Composer | Part 6, 12 | Part 4, 4.2B |
| Prompt Steward | Part 4, 4.2B | Part 1, 8.4; Part 7, 12.7 |
| RAG Pipeline | Part 5, 7.18 | Part 4, 4.2B; Part 5, 12 |
| Recovery | Part 7, 8.4 | Part 4, 15; Part 8, 3.5; Part 8, 11.10 |
| Replaceability | Part 1, 4.8 | Part 8, 3.3 |
| Research Center | Part 5, 8 | Part 1, 12A.2; Part 5, 11A.2 |
| Research Task | Part 3, 4.12B | Part 5, 8.12-8.15 |
| Review Gates | Part 7, 12.16 | Part 4, 5.2 |
| Roles (Agent) | Part 1, 8.2 | Part 3, 4.7; Part 4, 4.1-4.2C |
| Ruflo | Part 2, 5.8 | Part 4, 4.2F; Part 8, 8.5 |
| Rule Steward | Part 4, 4.2A | Part 1, 8.4; Part 7, 12 |
| Scheduled Research | Part 5, 8.12 | Part 5, 11; Part 8, 8.7 |
| Secrets Management | Part 8, 11.5 | Part 2, 5.14 |
| Self-Evaluation | Part 4, 5.2 | Part 5, 9 |
| Session | Part 3, 4.10 | Part 4, 7; Part 8, 7.2 |
| Skill Performance | Part 5, 8.9A | Part 4, 4.11; Part 5, 10 |
| Skill Registry | Part 5, 8.9 | Part 4, 4.10-4.11 |
| Skill Selection | Part 4, 4.11 | Part 5, 8.9 |
| Source Registry | Part 5, 8.10.1 | Part 5, 8.11 |
| Sprint | Part 3, 4.3 | Part 4, 5 |
| SSE Push | Part 2, 5.6 | Part 6, 6.5; Part 8, 7.1 |
| Sub-Agent | Part 4, 5A.2 | Part 4, 12 |
| T1P Posture | Part 1, 12A | All Parts (posture tables) |
| Tailscale | Part 8, 5.1 | Part 8, 11.4 |
| Three-Dimensional State | Part 3, 2.5 | Part 3 (all lifecycle entities) |
| Ticket | Part 3, 4.4 | Part 4, 5; Part 7, 6 |
| Ticket Numbering | Part 3, 2.7 | Part 4, 5 |
| Time Ledger | Part 3, 4.6.3 | Part 4, 9; Part 8, 13.3 |
| Token Budget | Part 4, 4.2B | Part 7, 6.1; Part 5, 7.18 |
| Topic | Part 3, 4.1 | Part 5, 4; Part 5, 8 |
| Topic Enrichment | Part 3, 4.1.1 | Part 5, 4 |
| User | Part 3, 4.0 | Part 6, 9; Part 8, 11.3 |
| Walking Skeleton | Part 3 | Part 10 |
| Widget | Part 6, 6 | Part 6, 10-11 |
5. Risk Register¶
Risks identified across Parts 1-8, compiled with severity assessment and mitigation strategy.
5.1 Severity Scale¶
| Level | Meaning |
|---|---|
| Critical | System-wide failure or data loss. Requires immediate response. |
| High | Major capability degraded. Requires response within hours. |
| Medium | Partial degradation. Requires response within 1 business day. |
| Low | Minor inconvenience. Addressed in normal maintenance cycle. |
5.2 Risk Table¶
| # | Risk | Severity | Likelihood | Impact | Mitigation | BP Reference |
|---|---|---|---|---|---|---|
| R01 | RAM exhaustion on Hetzner CPX62 -- 30 GB shared across PostgreSQL, API, Orchestrator, LiteLLM, and all agent runtimes. A spike in concurrent agents or a memory leak crashes the control plane. | Critical | Medium | Full system outage | Memory pressure survival rule (Part 8, 11.10.3). Reserved RAM budgets per service. Governor enforces max concurrent agents via host capacity tracking. Core-first recovery order defined. | Part 8, 5.1; Part 8, 11.10.3 |
| R02 | Scope creep beyond T1P -- Premature implementation of full Dream Engine, full Steward roles, or advanced UI features before Walking Skeleton is stable. | High | High | Wasted budget, unstable foundation | T1P Posture classification (Part 1, 12A). Each capability has explicit posture: Full, Scaffold, Defer. Posture violation requires explicit approval. | Part 1, 12A |
| R03 | Single orchestrator bottleneck -- One master orchestrator (O00) manages all execution flow. If it crashes or becomes overloaded, all work halts. | High | Medium | Complete execution stoppage | 3-failure circuit breaker halts and interrupts C0 (CLAUDE.md). Session durability allows restart. Recovery policy (Part 7, 8.4) defines restart sequence. Future: multi-orchestrator with leader election. | Part 4, 4.1; Part 7, 8.4 |
| R04 | API rate limits from LLM providers -- Anthropic, OpenAI, or other providers throttle or reject requests during peak load or quota exhaustion. | High | Medium | Agent execution stalls | LiteLLM router with fallback model routing (Part 8, 8.6). Governor monitors cost ledger and enforces budget policies (Part 7, 8.1). Token budget management by Prompt Steward. | Part 2, 5.3; Part 8, 8.6 |
| R05 | Session crash with context loss -- An agent runtime crashes mid-task and the session context (conversation history, intermediate results) is lost. | High | Medium | Rework, duplicated cost | Durable session model with checkpoint_ref and transcript_ref (Part 3, 4.10). Atomic writes to PostgreSQL. Recovery policy restores from last checkpoint. | Part 3, 4.10; Part 4, 15 |
| R06 | Knowledge drift -- Knowledge base becomes stale as external sources change, internal documents are not refreshed, and embeddings decay in relevance. | Medium | High | Degraded RAG quality, incorrect agent behavior | Scheduled research refresh cycles (Part 5, 11). Knowledge Ledger tracks document lifecycle (Part 5, 4.7). Anti-entropy rules (Part 5, 15). Idle Maintenance detects stale knowledge. | Part 5, 11; Part 5, 15 |
| R07 | Cost overrun exceeding Max20 budget -- Uncontrolled LLM usage, excessive agent spawning, or inefficient prompting pushes monthly costs beyond the $200/month ceiling. | Critical | Medium | Budget breach, forced shutdown | Governor cost governance (Part 7, 6.1). Budget policy with hard caps (Part 7, 8.1). Cost ledger attribution to activity level (Part 3, 4.6.2). Cost optimization layer (Part 4, 9). Cost reporting on every deliverable (CLAUDE.md). | Part 7, 6.1; Part 8, 13 |
| R08 | Skill degradation over time -- Skills that worked well initially degrade as models are updated, contexts change, or upstream dependencies shift. | Medium | Medium | Reduced execution quality | Skill Performance DB tracks quality per skill over time (Part 5, 8.9A). Idle Maintenance detects skill drift (Part 4, 4.9.9). Dream Engine proposes improvements. | Part 5, 8.9A; Part 4, 4.9.9 |
| R09 | Security breach via exposed secrets -- API keys, OAuth tokens, or database credentials leaked through logs, commits, or misconfigured services. | Critical | Low | Full system compromise | SOPS for secrets at rest (Part 2, 5.14). No secrets in commits (CLAUDE.md). Tailscale VPN for network isolation (Part 8, 11.4). Security logging and audit (Part 8, 11.8). | Part 2, 5.14; Part 8, 11.5 |
| R10 | Data loss from PostgreSQL failure -- Database corruption, disk failure, or accidental deletion destroys the canonical state store. | Critical | Low | Total state loss | Restic backup to Backblaze B2 daily at 03:00 UTC. WAL archiving. Restore drill requirements (Part 8, 10.8). Backup verification on schedule. | Part 8, 10; Part 2, 5.15 |
| R11 | Agent behavioral drift -- Agents gradually deviate from intended behavior due to prompt template changes, context pollution, or model updates without testing. | Medium | Medium | Unpredictable execution, governance violations | Rule Steward validates rule changes (Part 4, 4.2A). Review gates for non-code outputs (Part 7, 12.16). Prompt Steward manages prompt package contracts (Part 4, 4.2B). Version check for agent runtime currency (Part 1, 4.11). | Part 4, 4.2A; Part 7, 12.16 |
| R12 | Dependency deadlock -- Circular or unresolvable task dependencies prevent execution progress. | Medium | Low | Execution stall on affected ticket | Task dependency resolution (Part 4, 5.1). DAG validation at decomposition time. Orchestrator detects cycles before scheduling. Governor breaker triggers on stall detection. | Part 4, 5.1 |
| R13 | Hetzner outage or network partition -- Cloud provider outage or Tailscale VPN disruption disconnects the control plane from local operator node or external services. | High | Low | Partial or full system unavailability | Emergency access layers (Part 8, 11.10.2). Out-of-band recovery via direct SSH. Mac Studio (Node B) can operate independently for local tasks. Health model detects degradation (Part 8, 12.5). | Part 8, 11.10; Part 8, 5 |
| R14 | Max20 throttling under growth -- As XIOPro manages more projects, the fixed infrastructure budget prevents scaling compute to match workload. | Medium | Medium | Slower execution, queuing delays | Scale-up triggers defined (Part 8, 13.5). Hetzner upgrade policy (Part 8, 13.6). Self-hosted model decision rule (Part 8, 13.7). Cost optimization prioritizes high-value work first. | Part 8, 13.5-13.7 |
| R15 | Context window limits -- Large tasks, deep conversation histories, or excessive RAG injection exceed the model's context window, causing truncation or degraded output. | Medium | High | Reduced output quality, missed context | Prompt Steward manages total context budget (Part 4, 4.2B). RAG pipeline respects context window ceiling (Part 5, 7.18). Document decomposition protocol (Part 5, 4.5). Session checkpointing allows context rotation. | Part 4, 4.2B; Part 5, 7.18 |
| R16 | Orphaned agent runtimes -- Agent processes that lose their parent orchestrator connection continue running, consuming resources without producing useful work. | Medium | Medium | RAM waste, potential interference | Heartbeat monitoring (agent_runtimes.last_heartbeat_at). Governor health governance (Part 7, 6.2). Stale heartbeat triggers cleanup. Max20 budget pressure naturally limits orphan lifetime. | Part 7, 6.2; Part 3, 4.8 |
| R17 | Escalation queue overflow -- Too many escalation requests accumulate without human response, blocking agent execution across multiple tasks. | Medium | Medium | Execution throughput collapse | Attention queue in UI (Part 6, 10.1). Escalation urgency levels with routing rules (Part 7, 8.3). Timeout policies auto-resolve low-priority escalations. Governor monitors queue depth. | Part 7, 8.3; Part 6, 10.1 |
| R18 | Schema migration failure -- Alembic migration fails mid-apply, leaving the database in an inconsistent state between schema versions. | High | Low | Service startup failure, data corruption | Alembic revision chain (schema header). Pre-migration backup. Atomic transaction per migration. Rollback script for each migration. Restore drill validates migration reversibility. | Part 8, 10; Part 2, 5.5 |
| R19 | Provider lock-in despite independence goal -- Gradual accumulation of Anthropic-specific features or prompt patterns makes switching to other providers costly. | Medium | Medium | Reduced negotiating power, migration cost | Provider independence constraint (Part 1, 4.1). LiteLLM abstraction layer (Part 8, 8.6). Skill Performance DB tracks per-model compatibility (Part 5, 8.9A). All prompts stored as portable text. | Part 1, 4.1; Part 8, 8.6 |
| R20 | Insufficient observability during early operation -- Without proper logging, metrics, and dashboards, problems are detected too late and root cause analysis is difficult. | Medium | Medium | Slow incident response, repeated failures | Observability stack requirement (Part 8, 12). Required signals defined (Part 8, 12.2). Health model (Part 8, 12.5). Alerting baseline with critical/warning/info tiers (Part 8, 12.6). Dashboard requirements (Part 8, 12.7). | Part 8, 12 |
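R12's primary mitigation is DAG validation at decomposition time. A minimal sketch of that cycle check, assuming dependencies arrive as (task, depends_on) id pairs -- the function name and input shape are illustrative, not the Orchestrator's actual API:

```python
from collections import defaultdict

def find_cycle(edges):
    """Return a list of ids forming a dependency cycle, or None if the graph is a DAG.

    Standard DFS with three-color marking: GRAY nodes are on the current
    path, so reaching a GRAY node again means we closed a cycle.
    """
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)
    path = []

    def dfs(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:                      # back edge: cycle found
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for n in list(graph):
        if color[n] == WHITE:
            cycle = dfs(n)
            if cycle:
                return cycle
    return None

print(find_cycle([("T1", "T2"), ("T2", "T3"), ("T3", "T1")]))  # ['T1', 'T2', 'T3', 'T1']
```

Running this before scheduling rejects a bad decomposition up front, which is cheaper than detecting the stall at runtime via the Governor breaker.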
5.3 Risk Heat Map¶
Low Medium High
Likelihood Likelihood Likelihood
+-----------+-----------+-----------+
Critical | R09 R10 | R01 R07 | |
+-----------+-----------+-----------+
High | R13 R18 | R03 R04 | R02 |
| | R05 | |
+-----------+-----------+-----------+
Medium | R12 | R08 R11 | R06 R15 |
| | R14 R16 | |
| | R17 R19 | |
| | R20 | |
+-----------+-----------+-----------+
5.4 Top 5 Risks Requiring Immediate Attention¶
- R07 -- Cost overrun: The Max20 budget is a hard constraint. Governor cost governance and per-activity attribution must be operational from day one.
- R01 -- RAM exhaustion: With 30 GB serving the entire stack, memory budgets per service must be defined and enforced before first deployment.
- R02 -- Scope creep: T1P posture classification exists but requires discipline. Every implementation decision must reference the posture table.
- R03 -- Single orchestrator: No redundancy for the master orchestrator. Session durability and recovery policy are the primary mitigations until multi-orchestrator is feasible.
- R15 -- Context window limits: High likelihood in daily operation. Prompt Steward context budget management and RAG chunking strategy must be validated early.
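The R01 mitigation ("memory budgets per service must be defined") reduces to a simple invariant that can be checked at deploy time. All numbers below are placeholders to illustrate the check -- the blueprint does not fix per-service budgets in this part:

```python
# Hypothetical per-service RAM budgets for the 30 GB CPX62 host (R01).
# Every figure here is an assumed placeholder, not a blueprint value.
BUDGETS_GB = {
    "postgresql": 8,
    "api_service": 3,
    "orchestrator": 3,
    "litellm": 2,
    "agent_runtimes": 10,  # ceiling enforced by Governor host capacity tracking
}
HEADROOM_GB = 4            # reserved for OS and transient spikes
TOTAL_GB = 30

def fits(budgets: dict, headroom: int, total: int) -> bool:
    """True when declared budgets plus headroom fit within host RAM."""
    return sum(budgets.values()) + headroom <= total

print(fits(BUDGETS_GB, HEADROOM_GB, TOTAL_GB))  # True: 26 + 4 <= 30
```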
Changelog¶
| Version | Date | Author | Changes |
|---|---|---|---|
| 4.2.0 | 2026-03-29 | BM | Initial draft. Sections 1-5: Purpose, Data Schema, System Module Index, Subject Index, Risk Register. |
6. Dependency Order¶
6.1 Ticket Dependency Graph¶
flowchart TD
subgraph EPIC-CB ["EPIC-CB: Control Bus"]
T1001["TKT-1001<br/>SSE Push Channels"]
T1002["TKT-1002<br/>Agent Registration"]
T1003["TKT-1003<br/>Intervention Endpoints"]
T1004["TKT-1004<br/>Task Orchestration"]
T1005["TKT-1005<br/>Host Capacity"]
T1006["TKT-1006<br/>Agent Spawn"]
T1007["TKT-1007<br/>Cost Tracking"]
T1008["TKT-1008<br/>Governance Events"]
end
subgraph EPIC-ODM ["EPIC-ODM: Schema + Skeleton"]
T1010["TKT-1010<br/>Deploy DDL"]
T1011["TKT-1011<br/>Walking Skeleton"]
T1012["TKT-1012<br/>Seed Data"]
end
subgraph EPIC-GOV ["EPIC-GOV: Governance"]
T1020["TKT-1020<br/>Escalation Path"]
T1021["TKT-1021<br/>Approval Workflow"]
T1022["TKT-1022<br/>Alerts + Breakers"]
T1023["TKT-1023<br/>Override Records"]
end
subgraph EPIC-UI ["EPIC-UI: Control Center"]
T1030["TKT-1030<br/>UI Shell"]
T1031["TKT-1031<br/>Agent Status Grid"]
T1032["TKT-1032<br/>Task Board"]
T1033["TKT-1033<br/>Alerts Panel"]
T1034["TKT-1034<br/>Cost Summary"]
T1035["TKT-1035<br/>Prompt Composer"]
T1036["TKT-1036<br/>Activity Feed"]
end
subgraph EPIC-KNO ["EPIC-KNO: Knowledge System"]
T1040["TKT-1040<br/>Skill Registry"]
T1041["TKT-1041<br/>Activation Slimming"]
T1042["TKT-1042<br/>Librarian Decomposition"]
T1043["TKT-1043<br/>Source Registry"]
end
subgraph EPIC-INFRA ["EPIC-INFRA: Infrastructure"]
T1050["TKT-1050<br/>Stop Unused Services"]
T1051["TKT-1051<br/>Install Remaining CLI"]
T1052["TKT-1052<br/>Paperclip Migration"]
T1053["TKT-1053<br/>Dashboard Transition"]
end
subgraph EPIC-TEST ["EPIC-TEST: Testing"]
T1060["TKT-1060<br/>pytest Setup"]
T1061["TKT-1061<br/>Playwright Setup"]
T1062["TKT-1062<br/>Behavioral Tests"]
T1063["TKT-1063<br/>Acceptance Tests (4)"]
end
subgraph EPIC-MVP1 ["EPIC-MVP1: MVP1 Prep (see MVP1_PRODUCT_SPEC.md)"]
T1070["TKT-1070<br/>Product Engine Integration"]
T1071["TKT-1071<br/>Billing Webhooks"]
T1072["TKT-1072<br/>Landing Page Reqs"]
end
%% ODM dependencies
T1010 --> T1011
T1010 --> T1012
T1010 --> T1043
T1010 --> T1060
%% Walking skeleton dependencies
T1004 --> T1011
T1012 --> T1011
%% CB internal dependencies
T1001 --> T1003
T1001 --> T1008
T1002 --> T1004
T1002 --> T1005
T1004 --> T1005
T1005 --> T1006
T1004 --> T1007
%% Governance dependencies
T1004 --> T1020
T1010 --> T1020
T1020 --> T1021
T1008 --> T1021
T1008 --> T1022
T1020 --> T1023
T1021 --> T1023
%% UI dependencies
T1030 --> T1031
T1030 --> T1032
T1030 --> T1033
T1030 --> T1034
T1030 --> T1035
T1030 --> T1036
T1002 --> T1031
T1004 --> T1032
T1008 --> T1033
T1007 --> T1034
T1001 --> T1035
T1004 --> T1036
%% Knowledge dependencies
T1040 --> T1041
%% Infrastructure dependencies
T1011 --> T1052
T1030 --> T1053
%% Test dependencies
T1030 --> T1061
T1011 --> T1062
T1060 --> T1062
T1011 --> T1063
T1020 --> T1063
T1060 --> T1063
%% MVP1 dependencies
T1011 --> T1070
T1051 --> T1071
%% Styling: critical path in bold
style T1010 fill:#e74c3c,color:#fff,stroke:#c0392b
style T1004 fill:#e74c3c,color:#fff,stroke:#c0392b
style T1011 fill:#e74c3c,color:#fff,stroke:#c0392b
style T1020 fill:#e74c3c,color:#fff,stroke:#c0392b
style T1063 fill:#e74c3c,color:#fff,stroke:#c0392b
style T1060 fill:#e74c3c,color:#fff,stroke:#c0392b
6.2 Critical Path¶
The critical path is the longest chain of dependent tickets; it determines the minimum build time. Three candidate paths are compared below.
Path A -- Schema to Acceptance
TKT-1010 (DDL, 0.5d)
-> TKT-1012 (Seed, 0.5d)
-> TKT-1011 (Skeleton, 3d)
-> TKT-1063 (Acceptance Tests, 2d)
= 6.0 days minimum
But TKT-1011 also depends on TKT-1004, which depends on TKT-1002. Factoring in the CB chain:
Path B -- Bus to Acceptance (true critical path)
TKT-1002 (Agent Registration, ~2d)
-> TKT-1004 (Task Orchestration, ~2d)
-> TKT-1011 (Walking Skeleton, 3d)
-> TKT-1020 (Escalation, 2d)
-> TKT-1063 (Acceptance Tests, 2d)
= 11.0 days minimum
Path C -- Bus to Governance
TKT-1002 (Registration)
-> TKT-1004 (Tasks)
-> TKT-1020 (Escalation)
-> TKT-1021 (Approval)
-> TKT-1023 (Overrides)
= ~8.0 days
The true critical path runs through Path B: from agent registration through task orchestration, the walking skeleton, escalation, and finally the acceptance tests. This chain spans all 5 phases and cannot be shortened without reducing scope.
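The Path B arithmetic can be checked mechanically with a longest-path search over the ticket DAG. This sketch uses only the ticket ids, edges, and day estimates quoted in this section:

```python
from collections import defaultdict
from functools import lru_cache

# Durations (days) and edges for the Path A/B subgraph, as quoted in 6.2.
DURATION = {"TKT-1002": 2.0, "TKT-1004": 2.0, "TKT-1010": 0.5, "TKT-1012": 0.5,
            "TKT-1011": 3.0, "TKT-1020": 2.0, "TKT-1063": 2.0}
EDGES = [("TKT-1010", "TKT-1012"), ("TKT-1012", "TKT-1011"),
         ("TKT-1002", "TKT-1004"), ("TKT-1004", "TKT-1011"),
         ("TKT-1011", "TKT-1020"), ("TKT-1020", "TKT-1063"),
         ("TKT-1011", "TKT-1063")]

def critical_path(duration, edges):
    """Return (total_days, path) for the longest chain through the DAG."""
    succ = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)

    @lru_cache(maxsize=None)
    def longest(t):
        # Longest chain starting at t: t itself plus the best successor chain.
        tails = [longest(n) for n in succ[t]]
        if not tails:
            return (duration[t], (t,))
        d, p = max(tails)
        return (duration[t] + d, (t,) + p)

    return max(longest(t) for t in duration)

days, path = critical_path(DURATION, EDGES)
print(days, " -> ".join(path))
# 11.0 TKT-1002 -> TKT-1004 -> TKT-1011 -> TKT-1020 -> TKT-1063
```

The result reproduces Path B: 11.0 days through registration, orchestration, the walking skeleton, escalation, and acceptance tests.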
6.3 Parallel Execution Opportunities¶
The following groups of tickets have no mutual dependencies and can execute simultaneously:
Phase 1 parallel lanes (Days 2-5):
| Lane | Tickets | Assignee |
|---|---|---|
| Lane 1: Bus Core | TKT-1001, TKT-1002, TKT-1003, TKT-1004 | Engineering Brain |
| Lane 2: Schema | TKT-1010, TKT-1012 | Engineering Brain |
| Lane 3: Infrastructure | TKT-1050 | DevOps / BrainMaster |
| Lane 4: Knowledge | TKT-1040 | BrainMaster |
Note: Lanes 1 and 2 share the Engineering Brain assignee, so true parallelism requires either two engineering agents or interleaving.
Phase 2 parallel lanes (Days 4-7):
| Lane | Tickets | Assignee |
|---|---|---|
| Lane 1: Governance | TKT-1020, TKT-1021, TKT-1022, TKT-1023 | Engineering Brain |
| Lane 2: Bus Extended | TKT-1005, TKT-1006, TKT-1007, TKT-1008 | Engineering Brain |
| Lane 3: Knowledge | TKT-1041, TKT-1043 | BrainMaster |
| Lane 4: Tools | TKT-1051 | DevOps |
Phase 3 parallel lanes (Days 6-10):
| Lane | Tickets | Assignee |
|---|---|---|
| Lane 1: UI Shell + Widgets | TKT-1030 then TKT-1031-1036 (all 6 widgets parallel after shell) | Brand Brain |
| Lane 2: E2E Setup | TKT-1061 (after TKT-1030) | Engineering Brain |
Phase 4-5 parallel lanes (Days 8-14):
| Lane | Tickets | Assignee |
|---|---|---|
| Lane 1: Migration | TKT-1052, TKT-1053 | Engineering / BM |
| Lane 2: Knowledge | TKT-1042 | Mac Worker |
| Lane 3: MVP1 (see MVP1_PRODUCT_SPEC.md) | TKT-1070, TKT-1071, TKT-1072 | Engineering / Brand |
| Lane 4: Testing | TKT-1062, TKT-1063 | Engineering Brain |
Maximum parallelism: With 3 agents working simultaneously (Engineering, Brand, BrainMaster), theoretical build time compresses from ~40 ticket-days to approximately 14 calendar days.
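The parallel lanes above can be derived from the dependency graph itself: tickets whose prerequisites are all complete form a "wave" that can run simultaneously. A sketch over a hypothetical subset of the Phase 1 tickets (the edge set here is illustrative, not the full register):

```python
# Sketch: group tickets into waves of mutually independent work.
def parallel_waves(depends_on: dict[str, list[str]]) -> list[list[str]]:
    done: set[str] = set()
    waves: list[list[str]] = []
    remaining = set(depends_on)
    while remaining:
        wave = sorted(t for t in remaining
                      if all(p in done for p in depends_on[t]))
        if not wave:
            raise ValueError("dependency cycle detected")
        waves.append(wave)
        done.update(wave)
        remaining.difference_update(wave)
    return waves

deps = {  # hypothetical Phase 1 subset
    "TKT-1001": [], "TKT-1010": [], "TKT-1050": [], "TKT-1040": [],
    "TKT-1002": ["TKT-1001"], "TKT-1012": ["TKT-1010"],
    "TKT-1004": ["TKT-1002"],
    "TKT-1011": ["TKT-1004", "TKT-1012"],
}
waves = parallel_waves(deps)
for i, wave in enumerate(waves, 1):
    print(f"Wave {i}: {', '.join(wave)}")
```

Wave 1 corresponds to the four Phase 1 lanes; later waves shrink as the graph converges on the walking skeleton, which is why the critical path, not lane count, bounds calendar time.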
7. Data Flow Diagrams¶
7.1 Task Lifecycle¶
flowchart LR
A["Idea<br/>(conversation)"] --> B["Discussion Thread<br/>(type: intake)"]
B --> C["Ticket<br/>(state: open)"]
C --> D["Task<br/>(state: queued)"]
D --> E["Agent Assignment<br/>(task.assigned_to)"]
E --> F["Session<br/>(agent execution context)"]
F --> G["Activity<br/>(work unit)"]
G --> H["Result<br/>(activity_evaluations)"]
H --> I{"Success?"}
I -->|yes| J["Knowledge Object<br/>(if applicable)"]
I -->|no| K["Retry / Escalate"]
K -->|retry| D
K -->|escalate| L["Escalation<br/>(human decision)"]
L --> D
J --> M["Reflection<br/>(hindsight evaluation)"]
M --> N["Knowledge Update<br/>(vault + pgvector)"]
style A fill:#3498db,color:#fff
style C fill:#2ecc71,color:#fff
style G fill:#f39c12,color:#fff
style J fill:#9b59b6,color:#fff
style N fill:#1abc9c,color:#fff
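The retry-or-escalate branch in the lifecycle above can be sketched as a small decision function. Names and the retry limit are illustrative assumptions, not the real ODM API; the blueprint leaves the retry policy configurable:

```python
# Sketch of the Success? -> Retry / Escalate branch in the task lifecycle.
MAX_RETRIES = 3  # assumed policy, not specified by the blueprint

def next_state(success: bool, retries: int) -> str:
    """Decide the task's next state after an activity result."""
    if success:
        return "done"        # result accepted; knowledge capture may follow
    if retries < MAX_RETRIES:
        return "queued"      # retry: task re-enters the queue
    return "escalated"       # hand off to a human decision

print(next_state(False, 3))  # escalated
```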
7.2 Agent Communication¶
flowchart TB
subgraph Agents ["Agent Layer"]
A0["000<br/>Orchestrator"]
A1["001<br/>Governor"]
A2["002<br/>Engineering"]
A3["003<br/>Brand"]
A10["010<br/>Mac Worker"]
end
subgraph Bus ["Control Bus (REST + SSE)"]
direction TB
REST["REST API<br/>POST /tasks<br/>POST /messages<br/>POST /escalations<br/>GET /agents"]
SSE["SSE Push<br/>/events/agent/{id}<br/>/events/founder<br/>/events/ui"]
HB["Heartbeat<br/>POST /heartbeat"]
end
subgraph Storage ["Persistence"]
PG["PostgreSQL 17<br/>(ODM Schema)"]
PGV["pgvector<br/>(embeddings)"]
GIT["Git Repos<br/>(code + docs)"]
end
subgraph UI ["Control Center"]
CC["Widget Grid<br/>(Next.js + shadcn)"]
end
subgraph Founder ["Human"]
SH["Shai<br/>(founder)"]
end
%% Agent -> Bus
A0 & A1 & A2 & A3 & A10 -->|"REST calls"| REST
A0 & A1 & A2 & A3 & A10 -->|"heartbeat (30s)"| HB
%% Bus -> Agent
SSE -->|"task assignments"| A0 & A2 & A3 & A10
SSE -->|"interventions"| A0 & A1
SSE -->|"cost alerts"| A1
%% Bus -> Storage
REST -->|"read/write"| PG
REST -->|"embeddings"| PGV
%% Agents -> Storage
A2 & A3 & A10 -->|"commits"| GIT
%% Bus -> UI
SSE -->|"real-time events"| CC
%% UI -> Founder
CC -->|"dashboard"| SH
SH -->|"decisions, messages"| CC
CC -->|"REST calls"| REST
style Bus fill:#34495e,color:#fff
style PG fill:#2980b9,color:#fff
style CC fill:#8e44ad,color:#fff
7.3 Knowledge Flow¶
flowchart LR
subgraph Sources ["External Sources"]
S1["Anthropic Docs"]
S2["GitHub"]
S3["npm / PyPI"]
S4["MDN / W3C"]
S5["Hugging Face"]
end
subgraph RC ["Research Center"]
SR["Source Registry<br/>(governed list)"]
RE["Research Execution<br/>(agent task)"]
end
subgraph Librarian ["Librarian Process"]
DEC["Decompose<br/>(document -> notes)"]
TAG["Tag + Link<br/>(frontmatter, backlinks)"]
IDX["Index<br/>(searchable catalog)"]
end
subgraph Storage ["Knowledge Storage"]
GIT2["Git Vault<br/>(Obsidian markdown)"]
PG2["PostgreSQL<br/>(knowledge_objects)"]
VEC["pgvector<br/>(embeddings)"]
end
subgraph Retrieval ["Retrieval"]
RAG["RAG Pipeline<br/>(query -> embed -> search)"]
CTX["Context Assembly<br/>(relevant chunks)"]
end
subgraph Execution ["Agent Execution"]
AGT["Agent Session"]
ACT["Activity Output"]
end
subgraph Learning ["Learning Loop"]
HS["Hindsight<br/>(what worked?)"]
RF["Reflection<br/>(why?)"]
UPD["Knowledge Update"]
end
Sources --> SR
SR --> RE
RE --> DEC
DEC --> TAG --> IDX
IDX --> GIT2
IDX --> PG2
PG2 --> VEC
VEC --> RAG
GIT2 --> RAG
RAG --> CTX
CTX --> AGT
AGT --> ACT
ACT --> HS
HS --> RF
RF --> UPD
UPD --> PG2
UPD --> GIT2
style RC fill:#e67e22,color:#fff
style Storage fill:#2980b9,color:#fff
style Retrieval fill:#27ae60,color:#fff
style Learning fill:#8e44ad,color:#fff
7.4 Cost Flow¶
flowchart TD
ACT2["Activity Completes<br/>(tokens_in, tokens_out, model)"] -->|"calculate USD"| CLE["Cost Ledger Entry<br/>(activity_id, cost_usd,<br/>tokens_in, tokens_out)"]
CLE -->|"aggregate"| AGG["Aggregation<br/>(task / ticket / sprint / project)"]
AGG --> GOV{"Governor Check<br/>(Part 7 breakers)"}
GOV -->|"under threshold"| DASH["Dashboard Widget<br/>(Cost Summary)"]
GOV -->|"80% budget"| WARN["Warning Alert<br/>(amber indicator)"]
GOV -->|"90% budget"| CRIT["Critical Alert<br/>(red indicator)"]
GOV -->|"100% budget"| TRIP["Breaker Trips<br/>(pause agent)"]
WARN --> NOTIFY["Founder Notification<br/>(SSE + Alerts Panel)"]
CRIT --> NOTIFY
TRIP --> NOTIFY
TRIP --> PAUSE["Agent Paused<br/>(awaits manual reset)"]
NOTIFY --> DASH
subgraph Thresholds ["Budget Thresholds (Max20 = $200/mo)"]
TH1["Per-task: configurable<br/>(default $10)"]
TH2["Per-sprint: $50"]
TH3["Per-month: $200"]
end
Thresholds -.->|"checked by"| GOV
style CLE fill:#f39c12,color:#fff
style TRIP fill:#e74c3c,color:#fff
style DASH fill:#3498db,color:#fff
style GOV fill:#2c3e50,color:#fff
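The Governor's threshold check in the diagram above reduces to a ratio comparison. A hedged sketch, using the $200/mo budget from the diagram; the function name and status labels are illustrative, not the real Governor API:

```python
# Sketch: map spend against budget to the alert tiers in the cost flow.
def budget_status(spent_usd: float, budget_usd: float) -> str:
    ratio = spent_usd / budget_usd
    if ratio >= 1.0:
        return "breaker_tripped"   # pause agent, await manual reset
    if ratio >= 0.9:
        return "critical"          # red indicator
    if ratio >= 0.8:
        return "warning"           # amber indicator
    return "ok"                    # under threshold

print(budget_status(185.0, 200.0))  # critical (92.5% of monthly budget)
```

The same function applies at each aggregation level (task / sprint / month) with the corresponding threshold from the diagram.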
8. Process Checklists¶
8.1 New Project Setup¶
- Define project in ODM (name, description, topics, start_date, end_date)
- Create Paperclip project (or ODM equivalent if post-migration)
- Assign project orchestrator (agent with orchestrator role)
- Build agent roster (roles needed, agents available, capacity check)
- Create initial ticket set from requirements
- Run System Review (this Part 11 process) on the ticket set
- Review findings: all risks acknowledged, all dependencies mapped
- Approve and begin Phase 0
8.2 Agent Commissioning¶
- Determine role requirements (what skills, what model tier)
- Check host capacity (RAM, CPU, active container count)
- Select or spawn agent (3-digit ID from available range)
- Assign roles and project binding in agent_runtimes table
- Register in Control Bus (POST /agents)
- Load activation file with required skills (skills_on_load)
- Verify heartbeat received by Bus within 30 seconds
- Assign first task and confirm execution
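The heartbeat verification step above (first heartbeat within 30 seconds of registration) can be sketched as a window check. The function and its arguments are illustrative assumptions; the Bus would perform the equivalent query against received POST /heartbeat timestamps:

```python
# Sketch: verify a first heartbeat arrived inside the registration window.
from datetime import datetime, timedelta

REGISTRATION_WINDOW_S = 30  # from the commissioning checklist

def heartbeat_received(registered_at: datetime,
                       heartbeats: list[datetime]) -> bool:
    """True if any heartbeat landed within the window after registration."""
    deadline = registered_at + timedelta(seconds=REGISTRATION_WINDOW_S)
    return any(registered_at <= hb <= deadline for hb in heartbeats)

t0 = datetime(2026, 3, 30, 12, 0, 0)
print(heartbeat_received(t0, [t0 + timedelta(seconds=12)]))  # True
print(heartbeat_received(t0, [t0 + timedelta(seconds=45)]))  # False
```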
8.3 Sprint Start¶
- Review previous sprint retrospective (lessons, blockers)
- Update plan.yaml with new sprint tickets
- Verify agent roster is adequate for sprint workload
- Check host capacity for planned parallel work
- Brief agents via Control Bus with sprint goals
- Set sprint in ODM (start_date, end_date, ticket assignments)
- Confirm all sprint dependencies from prior sprints are met
8.4 Sprint Close¶
- Verify all sprint tickets are done or explicitly deferred
- Run completion tests for all done tickets
- Generate sprint cost report (total USD, per-agent, per-ticket)
- Generate execution report (Part 3 format, Section 14A)
- Write retrospective: what worked, what did not, what to change
- Update knowledge vault with lessons learned
- Archive sprint record, prepare next sprint in ODM
8.5 Technology Evaluation¶
- Identify tool, skill, framework, or library to evaluate
- Check Source Registry for prior evaluations of this technology
- Create knowledge vault note using standard evaluation template
- Research: what it does, relevance to STRUXIO, maturity, cost, risk
- Compare against existing solutions in the stack
- Decision: adopt / evaluate further / defer / reject
- Update Resource Registry with decision and rationale
- If adopted: create installation ticket and update CLI_TOOLS_ASSESSMENT
8.6 Deployment¶
- Pre-deploy: run all tests (pytest + Playwright if applicable)
- Pre-deploy: verify host capacity (RAM > 2GB free, disk > 10GB free)
- Pre-deploy: backup current state (pg_dump + restic snapshot)
- Deploy: apply changes (docker compose up, migration scripts, config)
- Post-deploy: health check all services (Bus /health, UI loads, PG responds)
- Post-deploy: verify Control Bus connectivity (SSE streams active)
- Post-deploy: smoke test core workflows (create task, assign, complete)
- If failure: execute rollback (restore pg_dump, revert containers)
8.7 Recovery¶
- Identify failure scope: agent, service, host, or data
- Check host health (free -h, df -h, top, docker stats)
- Check Docker container status (docker ps -a, docker logs)
- Restart failed containers (docker compose restart)
- If data issue: restore from latest pg_dump (pg_restore)
- If full host failure: restore from Restic backup (B2 daily 03:00 UTC)
- Verify all services healthy post-recovery
- Resume interrupted work from last checkpoint (plan.yaml, session state)
- Create incident record in ODM with root cause and resolution
9. Meta -- This Process as a XIOPro Capability¶
9.1 Reusability¶
This System Review process is not specific to the XIOPro build. It is a reusable capability that every XIOPro-managed project should execute before implementation begins.
When XIOPro manages a product (e.g., the first product -- see MVP1_PRODUCT_SPEC.md), it will:
- Decompose product requirements into a blueprint (using the Librarian process)
- Run System Review on the product blueprint (this Part 11 process)
- Generate tickets from the review findings
- Execute tickets through the agent system (Bus, agents, governance)
- Close with acceptance tests and sprint retrospective
The same applies to any future project: client onboarding, compliance audits, internal tooling. The System Review is the governance gate between "we have a plan" and "we start building."
9.2 ODM Entity¶
The System Review itself should be tracked as an ODM entity:
project_review:
id: uuid
project_id: uuid # FK to projects
review_type: enum
# initial -- before first implementation
# mid_sprint -- checkpoint during execution
# sprint_close -- end-of-sprint review
# major_change -- triggered by scope or architecture change
status: enum
# pending -- review requested
# in_progress -- reviewer is working
# passed -- review complete, no blockers
# failed -- critical issues found, cannot proceed
# needs_fixes -- issues found, fixable before proceeding
reviewer: string # agent ID or user ID
findings: [string] # list of finding summaries
risk_count: int # number of risks identified
risk_high_count: int # number of high/critical risks
module_count: int # modules verified
ticket_count: int # tickets reviewed
dependency_depth: int # longest dependency chain length
created_at: datetime
completed_at: datetime|null
verdict: string|null # free-text summary verdict
9.3 Skill¶
This process should become a registered skill in the Skill Registry (SKILL_REGISTRY.yaml):
skill:
id: system-review
name: "System Review"
description: >
Run comprehensive review of a project blueprint before implementation.
Verifies data schema completeness, maps modules and subjects, compiles
risk register, maps dependencies, creates data flow diagrams, and
generates process checklists. Produces a review report with pass/fail verdict.
triggers:
- /system-review
- /review-project
- /pre-implementation-check
roles: [orchestrator, governor]
model_tier: sonnet # Sonnet sufficient; Opus for ambiguous findings
token_estimate: 15000-25000
steps:
1. Verify data schema completeness (all ODM entities have DDL)
2. Build module index (group tickets by subsystem)
3. Build subject index (cross-reference by concept)
4. Compile risk register (identify gaps, conflicts, capacity issues)
5. Map ticket dependencies (build directed graph)
6. Identify critical path and parallel execution lanes
7. Create data flow diagrams (task, communication, knowledge, cost)
8. Generate process checklists (setup, sprint, deploy, recover)
9. Produce review report with verdict: pass / needs-fixes / fail
outputs:
- Part 11 document (this file)
- Updated risk register
- Dependency graph (Mermaid)
- Process checklists
9.4 Rule¶
Every project must pass System Review before its first implementation ticket begins execution. This is a governance gate, not a suggestion.
Rule definition:
rule:
id: require-system-review
name: "Mandatory System Review"
scope: project
trigger: "First ticket in project moves to state 'active'"
condition: "project_review.status == 'passed' for this project"
action_on_fail: "Block ticket activation. Notify orchestrator and founder."
severity: critical
exceptions: none
rationale: >
Starting implementation without System Review risks building on
incomplete schemas, unresolved dependencies, or unacknowledged risks.
The review takes hours; fixing these issues mid-build takes days.
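The require-system-review gate amounts to a single status check before the first ticket activation. A minimal sketch, assuming an in-memory lookup; real enforcement would query the project_review entity defined in 9.2:

```python
# Sketch of the require-system-review gate (rule above).
def can_activate_ticket(project_id: str,
                        review_status: dict[str, str]) -> bool:
    """Block ticket activation unless the project's System Review passed."""
    return review_status.get(project_id) == "passed"

reviews = {"proj-xiopro": "passed", "proj-mvp1": "needs_fixes"}
print(can_activate_ticket("proj-xiopro", reviews))  # True
print(can_activate_ticket("proj-mvp1", reviews))    # False -- blocked
```

On a False result, the rule's action_on_fail applies: the ticket stays inactive and the orchestrator and founder are notified.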
10. Project Lifecycle Management -- Topic to Product¶
This section defines the end-to-end lifecycle that every XIOPro project follows, from initial idea to production product. It is the process backbone that connects all other parts of the blueprint.
10.1 Lifecycle Overview¶
flowchart LR
Topic["Topic"] --> Research["Research"]
Research --> BP["Blueprint"]
BP --> Review["Review + Readiness"]
Review --> Plan["Work + Test Plan"]
Plan --> Tickets["Tickets"]
Tickets --> Execute["Sprint Execution"]
Execute --> IntTest["Integration Test"]
IntTest --> Product["Product"]
IntTest -->|"Issues found"| Tickets
Review -->|"Not ready"| Research
Every phase has a gate (exit criteria) and a T1P standard (quality bar). No phase is skipped. Iteration loops are expected and healthy.
10.2 Phase Definitions¶
Phase 1: Topic to Project¶
| Attribute | Value |
|---|---|
| Trigger | Idea or discussion identified as a potential project |
| Actions | Create project in ODM (name, description, topics). Assign project orchestrator. Define initial scope and constraints. |
| Gate | Project registered, orchestrator assigned |
| T1P Standard | Clear objective, bounded scope, measurable success criteria |
| Estimated Time | 1-2 hours |
Phase 2: Research¶
| Attribute | Value |
|---|---|
| Trigger | Project created |
| Actions | Research Center scans relevant sources. Domain research (competitors, standards, technologies). Feasibility assessment. Multiple research threads possible (parallel). |
| Gate | Research outputs reviewed, key decisions documented |
| T1P Standard | Evidence-based decisions, source lineage, evaluation records |
| Estimated Time | 2-4 hours per research thread |
Phase 3: Blueprint¶
| Attribute | Value |
|---|---|
| Trigger | Research complete, direction decided |
| Actions | Create project blueprint (using XIOPro BP as template). Define architecture, data model, components. Librarian decomposes into knowledge notes. |
| Gate | Blueprint complete, all sections covered |
| T1P Standard | ODM entities defined, data schema written, module index complete |
| Estimated Time | 4-8 hours |
Phase 4: Review and Readiness¶
| Attribute | Value |
|---|---|
| Trigger | Blueprint draft complete |
| Actions | Internal review (GO scans for gaps, consistency). External review (send to ChatGPT, Gemini, NotebookLM). System Review process (Part 11 checklists, Sections 1-9). Build readiness evaluation. |
| Gate | Reviews complete, all critical findings addressed |
| T1P Standard | 3+ external reviews, risk register, dependency map, ER diagram |
| Estimated Time | 2-4 hours (reviews run in parallel) |
Build Readiness Checklist:
| Check | Criterion |
|---|---|
| Data schema complete | DDL written and validated |
| All entities defined | Every object has properties, lifecycle, relationships |
| Risk register complete | 15+ risks with mitigations |
| Dependencies mapped | Critical path identified |
| Test strategy defined | Test layers for each output type |
| Review findings addressed | All critical items fixed |
| Ticket coverage verified | Every module has implementing tickets |
Phase 5: Work and Test Plan¶
| Attribute | Value |
|---|---|
| Trigger | Build readiness gate passed |
| Actions | Generate tickets from blueprint (automated from Part 12 template). Estimate effort using XIOPro Time Database (not human estimates). Create sprint plan with dependency ordering. Define test plan per ticket. |
| Gate | All tickets written with completion tests |
| T1P Standard | Every ticket has: plan, completion test, review requirement |
| Estimated Time | 1-2 hours |
Phase 6: Sprint Execution¶
| Attribute | Value |
|---|---|
| Trigger | Tickets created and prioritized |
| Actions | Agents pick up tickets from Bus. Execute with review gates (code: test, UI: screenshot, doc: validation). Continuous Paperclip sync. Real-time progress in Control Center. |
| Gate | All sprint tickets pass completion tests |
| T1P Standard | Every output reviewed and tested per type |
| Sprint Duration | Hours, not weeks. Typically 2-8 hours per sprint. |
| Estimated Time | 4-12 hours per sprint |
Phase 7: Integration Test¶
| Attribute | Value |
|---|---|
| Trigger | Sprint complete |
| Actions | End-to-end test (walking skeleton pattern). Cross-component integration verification. Performance baseline. Security scan. |
| Gate | All integration tests pass |
| T1P Standard | Walking skeleton proven, no regressions |
| Estimated Time | 1-2 hours |
Phase 8: Product¶
| Attribute | Value |
|---|---|
| Trigger | Integration tests pass |
| Actions | Deploy to production. Verify health endpoints. Update documentation. Sprint retrospective. |
| Gate | Product live and monitored |
| T1P Standard | Deployment checklist complete, monitoring active |
| Estimated Time | 1-2 hours |
10.3 Agile Principles (XIOPro-Calibrated)¶
| Principle | XIOPro Interpretation |
|---|---|
| Sprint duration | Hours, not weeks |
| Iteration speed | Multiple sprints per day possible |
| Feedback loops | External review + internal testing + user feedback |
| Continuous | No waterfall phases -- iterate constantly |
| Human calibration | Build XIOPro Time Database from actual execution data |
10.4 T1P Standards Discovery¶
XIOPro discovers and applies T1P (Top 1 Percent) standards through the Research Center, scanning industry best practices for each lifecycle phase.
Sources:
- awesome-agentic-patterns (nibzard)
- Software engineering best practices
- ISO/IEC 25010 (software quality model)
- XIOPro's own execution history
Standards by Phase:
| Phase | T1P Standards |
|---|---|
| Blueprints | 12-part structure minimum. ODM with lifecycle states. Risk register with mitigations. Dependency map. External review by 3+ LLMs. |
| Build Readiness | DDL must run without error. Walking skeleton acceptance scenarios defined. Every module has implementing tickets. Review findings addressed. |
| Work Plans | Every ticket has completion test. Dependencies mapped and ordered. Agent time estimates (not human). Sprint duration in hours. |
| Execution | Review gate per output type. Playwright screenshot for UI. API test suite for endpoints. Walking skeleton re-run after integration. |
10.5 XIOPro Time Database¶
The XIOPro Time Database replaces human time estimates with calibrated agent execution benchmarks. It grows with every project XIOPro runs.
Schema:
table: execution_time_benchmarks
columns:
- task_type: "api_endpoint | ui_widget | research | blueprint | document | migration | test_suite"
- complexity: "low | medium | high"
- model_used: "opus | sonnet | haiku"
- estimated_human_hours: float
- actual_agent_minutes: float
- acceleration_ratio: float # human_hours * 60 / agent_minutes
- sample_count: int
Seed Data (from XIOPro Day 1 -- 13 days estimated, ~2.5 hours actual):
| Task Type | Complexity | Model | Human Est (h) | Agent Actual (min) | Ratio | Samples |
|---|---|---|---|---|---|---|
| api_endpoint | medium | sonnet | 16 | 5 | 192x | 8 |
| ui_widget | medium | sonnet | 24 | 8 | 180x | 6 |
| blueprint_section | high | opus | 480 | 20 | 1440x | 12 |
| research | medium | opus | 240 | 10 | 1440x | 10 |
| migration | medium | sonnet | 480 | 10 | 2880x | 1 |
| test_suite | medium | sonnet | 240 | 8 | 1800x | 4 |
This database is the basis for all XIOPro project estimation. Human estimates are recorded for comparison but never used for planning.
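A sketch of how planning would consume these benchmarks. The lookup key and the refuse-to-guess fallback are assumptions, not a specified algorithm; the point is that estimates come from agent actuals, never human hours:

```python
# Sketch: estimate sprint effort from the execution_time_benchmarks seed data.
BENCHMARKS = {  # (task_type, complexity, model) -> actual_agent_minutes
    ("api_endpoint", "medium", "sonnet"): 5.0,
    ("ui_widget", "medium", "sonnet"): 8.0,
    ("test_suite", "medium", "sonnet"): 8.0,
}

def estimate_minutes(task_type: str, complexity: str, model: str) -> float:
    """Plan with agent benchmarks; never fall back to human estimates."""
    key = (task_type, complexity, model)
    if key not in BENCHMARKS:
        raise KeyError(f"no benchmark for {key}; run a calibration task first")
    return BENCHMARKS[key]

# A sprint of 4 endpoints + 2 widgets: 4*5 + 2*8 = 36 agent-minutes
sprint = (4 * estimate_minutes("api_endpoint", "medium", "sonnet")
          + 2 * estimate_minutes("ui_widget", "medium", "sonnet"))
print(sprint)  # 36.0
```

As sample_count grows, the stored actual_agent_minutes would become a running average rather than a single observation.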
10.6 ODM Connection¶
The project lifecycle phase is tracked on the Project entity:
project:
lifecycle_phase: enum
# topic | research | blueprint | review | planning | execution | integration | production | maintenance
This field is distinct from the three-dimensional state model (status/state/status_state). The state model tracks workflow position; lifecycle_phase tracks which lifecycle gate the project has reached.
Transition rules:
- lifecycle_phase advances when the gate criteria for the current phase are met
- lifecycle_phase can regress (e.g., review -> research when readiness check fails)
- Only the project orchestrator or system master can advance lifecycle_phase
- Every transition is logged in the project's activity history
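A sketch of lifecycle_phase transition validation. The phase order comes from the enum above; the single permitted regression shown (review to research) is the example the rules give, and any other regression would need its own explicit rule:

```python
# Sketch: validate lifecycle_phase transitions for the Project entity.
PHASES = ["topic", "research", "blueprint", "review", "planning",
          "execution", "integration", "production", "maintenance"]
REGRESSIONS = {("review", "research")}  # readiness check failed

def valid_transition(current: str, target: str) -> bool:
    """Allow advancing exactly one phase, or an explicit regression."""
    if (current, target) in REGRESSIONS:
        return True
    return PHASES.index(target) == PHASES.index(current) + 1

print(valid_transition("review", "planning"))   # True -- gate passed
print(valid_transition("review", "research"))   # True -- regression
print(valid_transition("topic", "blueprint"))   # False -- cannot skip a gate
```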
11. Project Template Architecture¶
XIOPro is a multi-template project factory. Each template shares the same engine but applies different skills, expertise, and potentially additional engines or governors.
11.1 Template Structure¶
Every project template consists of:
- Core (shared by all templates): Control Bus, ODM, Governance, UI, Knowledge System, Librarian
- Skills: Template-specific skill sets from the Skill Registry
- Agent Roles: Template-specific role assignments
- Additional Engine (optional): Template-specific processing (e.g., ISO engine for compliance)
- Additional Governor (optional): Template-specific governance rules
- Lifecycle: Same 8-phase lifecycle (Topic to Product), customized gates per template
11.2 Initial Templates¶
IT Project Template (Active -- being built now)¶
- Purpose: Build software products
- Skills: coding, testing, architecture, deployment, debugging, TDD
- Agent Roles: orchestrator, engineering specialist, devops specialist
- Engine: Standard XIOPro engine
- Output: Software products, APIs, UIs, infrastructure
Marketing Template (Planned)¶
- Purpose: Marketing campaigns and go-to-market
- Skills: SEO, ad copy, competitor analysis, campaign planning, lead research
- Agent Roles: orchestrator, marketing specialist, content specialist
- Additional: Competitive Ads analysis, Lead Research skills
- Output: Campaigns, landing pages, ad copy, market analysis
Content Creation Template (Planned)¶
- Purpose: Create and manage content
- Skills: writing, brand voice (Voice DNA), research, editing, citations
- Agent Roles: orchestrator, content specialist, editor
- Additional: NotebookLM for synthesis, Voice DNA for brand consistency
- Output: Articles, documentation, presentations, podcasts, educational material
Knowledge Expert Template (Planned)¶
- Purpose: Domain expertise and knowledge management
- Skills: research, synthesis, classification, evaluation, teaching
- Agent Roles: orchestrator, research specialist, domain expert
- Additional: Research Center automation, Librarian deep integration
- Output: Knowledge bases, evaluations, training materials, expert consultations
Knowledge Expert Domains (examples)¶
- ISO 19650: Parts 1-6, national annexes, implementation guidance
- BIM: IFC, openBIM, model coordination, clash detection, LOD/LOI
- Construction Industry Players:
- Project Initiator / Owner / Developer
- Design Partners (architects, structural engineers, MEP engineers)
- General Contractor
- Subcontractors (electrical, plumbing, HVAC, concrete, steel)
- Inspectors and quality assessors
- Quantity surveyors
- Project managers and BIM managers
11.3 Template Registry¶
template_registry:
location: "struxio-logic/templates/"
format: "YAML template definition + skill list + role assignments"
template_definition:
id: string
name: string
description: string
status: active | planned | deprecated
core_skills: [skill_id] # from SKILL_REGISTRY
additional_skills: [skill_id] # template-specific
agent_roles:
- role: string
skills_on_load: [skill_id]
min_model: string
additional_engine: string|null # e.g., "iso19650-engine"
additional_governor: string|null # e.g., "compliance-governor"
lifecycle_customizations:
research_sources: [source_id] # template-specific research sources
review_criteria: [string] # additional review gates
test_requirements: [string] # template-specific tests
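Registering a template implies validating it against this schema. A minimal sketch, assuming the template arrives as a parsed YAML dict; the field names mirror the definition above, while the checks themselves are an illustrative subset:

```python
# Sketch: validate a template definition against the registry schema.
REQUIRED = {"id", "name", "description", "status", "core_skills",
            "additional_skills", "agent_roles"}
VALID_STATUS = {"active", "planned", "deprecated"}

def validate_template(tpl: dict) -> list[str]:
    """Return a list of problems; an empty list means the template is valid."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED - tpl.keys())]
    if tpl.get("status") not in VALID_STATUS:
        problems.append(f"bad status: {tpl.get('status')!r}")
    return problems

it_template = {
    "id": "it-project", "name": "IT Project",
    "description": "Build software products", "status": "active",
    "core_skills": ["coding", "testing"], "additional_skills": [],
    "agent_roles": [{"role": "orchestrator"}],
}
print(validate_template(it_template))  # []
```

A real registry would also verify that every skill_id resolves against SKILL_REGISTRY.yaml before activation.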
11.4 Creating a New Template¶
- Define template in YAML (skills, roles, engines)
- Register in template_registry
- Create project using template -- auto-assigns skills and roles
- Project lifecycle applies with template-specific customizations
11.5 Rule¶
All templates share the same XIOPro engine and lifecycle. The differentiation is in skills, roles, and domain expertise -- not in the core platform. This ensures consistency across all project types.
12. Blueprint Part Numbering: Specification vs Operational¶
Parts 1-8 are specification. Parts 9-14 are operational. This separation is intentional. No renumbering needed.
| Range | Nature | Parts |
|---|---|---|
| Parts 1-8 | Specification — what the system is | Foundations, Architecture, ODM, Agent System, Knowledge System, UI, Governance, Infrastructure |
| Parts 9-14 | Operational — how the system runs | Project Templates, Swarm Architecture, System Review, Work Plan, Execution Log, Ticket Register |
The spec/operational boundary is a structural design decision, not an accident of growth. Renumbering would destroy the meaning of the boundary and break all existing cross-references.
Changelog¶
| Date | Change |
|---|---|
| 2026-03-29 | Part 9 created. System Review as verification gate: ER diagram audit, module index (25 modules across 8 epics), subject index (50+ cross-referenced entries), risk register (15-20 risks with severity and mitigation), dependency order (37 tickets mapped with critical path through TKT-1002 -> 1004 -> 1011 -> 1020 -> 1063 at 11 days), data flow diagrams (4: task lifecycle, agent communication, knowledge flow, cost flow), process checklists (7: project setup, agent commissioning, sprint start, sprint close, technology evaluation, deployment, recovery), meta-capability definition (ODM entity, skill, governance rule). |
| 2026-03-29 | Added Section 10: Project Lifecycle Management -- Topic to Product. 8-phase lifecycle with gates and T1P standards. T1P Standards Discovery process. XIOPro Time Database (agent execution benchmarks seeded from Day 1 data). ODM lifecycle_phase enum for Project entity. |
| 2026-03-29 | Added Section 11: Project Template Architecture. Multi-template project factory design. 4 initial templates (IT Project active, Marketing/Content/Knowledge Expert planned). Template Registry YAML schema. Knowledge Expert domains include ISO 19650, BIM, Construction Industry Players. |
| 2026-03-30 | N17: Added Section 12 — Blueprint Part Numbering: Specification vs Operational. Documents that Parts 1-8 are specification, Parts 9-14 are operational. Separation is intentional; no renumbering needed. |