
XIOPro Production Blueprint v5.0

Part 4 — Execution & Agent System


1. Purpose of This Part

This document defines how XIOPro:

  • executes work
  • runs agents
  • manages sessions
  • integrates LLM providers
  • supports Remote Control (RC)
  • ensures continuity and recovery
  • optimizes cost and performance

This is the layer that connects:

ODM (Part 3) → Real execution in the world


2. Execution Philosophy

XIOPro execution is:

  • agent-driven
  • ticket-based
  • state-controlled
  • cost-aware
  • provider-agnostic
  • continuously improving

Execution must:

  • never depend on a single session
  • survive crashes
  • resume from state
  • remain observable

3. Execution Stack Overview

flowchart TD
    Ticket --> Task
    Task --> A000
    Orchestrator["Orchestrator"] --> AgentSelection
    AgentSelection --> ModelRouter
    ModelRouter --> ExecutionEngine
    ExecutionEngine --> Activity
    Activity --> DB
    DB --> Governor["Governor"]

4. Core Components

4.1 Orchestrator Role

Formerly: O00 — Orchestrator

Role

Primary execution coordinator. In the unified agent identity model, the orchestrator role is one of several role bundles that can be assigned to an agent. See Part 1, Section 8 for the complete role bundle and agent identity definitions.

Responsibilities

  • read work graph
  • assign tasks
  • select agents
  • trigger execution
  • handle failures
  • maintain continuity

4.1A Orchestrator Surface Names

XIOPro uses named orchestrator surfaces for easy identification:

| Name | Full Name | Host | Launch Command | Role |
| --- | --- | --- | --- | --- |
| GO | Global Orchestrator | Hetzner | devxio go or GO | Primary orchestrator. Runs 24x7. Manages all projects, agents, state. |
| MO | Mac Orchestrator | Mac Studio | devxio mo or MO | Mac-local orchestrator. Handles Mac tasks, browser testing, local experiments. |

Rules

  • GO and MO are surface names, not agent IDs. The agent running GO might be 000 or any agent with the orchestrator role.
  • Both launch via the devxio command with the surface as argument.
  • GO is the primary -- MO reports to GO via the Control Bus.
  • Both can run simultaneously on different hosts.

4.2 Governor Role

Formerly: O01 — Governor

This role is part of the XIOPro Optimizer (see Part 1, Section 8A).

Role

System optimization and protection. In the unified agent identity model, the governor role is a role bundle that can be assigned alongside the orchestrator role.

Responsibilities

  • cost tracking
  • anomaly detection
  • performance analysis
  • optimization recommendations
  • circuit breaking

4.2A Rule Steward Role

Formerly: R01 — Rule & Skill Steward

This role is part of the XIOPro Optimizer (see Part 1, Section 8A).

Role

Role bundle responsible for the lifecycle quality of:

  • RULE_* assets
  • SKILL_* assets
  • agent activation assets such as claude.md
  • reusable operating patterns / templates

The rule steward role is not a runtime governor like the governor role. It is the steward of behavioral assets that shape how XIOPro thinks and executes.

Why It Exists

As XIOPro evolves, the system will continuously accumulate:

  • new skills
  • revised rules
  • agent-specific activations
  • overlapping procedures
  • obsolete guidance
  • conflicting operating patterns

Without a dedicated steward, these assets drift, duplicate, and eventually degrade execution quality.

The rule steward role exists to keep the rule/skill layer:

  • coherent
  • reusable
  • discoverable
  • conflict-minimized
  • approval-governed

Primary Responsibilities

The rule steward must:

  • search for existing rules/skills before new ones are created
  • detect missing capabilities and propose new skill creation
  • validate structure, metadata, and completeness of rule/skill assets
  • detect overlap, contradiction, duplication, and drift
  • evaluate whether an activation file like claude.md remains effective
  • propose consolidation, supersession, deprecation, or promotion
  • draft new skills using existing approved skills when appropriate
  • open approval flows for protected changes
  • maintain lineage across revisions

Non-Responsibilities

The rule steward must not:

  • silently change live execution behavior
  • bypass founder approval for protected changes
  • replace governor runtime governance
  • become uncontrolled self-modification
  • commit rule/skill mutations directly into production without policy

Managed Asset Classes

managed_assets:
  - RULE
  - SKILL
  - ACTIVATION
  - PATTERN
  - PROTOCOL

Operating Modes

r01_modes:
  - audit
  - evaluate
  - propose
  - normalize
  - deprecate

Core Inputs

The rule steward should consume:

  • current RULE_* files
  • current SKILL_* files
  • activation files such as claude.md
  • historical incidents and overrides
  • task/result/reflection history
  • Dream Engine proposals
  • founder/operator requests
  • performance and reuse signals

Core Outputs

The rule steward may emit:

  • validation report
  • conflict report
  • redundancy report
  • skill-gap report
  • draft skill proposal
  • draft rule proposal
  • activation improvement proposal
  • deprecation recommendation
  • approval request

Technology Model

For T1P, rule and skill stewardship should use a dual representation:

  1. Human-readable source of truth

     • Markdown assets in Git
     • explicit metadata/front matter
     • examples, rationale, and scope

  2. Structured runtime mirror

     • normalized YAML/DB representation
     • queryable scope, precedence, owner, status, and approval requirements
     • machine-evaluable validation state

The rule steward operates across both layers.

Stewardship Flow

flowchart TD
    NeedOrProposal --> SearchExisting
    SearchExisting -->|found reusable asset| EvaluateFit
    SearchExisting -->|gap detected| DraftNewAsset
    EvaluateFit --> ValidateAsset
    DraftNewAsset --> ValidateAsset
    ValidateAsset --> DetectConflicts
    DetectConflicts --> ApprovalGate
    ApprovalGate --> PublishApprovedAsset
    PublishApprovedAsset --> UpdateIndex
    UpdateIndex --> AvailableForAgents
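The stewardship flow above can be sketched as a small pipeline. This is an illustrative sketch, not the T1P implementation: the names (`Asset`, `steward_flow`, the in-memory `index`) are hypothetical, and validation/conflict detection are stubbed.

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    name: str
    kind: str                 # RULE, SKILL, ACTIVATION, PATTERN, PROTOCOL
    approved: bool = False

@dataclass
class StewardshipResult:
    asset: Asset
    conflicts: list = field(default_factory=list)
    published: bool = False

def steward_flow(need: str, index: dict) -> StewardshipResult:
    """Illustrative pipeline: search-before-create, validate, gate, publish."""
    # 1. Search existing assets before drafting anything new
    asset = index.get(need)
    if asset is None:
        # 2. Gap detected: draft a new asset (starts unapproved)
        asset = Asset(name=need, kind="SKILL")
    # 3. Validate + detect conflicts (stubbed in this sketch)
    result = StewardshipResult(asset=asset)
    # 4. Approval gate: only approved assets are published and indexed
    if asset.approved:
        result.published = True
        index[asset.name] = asset
    return result
```

The key property the sketch preserves is that a drafted asset never reaches the index without passing the approval gate.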

Relation to Other Components

| Component | Relation to Rule Steward |
| --- | --- |
| Orchestrator role | consumes approved rules/skills during execution |
| Governor role | governs runtime behavior using approved policy/rule outputs |
| Librarian | stores, indexes, versions, and retrieves managed assets |
| Dream Engine | may propose skill/rule improvements but does not approve them |
| Human Operator | approves protected changes and resolves high-impact conflicts |

Final Rule

The rule steward role is the custodian of execution behavior assets.

The governor role protects runtime. The rule steward role protects the quality and evolution of the rule/skill layer.


4.2B Prompt Steward Role

Formerly: P01 — ContextPrompting Orchestrator

This role is part of the XIOPro Optimizer (see Part 1, Section 8A).

See resources/DESIGN_rc_architecture.md for the Remote Control architecture design covering how human-agent interaction surfaces (Open WebUI, Prompt Composer) connect to the prompt steward role via the Control Bus.

Role

Role bundle responsible for transforming vague human intent and incomplete task context into execution-ready prompt packages.

The prompt steward does not replace context engineering.

It complements context engineering by deciding:

  • whether enough information already exists
  • whether questions should be asked
  • which questions are worth asking
  • how human answers should be converted into durable execution context
  • how the final prompt package should be assembled for the active runtime

Why It Exists

XIOPro does not rely on a single giant "super prompt".

Prompt quality is not only a writing problem. It is also a questioning problem.

As topics, tickets, and issues evolve, the system must be able to:

  • detect ambiguity
  • detect missing constraints
  • detect weak assumptions
  • ask the minimum useful questions
  • preserve the answers for future execution continuity

This applies to:

  • XIOPro itself
  • all STRUXIO products (see MVP1_PRODUCT_SPEC.md for the first product)
  • future STRUXIO.ai product flows

Core Principle

XIOPro replaces static prompt engineering with:

  • context engineering
  • prompt orchestration
  • interactive inquiry

When ambiguity materially affects quality, risk, relevance, or cost, the system should prefer targeted inquiry over silent assumption.

Primary Responsibilities

The prompt steward must:

  • assess task readiness before execution
  • identify missing intent, constraints, preferences, and assumptions
  • select an appropriate prompting mode
  • generate targeted clarifying questions
  • classify questions as optional or blocking
  • convert human answers into structured execution context
  • assemble runtime-specific prompt packages for the orchestrator / execution agents
  • maintain prompt lineage across revisions and retries
  • support human collaboration during design/problem-shaping tasks

Non-Responsibilities

The prompt steward role must not:

  • replace orchestrator execution orchestration
  • replace governor governance
  • replace rule steward rule/skill stewardship
  • ask unbounded or low-value questions
  • block execution when policy allows bounded assumptions
  • silently mutate durable context without traceability

ContextPrompting Modes

contextprompting_modes:
  - direct
  - governed
  - clarify
  - collaborate

Mode meanings:

  • direct = execute immediately with no inquiry unless required by policy
  • governed = ask only required approval/risk/policy questions
  • clarify = ask a small number of targeted questions before execution
  • collaborate = work interactively with the human to shape the problem

Default Mode

Default user-facing mode should be:

default_contextprompting_mode: collaborate

Question Budget

The prompt steward should control how many questions are asked.

question_budget:
  - none
  - light
  - normal
  - deep

Typical guidance:

  • none → direct execution utility task
  • light → one or two material clarifications
  • normal → bounded pre-execution shaping
  • deep → collaborative framing for complex strategic/design work

Prompting Readiness Decision

Before execution, the prompt steward should determine one of:

prompt_readiness_decision:
  - ready_now
  - ask_optional_questions
  - ask_blocking_questions
  - require_human_collaboration
  - require_governed_approval

Inquiry Output Classes

Human answers gathered by the prompt steward should be transformed into durable objects such as:

  • clarified_intent
  • assumptions
  • constraints
  • preferences
  • unresolved_questions
  • approval_inputs
  • prompt_packet_inputs

These outputs must be attachable to:

  • ticket
  • task
  • activity
  • runtime
  • session
  • human decision history

Prompt Package Contract

The prompt steward should produce a bounded prompt package rather than a monolithic prompt blob.

prompt_package:
  task_id: string|null
  runtime_id: string|null
  prompting_mode: enum
  readiness_decision: enum

  goal: string|null
  scope: [string]
  constraints: [string]
  assumptions: [string]
  unresolved_questions: [string]

  relevant_context_refs: [string]
  relevant_rule_refs: [string]
  relevant_skill_refs: [string]

  human_answer_refs: [string]
  recommended_next_step: string|null
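The contract above could be mirrored as a typed structure inside the api-service. A minimal sketch: the field names follow the prompt_package contract, while the class name `PromptPackage` and the defaults are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PromptPackage:
    # Identity and mode (mirrors the prompt_package contract)
    task_id: Optional[str] = None
    runtime_id: Optional[str] = None
    prompting_mode: str = "collaborate"    # direct | governed | clarify | collaborate
    readiness_decision: str = "ready_now"

    # Shaped intent
    goal: Optional[str] = None
    scope: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)
    unresolved_questions: List[str] = field(default_factory=list)

    # References rather than inlined blobs -- keeps the package bounded
    relevant_context_refs: List[str] = field(default_factory=list)
    relevant_rule_refs: List[str] = field(default_factory=list)
    relevant_skill_refs: List[str] = field(default_factory=list)
    human_answer_refs: List[str] = field(default_factory=list)
    recommended_next_step: Optional[str] = None
```

Using reference lists instead of embedded content is what keeps the package a bounded contract rather than a monolithic prompt blob.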

Example Operating Logic

  • If context is sufficient and risk is low → direct
  • If policy or approval applies → governed
  • If a few answers would materially improve quality → clarify
  • If the problem itself needs shaping with the founder → collaborate
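The operating logic above reduces to a small decision function. A sketch under assumptions: the input predicates (`context_sufficient`, `policy_applies`, etc.) are hypothetical signals the prompt steward would compute, and giving policy precedence over the other branches is an assumption.

```python
def select_mode(context_sufficient: bool, low_risk: bool,
                policy_applies: bool, needs_shaping: bool) -> str:
    """Map the prompt steward's readiness signals to a contextprompting mode."""
    if policy_applies:
        return "governed"      # approval/risk/policy questions only
    if needs_shaping:
        return "collaborate"   # shape the problem with the human
    if context_sufficient and low_risk:
        return "direct"        # execute immediately
    return "clarify"           # a few targeted questions first
```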

Collaboration Rule

For strategic design, architecture, product shaping, and other high-value ambiguous work, XIOPro should usually prefer collaborate mode.

This is especially relevant to blueprint creation, MVP definition, and early STRUXIO.ai product design.

Interaction with Other Components

| Component | Relation to Prompt Steward |
| --- | --- |
| Orchestrator role | consumes execution-ready prompt packages |
| Governor role | constrains prompting when governance/policy requires |
| Rule Steward role | supplies validated rules/skills/activations for prompt assembly |
| Librarian | supplies supporting knowledge/context assets |
| RC / UI | provides the human interaction surface for inquiry |
| Dream Engine | may identify recurring gaps in skills or question patterns |

Success Criteria

The prompt steward is successful when:

  • the system asks fewer but better questions
  • execution starts with clearer intent
  • assumptions become explicit rather than hidden
  • human collaboration improves high-value tasks
  • prompt packages remain compact, relevant, and traceable
  • inquiry improves quality without becoming friction-heavy

Final Rule

Prompting in XIOPro is not a single artifact.

It is a governed interactive process for shaping execution quality.


4.2C Module Steward Role

Formerly: M01 — Module Portfolio Steward & Optimizer

This role is part of the XIOPro Optimizer (see Part 1, Section 8A).

Role

Role bundle responsible for governing, evaluating, optimizing, and evolving XIOPro's module portfolio across:

  • subscription-backed access
  • API-key access
  • local/self-hosted runtimes
  • cloud/server-hosted runtimes
  • future hybrid execution paths

The module steward treats module choice as a governed optimization problem, not an ad hoc per-agent preference.

Why It Exists

XIOPro will use many modules across many surfaces, agents, and workflows.

That creates a portfolio problem, not just a cost problem.

The system must continuously optimize the use of modules and subscriptions across constrained resources such as:

  • compute power
  • memory
  • bandwidth
  • time / latency
  • monetary cost
  • quota / subscription utilization

while maximizing:

  • quality
  • stability
  • trust

Without a dedicated steward, module usage drifts into:

  • duplicated capability
  • poor routing choices
  • underused subscriptions
  • wasteful cost
  • weak fallback design
  • hidden dependency on vendor-specific surfaces
  • unmanaged self-hosted complexity

Primary Responsibilities

The module steward must:

  • maintain the governed registry of available modules and access paths
  • understand which modules are available by subscription, API, or self-hosting
  • evaluate module fitness by task type, quality target, and environment
  • optimize module selection across constrained resources
  • recommend preferred and fallback modules
  • detect waste, underuse, poor fit, and overlap
  • detect deprecated or weakening module options
  • recommend when self-hosting becomes justified
  • scout and evaluate new modules, plans, and hosting options
  • prepare adoption / upgrade / retirement proposals
  • coordinate with governor role for runtime enforcement
  • coordinate with Part 8 constraints for actual hosting feasibility

Non-Responsibilities

The module steward role must not:

  • auto-purchase subscriptions
  • auto-deploy new module stacks into production
  • auto-switch the portfolio without approval where policy requires it
  • replace governor runtime governance
  • replace prompt steward prompt-package assembly
  • replace rule steward rule/skill stewardship

Governed Asset Classes

managed_module_assets:
  - MODULE
  - MODULE_POLICY
  - SUBSCRIPTION
  - HOSTING_PROFILE
  - MODULE_EVALUATION
  - MODULE_RECOMMENDATION

Optimization Objective

Core optimization principle:

Module choice is a governed optimization game.

Optimization is not only about lowering cost.

It is about achieving the best feasible balance of:

  • quality
  • stability
  • trust
  • speed
  • resource efficiency
  • operational resilience

Typical Dimensions Evaluated

The module steward should evaluate at least:

  • task fit
  • quality / output reliability
  • latency
  • token / usage cost
  • subscription utilization
  • memory and compute footprint
  • bandwidth / network dependency
  • privacy / exposure profile
  • execution surface compatibility
  • hosting feasibility
  • fallback availability
  • operational complexity
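One way to make multi-dimensional evaluation concrete is a weighted score per candidate module. Illustrative only: the dimension subset, weights, module names, and scores below are all hypothetical.

```python
def score_module(scores: dict, weights: dict) -> float:
    """Weighted fitness score across evaluation dimensions (each 0.0-1.0)."""
    total_w = sum(weights.values())
    return sum(scores.get(dim, 0.0) * w for dim, w in weights.items()) / total_w

# Hypothetical candidates scored on a few of the dimensions above
weights = {"task_fit": 0.4, "latency": 0.2, "cost": 0.2, "fallback": 0.2}
candidates = {
    "module-a": {"task_fit": 0.9, "latency": 0.6, "cost": 0.5, "fallback": 1.0},
    "module-b": {"task_fit": 0.7, "latency": 0.9, "cost": 0.9, "fallback": 0.5},
}
ranked = sorted(candidates,
                key=lambda m: score_module(candidates[m], weights),
                reverse=True)
```

In practice the weights themselves would be governed policy, not per-agent preference, which is exactly the point of treating module choice as a portfolio problem.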

Example Questions the Module Steward Must Answer

  • Which module should this class of task prefer by default?
  • Which fallback should be used when the preferred module is unavailable?
  • Which subscriptions are underused or strategically weak?
  • Which self-hosted options are worth evaluating next?
  • Which modules should be deprecated or constrained?
  • Which environments can actually support a proposed new module?

Evidence Sources & Scouting Inputs

The module steward must optimize from evidence, not intuition alone.

Its scouting and evaluation inputs may include:

  • provider documentation
  • provider pricing and plan changes
  • approved benchmark/evaluation reports
  • internal task/module evaluation history
  • research outputs from the Research Center
  • approved web research results
  • Hugging Face model and repository research
  • local or remote CLI-based research tools
  • self-hosting feasibility notes from infrastructure

Hugging Face Rule

Hugging Face may be used as a governed scouting source for:

  • candidate module discovery
  • repository discovery
  • self-hosting research leads
  • capability comparison
  • surrounding ecosystem signals

But Hugging Face findings are not automatic approvals.

They are candidate inputs that must still flow through:

  • module steward evaluation
  • rule steward / prompt steward / governor constraints where relevant
  • approval policy for adoption or strategic change

Runtime Feedback & Telemetry Requirement

The module steward must receive real usage evidence from execution.

At minimum, module usage should be traceable by:

  • module/provider
  • access path
  • execution surface
  • runtime
  • session
  • activity
  • task
  • ticket
  • latency / retry profile
  • estimated or billed cost where available

This feedback loop is required so module optimization can use:

  • actual performance
  • actual stability
  • actual cost pressure
  • actual subscription utilization
  • actual fallback frequency

rather than only assumptions.

Optimization Rule

A module recommendation is incomplete unless it can be supported by at least one of:

  • direct internal usage evidence
  • credible external evaluation
  • controlled comparison result
  • explicit exploratory candidate status

This prevents portfolio decisions from becoming folklore.
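The rule above is easy to enforce mechanically. A sketch: the evidence labels mirror the four bullets, while the function name and data shape are hypothetical.

```python
# Accepted evidence classes, one per bullet in the Optimization Rule
ACCEPTED_EVIDENCE = {
    "internal_usage",         # direct internal usage evidence
    "external_evaluation",    # credible external evaluation
    "controlled_comparison",  # controlled comparison result
    "exploratory_candidate",  # explicit exploratory candidate status
}

def recommendation_is_complete(evidence: list) -> bool:
    """A module recommendation needs at least one accepted evidence class."""
    return any(e in ACCEPTED_EVIDENCE for e in evidence)
```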

Interaction with Other Components

| Component | Relation to Module Steward |
| --- | --- |
| Orchestrator role | executes within the governed module portfolio |
| Governor role | enforces module policy, constraints, and anomaly responses at runtime |
| Prompt Steward role | uses module portfolio guidance when assembling prompt packages |
| Rule Steward role | stewards rules/skills/activations that shape module usage |
| LiteLLM / routing layer | applies preferred/fallback routing decisions where applicable |
| Part 8 infrastructure | provides the actual hosting and resource envelope |
| Human operator | approves protected additions, removals, and strategic changes |

Success Criteria

The module steward is successful when:

  • module usage is explainable rather than ad hoc
  • portfolio choices improve quality, stability, and trust
  • cost and resource use are optimized without hidden fragility
  • subscriptions are used deliberately rather than accidentally
  • self-hosting proposals are grounded in real need and real feasibility
  • new modules are evaluated systematically before adoption
  • fallback and retirement decisions are intentional

Final Rule

Module usage in XIOPro is not a side effect.

It is a governed optimization discipline.


4.2D T1P Implementation Form of Role Bundles

Purpose

The named XIOPro role bundles are architectural capabilities assigned to agents (see Part 1, Section 8).

For T1P, they must also be made concrete as implementation units.

This section defines what these role bundles are at code and deployment level, so the blueprint can be ticketized without pretending that every role is a separate distributed system.

Note: In the unified agent identity model, all five role bundles (orchestrator, governor, rule_steward, prompt_steward, module_steward) can be assigned to a single agent. They are implemented as separate code modules, not separate agents.


4.2D.1 Core T1P Deployables

T1P should begin with a deliberately small set of deployables:

  1. web-ui

     • widget-first web control center

  2. api-service

     • FastAPI-based control/API layer
     • owns core request/response interfaces
     • exposes founder/operator and UI-facing APIs
     • emits SSE streams for live updates

  3. worker-service

     • Python worker/runtime service
     • executes jobs, orchestration loops, research tasks, and background governance tasks

  4. postgres

     • authoritative operational state store

  5. reverse-proxy

     • Caddy

  6. observability stack

     • OpenTelemetry / Prometheus / Grafana

Optional in T1P where needed:

  1. litellm-router

     • only for API-backed module routing paths
     • not required for subscription-only human-operated surfaces

  2. Ruflo execution fabric

     • runtime integration layer for bounded multi-agent execution

Rule

The named professions do not require one deployable each in T1P.

They may begin as application services/modules inside a smaller number of processes.


4.2D.2 Orchestrator Role — Implementation Form

Formerly: O00.

T1P form:

  • application service / orchestration module
  • primarily hosted inside:
  • api-service
  • worker-service for longer-running execution and coordination loops

The orchestrator module should own:

  • orchestration logic
  • task assignment logic
  • execution progression logic
  • handoff into runtime fabric
  • resume/recovery coordination at orchestration level

The orchestrator module should not be implemented as:

  • only a prompt persona
  • only a markdown convention
  • only a UI abstraction

State Ownership

The orchestrator module does not own state authoritatively.

Authoritative state remains in the PostgreSQL-backed work graph / ODM.

The orchestrator module reads and mutates that state through explicit services and records.


4.2D.3 Governor Role — Implementation Form

Formerly: O01.

T1P form:

  • governance service / policy evaluation module
  • implemented as:
  • synchronous policy checks in api-service
  • background anomaly / breaker / rollup evaluation in worker-service

The governor module should own:

  • alert evaluation
  • breaker logic
  • approval gate checks
  • runtime constraint decisions
  • cost anomaly checks
  • governance event emission

The governor module should not be:

  • only a chat persona
  • an invisible UI-side heuristic layer

State Ownership

The governor module does not own canonical operational objects.

It owns:

  • governance logic
  • governance records
  • policy evaluation outputs

Authoritative governance events and related objects remain stored in PostgreSQL.


4.2D.4 Rule Steward Role — Implementation Form

Formerly: R01.

T1P form:

  • application service / governed asset module
  • implemented initially inside api-service and worker-service
  • not required as a separate deployable in T1P

The rule steward module should own:

  • search-before-create checks
  • validation of rule/skill/activation assets
  • conflict/overlap detection
  • publication and approval routing
  • asset-lifecycle support

State Ownership

The rule steward module does not own the source of truth for assets.

Sources of truth remain:

  • Git-managed asset files
  • structured mirror records in PostgreSQL where applicable

The rule steward module owns validation and stewardship behavior over those assets.


4.2D.5 Prompt Steward Role — Implementation Form

Formerly: P01.

T1P form:

  • application service / prompt-package and inquiry module
  • implemented initially inside api-service
  • may use worker-service for longer-running preparation tasks if needed

The prompt steward module should own:

  • prompting mode interpretation
  • readiness decision
  • question selection
  • blocking vs optional inquiry classification
  • prompt-package assembly
  • promotion of meaningful answers into durable context

State Ownership

The prompt steward module does not own chat history as the source of truth.

It reads and writes through:

  • discussion threads
  • tasks
  • sessions
  • human decisions
  • prompt package records / structured context refs


4.2D.6 Module Steward Role — Implementation Form

Formerly: M01.

T1P form:

  • application service / module registry and optimization module
  • implemented initially inside api-service
  • background evidence aggregation and recommendation refresh may run in worker-service

The module steward module should own:

  • module registry logic
  • recommendation logic
  • fallback logic
  • evidence-backed comparison logic
  • subscription/access-path awareness
  • hosting-feasibility evaluation
  • proposal preparation for adoption/deprecation

T1P Narrowing Rule

For T1P, the module steward may begin as a narrow module registry + recommendation layer rather than a large autonomous portfolio engine.

The architectural role remains module_steward, but its initial implementation scope may be deliberately narrow.


4.2D.7 Communication Model

T1P communication should stay simple.

UI <-> Backend

  • REST/JSON over HTTPS
  • SSE for live updates
  • WebSocket only where true bidirectional streaming is justified

API <-> PostgreSQL

  • ORM / explicit persistence layer
  • no direct UI-to-DB path

API <-> Worker

  • PostgreSQL-backed job dispatch / claim / update model
  • no separate broker required in T1P

Backend <-> Ruflo / runtime surfaces

  • adapter/service boundary
  • explicit execution records
  • runtime/session IDs preserved

Backend <-> LiteLLM

  • used only for API-backed module routes
  • not required to mediate subscription-only human-operated paths
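The API <-> Worker dispatch/claim/update model can be sketched in miniature. This in-memory version is illustrative only: in T1P the same pattern would run against PostgreSQL (commonly an atomic claim guarded by `SELECT ... FOR UPDATE SKIP LOCKED`), and all names here are hypothetical.

```python
import itertools
from typing import Optional

class JobQueue:
    """In-memory miniature of the PostgreSQL-backed dispatch/claim/update model."""

    def __init__(self):
        self._seq = itertools.count(1)
        self.jobs = {}  # job_id -> {"state": ..., "payload": ..., "worker": ...}

    def dispatch(self, payload: dict) -> int:
        # api-service inserts a queued job row
        job_id = next(self._seq)
        self.jobs[job_id] = {"state": "queued", "payload": payload, "worker": None}
        return job_id

    def claim(self, worker: str) -> Optional[int]:
        # worker-service claims the oldest queued job; in PostgreSQL this is
        # one atomic UPDATE guarded by FOR UPDATE SKIP LOCKED
        for job_id, job in sorted(self.jobs.items()):
            if job["state"] == "queued":
                job["state"] = "claimed"
                job["worker"] = worker
                return job_id
        return None

    def update(self, job_id: int, state: str) -> None:
        # worker-service reports progress/completion back through the DB
        self.jobs[job_id]["state"] = state
```

Because the queue lives in the database, no separate broker is needed and job records double as durable execution evidence.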

4.2D.8 State Ownership Summary

| Role / Component | Owns Logic | Owns Canonical State |
| --- | --- | --- |
| Orchestrator | Yes | No |
| Governor | Yes | No |
| Rule Steward | Yes | No |
| Prompt Steward | Yes | No |
| Module Steward | Yes | No |
| Reviewer | Yes | No |
| PostgreSQL / ODM | No | Yes |
| Git-managed governed assets | No | Yes (for source assets) |

Rule

The professions own behavior. The system stores canonical state in explicit durable stores.

This prevents role descriptions from becoming state silos.


4.2D.9 T1P Implementation Constraint

T1P should prefer:

  • fewer deployables
  • more explicit modules/services inside those deployables
  • strong contracts
  • durable state
  • clear event and job records

The blueprint may name many professions.

T1P should not force each profession into an independent distributed runtime prematurely.


4.2D.10 Final Rule

For T1P, XIOPro should be implemented as:

  • a small number of deployables
  • a larger number of explicit services/modules
  • one canonical work graph/state layer
  • one clear operator UI
  • one recoverable orchestration and governance core

This preserves architectural clarity without over-distributing the system too early.


4.2E T1P Implementation Form Table (v5.0 Addition)

Each role bundle's concrete T1P implementation form is summarized below for quick reference during ticketization and implementation.

All role bundles are implemented as separate code modules, not separate agents. Current assignment: see Section 19.

| Role Bundle | T1P Implementation |
| --- | --- |
| orchestrator | Python module in api-service. Reads ODM, assigns tasks, triggers execution. Uses Ruflo for agent spawning. |
| governor | Python module in api-service. Policy evaluation, breaker logic, cost tracking. Thin initially. |
| rule_steward | Python module in worker-service. Validation, conflict detection, search-before-create. Thin initially -- most logic handled by orchestrator module inline. |
| prompt_steward | Python module in api-service. Readiness assessment, question generation, prompt assembly. Start as simple logic, not full orchestrator. |
| module_steward | Python module in worker-service. Module registry, usage tracking, recommendation. Start as config file + simple recommendation logic. |
| reviewer | On-demand agent spawned per review request. No persistent module -- spawned as a short-lived Claude Code session with reviewer role activation. Verdict stored as Activity Evaluation in PostgreSQL. |

4.2F Host Resource Awareness (v5.0 Addition)

The orchestrator must check host capacity before spawning any agent.

Pre-Spawn Check

Before spawning an agent, the orchestrator must:

  1. Query the Host Registry for the target host's current state
  2. Check active_agents against max_concurrent_agents
  3. Check current RAM usage against 85% threshold
  4. If capacity insufficient: queue the task, try another host, or escalate
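The pre-spawn check can be written as a pure function over Host Registry data. The field names follow this section and the Agent-Host Binding below; the function itself is a hypothetical sketch.

```python
def can_spawn(host: dict, agent_ram_gb: float) -> bool:
    """Pre-spawn capacity check: agent slots and the 85% RAM threshold."""
    # Check active_agents against max_concurrent_agents
    if host["active_agents"] >= host["max_concurrent_agents"]:
        return False
    # Check projected RAM usage against the 85% threshold
    projected = host["ram_used_gb"] + agent_ram_gb
    return projected / host["ram_total_gb"] < 0.85
```

If `can_spawn` returns False, the orchestrator falls back to the options in step 4: queue the task, try another host, or escalate.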

Agent-Host Binding

Every Agent Runtime must carry:

host_id: string          # which host this agent runs on
host_name: string        # human-readable reference
resource_estimate:
  ram_gb: float          # estimated RAM this agent will consume
  cpu_cores: float       # estimated CPU usage

Multi-Host Execution

XIOPro supports execution across multiple hosts:

| Host | Role | Typical Workloads |
| --- | --- | --- |
| Hetzner CPX62 | control_plane | orchestrator (all roles), services, domain brains, workers |
| Mac Studio M1 (32GB) | hybrid | remote worker, local experiments, overflow agents |
| Future cloud nodes | worker / gpu | compute-intensive tasks, self-hosted models |

The orchestrator should prefer the control plane host for orchestration and distribute overflow to available hosts.

OOM Prevention

  • 85% RAM threshold triggers "no new agents" gate
  • 90% triggers graceful shutdown of lowest-priority agents
  • 95% triggers emergency agent termination + alert to founder
  • Host health is monitored by the governor with breaker policies
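The escalation ladder above maps directly to a threshold function. A sketch; the action labels are hypothetical names for the behaviors listed.

```python
def oom_action(ram_fraction: float) -> str:
    """Map host RAM utilization (0.0-1.0) to the OOM-prevention action."""
    if ram_fraction >= 0.95:
        return "emergency_terminate_and_alert"       # terminate agents + alert founder
    if ram_fraction >= 0.90:
        return "graceful_shutdown_lowest_priority"   # shed lowest-priority agents
    if ram_fraction >= 0.85:
        return "no_new_agents"                       # spawn gate closes
    return "normal"
```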

4.2G Ruflo Relationship to Orchestrator Role (v5.0 Clarification)

Ruflo (claude-flow) is the agent execution runtime. The orchestrator is the orchestration logic that uses Ruflo.

The separation is:

  • The orchestrator decides WHAT to execute. It reads the work graph, selects tasks, assigns agents, determines execution order, and manages progression.
  • Ruflo decides HOW to spawn agents. Ruflo handles agent lifecycle, process spawning, sub-agent coordination, execution boundaries, and runtime fabric management.

The orchestrator invokes Ruflo as its execution fabric. Ruflo does not contain orchestration logic -- it provides the runtime machinery that the orchestrator directs.

This distinction prevents confusion between the orchestration role and the execution runtime (Ruflo). They are complementary, not interchangeable.


4.2H XIOPro Control Bus (v5.0 Addition)

The XIOPro Control Bus is the unified communication and coordination backbone. Full specification (architecture, capabilities table, intervention model, push delivery, data access rules, migration path): see Part 2, Section 5.8.

Agent Communication Flow

Agent starts session
  → registers with Bus (POST /agents/register) using 3-digit agent_id
  → opens SSE channel (GET /events/{agent_id})
  → receives tasks, messages, interventions via push
  → reports activity results back to Bus
  → heartbeats every 60 seconds
Agent ends session
  → Bus marks agent offline
  → queued messages persist for next session
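The session flow above implies a small client-side contract. A sketch under assumptions: the endpoints (`POST /agents/register`, `GET /events/{agent_id}`) and the 60-second heartbeat come from the flow itself, but the payload shape, the two-missed-heartbeat staleness rule, and the helper names are hypothetical.

```python
import re

def build_registration(agent_id: str, host: str) -> dict:
    """Payload for POST /agents/register; agent_id must be the 3-digit form."""
    if not re.fullmatch(r"\d{3}", agent_id):
        raise ValueError(f"agent_id must be 3 digits, got {agent_id!r}")
    return {
        "agent_id": agent_id,
        "host": host,
        "events_channel": f"/events/{agent_id}",  # SSE channel opened after registering
        "heartbeat_interval_s": 60,
    }

def is_stale(last_heartbeat_s: float, now_s: float,
             interval_s: int = 60, missed: int = 2) -> bool:
    """Bus-side liveness check: mark offline after `missed` missed heartbeats."""
    return now_s - last_heartbeat_s > interval_s * missed
```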

Relationship to Ruflo

| Layer | Scope | Persistence |
| --- | --- | --- |
| Control Bus | Cross-session, cross-host | PostgreSQL -- survives everything |
| Ruflo | Within-session, within-host | Session memory -- dies with session |

Ruflo reports state to the Bus. The Bus does not depend on Ruflo.


4.2I Reviewer Role (v5.0 Addition)

Role

The Reviewer is a short-lived agent role spawned by an orchestrator (GO or PO) after a builder agent completes significant work. Its sole purpose is to independently evaluate the output against the original specification and return a verdict to the spawning orchestrator.

The Reviewer is not the builder. It is never the same agent that produced the work.

Why It Exists

Builders verify their own output via the Completion Self-Check Protocol (Section 5.2). That is insufficient for high-stakes deliverables. Self-evaluation has a structural blind spot: the builder shares the same context, assumptions, and potential misunderstandings that produced the work.

The Reviewer role closes this gap by introducing an independent perspective:

  • reads the spec and the output independently, with no shared build context
  • applies a different model tier where possible (Opus reviews Sonnet's work; Sonnet reviews Haiku's work)
  • cannot be influenced by the builder's reasoning path
  • reports a clean verdict with evidence

When to Spawn a Reviewer

A Reviewer should be spawned when:

  • a ticket is marked significant (architectural change, public API, schema migration, security-sensitive work)
  • the builder's Completion Self-Check confidence is in the 0.5–0.8 range
  • the orchestrator's policy for the project requires mandatory review
  • the builder explicitly requests independent review (rare but permitted)

A Reviewer is NOT spawned for:

  • routine sub-hour tasks
  • documentation edits without behavioral impact
  • tasks already reviewed by a human via RC

Spawning Rule

reviewer_spawn_rule:
  spawned_by: orchestrator (GO or PO)
  trigger: builder marks task complete on a significant ticket
  constraint: reviewer_agent_id != builder_agent_id
  model_preference:
    - if builder used sonnet → prefer opus for reviewer
    - if builder used opus → prefer sonnet for reviewer
    - if builder used haiku → prefer sonnet for reviewer
    - if preferred model unavailable → use any different model tier
  lifecycle: short-lived — spawned for one review, terminates after verdict
  bus_registration: yes — registered in Control Bus for traceability
  cost_attribution: separate ledger entry, attributed to the ticket
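
The `model_preference` ladder above can be written as one selection function. This is a sketch of the spawning rule only; the function name and the `available` argument are assumptions, not part of the spec.

```python
def pick_reviewer_model(builder_model: str, available: set) -> str:
    """Apply the model_preference ladder, then the 'any different tier' fallback."""
    preference = {"sonnet": "opus", "opus": "sonnet", "haiku": "sonnet"}
    preferred = preference.get(builder_model)
    if preferred in available:
        return preferred
    # Fallback: use any model tier different from the builder's.
    for candidate in sorted(available):
        if candidate != builder_model:
            return candidate
    raise RuntimeError("no distinct model tier available for independent review")
```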

Reviewer Responsibilities

The Reviewer must:

  • read the original ticket specification (goal, scope, acceptance criteria)
  • read the builder's output (code, document, artifact, or result)
  • evaluate each acceptance criterion independently
  • identify gaps, regressions, or spec deviations
  • produce a structured verdict

The Reviewer must not:

  • fix the output itself
  • negotiate with the builder
  • consult the builder about intent
  • carry over context from a previous session on this ticket

Verdict Structure

review_verdict:
  ticket_id: string
  task_id: string
  reviewer_agent_id: string
  builder_agent_id: string
  reviewer_model: string
  builder_model: string
  verdict: APPROVED | NEEDS_FIX | REJECTED
  criteria_results:
    - criterion: string
      result: pass | fail | partial
      evidence: string
  gaps_found: [string]
  fix_required: [string]   # populated when verdict = NEEDS_FIX
  rejection_reason: string  # populated when verdict = REJECTED
  recommendation: string

Verdict Meanings

  • APPROVED — all acceptance criteria pass; orchestrator may close or promote the ticket
  • NEEDS_FIX — one or more criteria are partial or failing; orchestrator re-assigns to builder with the fix list
  • REJECTED — output does not meet the spec at a fundamental level; orchestrator decides whether to reassign or escalate

Orchestrator Response to Verdict

| Verdict | Orchestrator Action |
|---|---|
| APPROVED | Mark task complete, proceed with ticket progression |
| NEEDS_FIX | Re-open task, assign fix list to original builder, re-trigger review on completion |
| REJECTED | Escalate to human (RC) or reassign entire task to a different agent |
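
The verdict-to-action mapping reduces to a small dispatch function. The action labels here are shorthand for the table rows, not real orchestrator calls.

```python
def act_on_verdict(verdict: str) -> str:
    """Map a review verdict to the orchestrator action described above."""
    actions = {
        "APPROVED": "mark_complete_and_progress_ticket",
        "NEEDS_FIX": "reopen_task_and_assign_fix_list",
        "REJECTED": "escalate_to_human_or_reassign",
    }
    if verdict not in actions:
        raise ValueError(f"unknown verdict: {verdict!r}")
    return actions[verdict]
```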

Relation to Completion Self-Check

The Completion Self-Check (Section 5.2) is the builder's internal gate. The Reviewer is the external gate.

Both must pass before a significant ticket is closed.

Builder self-check passes → task marked complete (builder)
                          → orchestrator spawns Reviewer
                          → Reviewer returns verdict
                          → APPROVED → ticket closed
                          → NEEDS_FIX / REJECTED → ticket re-opened

T1P Implementation Form

  • Reviewer is spawned as an on-demand agent (Pattern 2, Section 5A.2) with the reviewer role assigned
  • For T1P, review is triggered manually by the orchestrator after builder completion on significant tickets
  • Automated spawn-on-completion is a post-T1P enhancement
  • The review_verdict output is stored as an Activity Evaluation entity (Part 3, Section 4.6.1) attached to the reviewed task

Interaction with Other Components

| Component | Relation to Reviewer |
|---|---|
| Orchestrator role | spawns Reviewer, receives verdict, acts on it |
| Builder (Specialist/Worker) | produces the work being reviewed; cannot interact with active Reviewer |
| Completion Self-Check | builder-side gate that precedes Review spawn |
| Activity Evaluation (Part 3) | verdict stored as evaluation record |
| Governor role | tracks review cost; may require review for high-cost tickets |
| RC | receives REJECTED verdicts requiring human judgment |

Final Rule

The Reviewer exists to catch what self-evaluation misses.

It is not a bureaucratic gate. It is a targeted quality signal for work that matters.


4.3 Ruflo — Agent Swarm Engine

Role

Agent orchestration runtime.

Responsibilities

  • spawn agents
  • manage sub-agents
  • control execution lifecycle
  • enforce boundaries

Notes

  • acts as execution fabric
  • integrates with Claude Code / other agents
  • supports multi-agent collaboration

4.4 LiteLLM — Model Router

Role

Provider abstraction layer.

Responsibilities

  • route requests to:
    • Claude
    • OpenAI
    • Gemini
    • local models
  • optimize cost vs performance
  • handle fallbacks
  • unify the API interface

Key Feature

Enables provider independence


4.5 Execution Engine

Role

Actual execution runtime.

Can include:

  • Claude Code (primary)
  • RooCode
  • custom Python agents
  • CLI-based execution
  • future local models

4.6 Remote Control (RC)

4.6.1 Purpose

RC enables the human operator to interact with live execution in a controlled and auditable way.

RC exists to:

  • attach to a running execution context
  • respond to escalation requests
  • approve or reject protected actions
  • inject bounded guidance
  • redirect or constrain execution
  • recover decision continuity during ambiguity or failure

RC is the primary human interaction surface for live XIOPro brains.

It supports:

  • exploratory conversation
  • execution-bound discussion
  • approval and escalation handling
  • recovery intervention
  • bounded guidance and redirection

When RC interaction materially affects execution, it must be converted into durable operational state.

4.6.2 Principle

RC transforms XIOPro from:

autonomous system → governed autonomous system

The key principle is:

RC is not the system of record.

The system of record is the durable operational state held in:

  • Agent Runtime
  • Session
  • Escalation Request
  • Human Decision
  • Activity / Ticket / Task lineage
  • Transcript and context references

RC is the human interaction surface over those objects.


4.6.3 Canonical Objects Used by RC

RC must operate on the canonical runtime objects defined in the ODM.

Agent Runtime

Represents the live actor doing work.

RC may:

  • attach to it
  • pause it
  • constrain it
  • redirect it
  • resume it

Session

Represents the durable execution session.

RC must attach to a session, not merely to an abstract agent name.

A session may be:

  • active
  • idle
  • paused
  • waiting
  • blocked
  • crashed
  • recovering
  • closed
  • archived

Escalation Request

Represents a durable request for human discussion, clarification, or approval.

RC should open or respond to an escalation request rather than relying on ad hoc chat state.

Human Decision

Represents the durable answer or approval outcome recorded by the founder/operator.

RC is one way to create a Human Decision, but the decision must persist beyond the UI.

Execution Surface

Represents where the runtime is actually executing, such as:

  • Claude Code
  • Codex
  • Gemini CLI
  • custom CLI
  • API worker
  • future local model runtime

RC must be aware of execution surface constraints.


4.6.4 RC Interaction Modes

RC should support at least these modes:

Attach Mode

Used when the founder wants to connect to an already-running session.

Escalation Response Mode

Used when a task or runtime has opened a durable Escalation Request.

Approval Mode

Used when a protected action requires formal go / no-go input.

Redirect Mode

Used when the founder changes goal, scope, constraints, provider, or path.

Recovery Mode

Used when a session crashed, degraded, or became blocked and a recovery decision is needed.


4.6.5 RC Architecture

flowchart TD
    Human --> RCInterface
    RCInterface --> RCManager
    RCManager --> Session
    RCManager --> AgentRuntime
    RCManager --> EscalationRequest
    EscalationRequest --> HumanDecision
    HumanDecision --> RCManager
    RCManager --> Orchestrator["Orchestrator"]
    RCManager --> Governor["Governor"]
    Orchestrator --> ExecutionSurface
    ExecutionSurface --> Session
    Session --> TranscriptStore
    Session --> ContextBundle

4.6.6 RC Manager Responsibilities

RC Manager is the backend control layer for RC.

It must:

  • locate attachable sessions
  • bind human interaction to the correct runtime scope
  • assemble the required context bundle
  • persist interaction history
  • route decisions back into runtime execution
  • preserve ticket/task/activity lineage
  • support multi-brain switching
  • prevent uncontrolled cross-session contamination

It must not:

  • silently overwrite durable system state
  • bypass approval requirements
  • become a generic freeform chat relay without structure

4.6.7 Context Bundle Contract

Before attaching or escalating, RC should assemble a bounded context bundle.

Minimum bundle contents:

context_bundle:
  runtime_id: string
  session_id: string
  execution_surface_id: string|null

  ticket_id: string|null
  task_id: string|null
  activity_id: string|null

  current_goal: string|null
  current_state: string
  blocker_summary: string|null

  recent_actions_ref: string|null
  relevant_knowledge_refs: [string]
  transcript_ref: string|null
  checkpoint_ref: string|null

  escalation_request_id: string|null
  pending_approval: boolean
  recommended_next_step: string|null

This keeps human intervention compact, explicit, and resumable.
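
A minimal builder can guarantee that every key of the contract is present and that nothing outside the contract slips in. The field names are the contract's own; the builder function and its defaults (`False` for `pending_approval`, an empty list for `relevant_knowledge_refs`) are assumptions consistent with the declared types.

```python
def make_context_bundle(runtime_id: str, session_id: str,
                        current_state: str, **extra) -> dict:
    """Assemble a bundle with every contract key present, optional keys defaulted."""
    bundle = {
        "runtime_id": runtime_id,
        "session_id": session_id,
        "execution_surface_id": None,
        "ticket_id": None,
        "task_id": None,
        "activity_id": None,
        "current_goal": None,
        "current_state": current_state,
        "blocker_summary": None,
        "recent_actions_ref": None,
        "relevant_knowledge_refs": [],
        "transcript_ref": None,
        "checkpoint_ref": None,
        "escalation_request_id": None,
        "pending_approval": False,
        "recommended_next_step": None,
    }
    # Reject keys outside the contract so the bundle stays bounded.
    unknown = set(extra) - set(bundle)
    if unknown:
        raise KeyError(f"keys outside the context bundle contract: {sorted(unknown)}")
    bundle.update(extra)
    return bundle
```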


4.6.8 RC Triggers

RC may be invoked by:

  • requires_human = true
  • requires_approval = true
  • ambiguity detected
  • recovery tradeoff required
  • quality failure requires judgment
  • runtime blocked
  • founder manual intervention
  • governance escalation from the governor

4.6.9 RC Flow

flowchart TD
    RuntimeActive --> TriggerDetected
    TriggerDetected --> EscalationOrAttach
    EscalationOrAttach --> ContextBundleBuilt
    ContextBundleBuilt --> HumanInteraction
    HumanInteraction --> HumanDecisionRecorded
    HumanDecisionRecorded --> RuntimeResume
    RuntimeResume --> SessionUpdated
    SessionUpdated --> AuditTrail

4.6.10 RC Interaction Durability

| Mode | Purpose | Durability Requirement |
|---|---|---|
| exploratory conversation | think with a brain without immediate execution change | optional unless promoted |
| execution-bound discussion | guide or clarify active work | must persist if it affects work |
| approval / escalation | formal human gate | durable by default |
| recovery intervention | unblock or redirect after failure/degradation | durable by default |

Rule

RC is the unified human interaction surface for XIOPro brains.

Not every RC conversation must mutate execution state.

But any RC conversation that changes execution, constraints, approvals, direction, or recovery must be recorded as durable operational state.


4.6.11 RC Success Criteria

RC is successful when:

  • the founder can attach to the correct live execution context
  • discussion and approval become durable system state
  • context injection is bounded and traceable
  • session continuity is preserved after intervention
  • multiple brains can be switched without confusion
  • recovery decisions are captured and replayable

4.6.12 Final Statement

RC is not a convenience chat layer.

It is the controlled human intervention surface for live execution.


4.7 Session Manager

4.7.1 Role

Session Manager owns session lifecycle control.

It ensures that execution continuity survives:

  • normal pause/resume
  • human escalation
  • provider instability
  • runtime crash
  • surface switch
  • controlled recovery

4.7.2 Responsibilities

Session Manager must:

  • open and close sessions
  • monitor session health
  • persist session checkpoints
  • transfer or rebuild context
  • coordinate session recovery
  • track attachment eligibility
  • support resume semantics
  • preserve transcript references
  • prevent orphaned runtimes

4.7.3 Session State Model

Session Manager must honor the canonical session states:

  • active
  • idle
  • paused
  • waiting
  • blocked
  • crashed
  • recovering
  • closed
  • archived

Interpretation

  • active = currently executing
  • idle = no immediate work, resumable
  • paused = intentionally halted
  • waiting = waiting on human/dependency/event
  • blocked = cannot continue without intervention
  • crashed = abnormal interruption
  • recovering = recovery path underway
  • closed = ended and no longer active
  • archived = retained for history

4.7.4 Recovery Paths

Session recovery should support at least:

  • retry same session
  • resume with new session on same surface
  • switch execution surface
  • switch model/provider path
  • escalate to human for recovery decision
  • terminal close

Recovery must preserve lineage to:

  • runtime
  • ticket
  • task
  • activity
  • escalation request
  • human decision

4.7.5 Attachment Eligibility

A session is attachable when:

  • it has not been terminally closed
  • it is still relevant to live or recoverable work
  • required context is available
  • ownership/lock conditions permit intervention

Attachable states usually include:

  • active
  • idle
  • paused
  • waiting
  • recovering
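
The eligibility rules above can be sketched as one predicate. The state set is the "usually include" list, treated here as the default policy; the boolean arguments stand in for the context-availability and ownership/lock checks, which are richer in practice.

```python
ATTACHABLE_STATES = {"active", "idle", "paused", "waiting", "recovering"}

def is_attachable(state: str, *, context_available: bool, lock_permits: bool) -> bool:
    """A session is attachable when its state allows it, required context
    is available, and ownership/lock conditions permit intervention."""
    return state in ATTACHABLE_STATES and context_available and lock_permits
```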

4.7.6 Ownership & Locking

Session Manager should prevent unsafe simultaneous human/control collisions.

Minimum rules:

  • one human control attachment at a time
  • explicit lock release on detach or timeout
  • emergency override allowed with audit log
  • session ownership visible to the orchestrator/governor and the control surface
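
The minimum rules above can be sketched as a small in-memory lock. The 15-minute timeout, method names, and the audit tuple shape are assumptions for illustration; a real implementation would persist ownership where the orchestrator/governor can see it.

```python
import time

class SessionLock:
    """One human control attachment at a time; release on detach or timeout;
    emergency override is allowed but always audit-logged."""

    def __init__(self, timeout_s: float = 900.0):
        self.timeout_s = timeout_s
        self.owner = None
        self.acquired_at = 0.0
        self.audit = []  # every override is recorded here

    def attach(self, who: str, *, override: bool = False, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        expired = self.owner is not None and now - self.acquired_at > self.timeout_s
        if self.owner is not None and not expired and not override:
            return False  # someone else holds the attachment
        if self.owner is not None and not expired and override:
            self.audit.append(("override", who, self.owner))
        self.owner, self.acquired_at = who, now
        return True

    def detach(self, who: str) -> None:
        if self.owner == who:
            self.owner = None
```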

4.7.7 Session Success Criteria

Session management is successful when:

  • sessions survive ordinary interruptions
  • recoverable failures remain recoverable
  • no important context is silently lost
  • human intervention can resume the right work reliably
  • execution surfaces can be switched without breaking lineage

4.8 Memory / Context Layer

4.8.1 Role

This layer preserves the working context needed for execution continuity.

It is not identical to long-term knowledge storage.

It provides the bounded memory bridge between:

  • live runtime execution
  • ticket/task state
  • durable knowledge
  • human intervention
  • recovery

4.8.2 Context Horizons

Short-Term

Immediate live execution state:

  • current step
  • recent tool calls
  • latest outputs
  • transient working memory

Mid-Term

Execution continuity state:

  • ticket/task context
  • session checkpoints
  • escalation context
  • current constraints
  • recent decisions

Long-Term

Persistent knowledge:

  • rules
  • skills
  • activations
  • documents
  • reflections
  • prior decisions
  • knowledge graph / ledger references

4.8.3 Context Sources

The context layer may assemble context from:

  • Session transcript
  • checkpoint artifacts
  • task/activity state
  • Librarian / knowledge layer
  • Knowledge Ledger
  • human decisions
  • governance decisions
  • Dream-derived improvements where approved

4.8.4 Context Rules

The context layer must:

  • minimize irrelevant context
  • preserve critical continuity
  • avoid cross-ticket contamination
  • keep human intervention bounded
  • allow reconstruction after crash or restart

It must not:

  • blindly dump all history into runtime prompts
  • let one brain inherit another brain's state without justification
  • treat chat history as the only memory source

4.8.5 Resume Bundle

A resumed runtime should receive a structured resume bundle, not just raw transcript replay.

Minimum resume bundle:

resume_bundle:
  session_id: string
  runtime_id: string
  current_goal: string|null
  latest_valid_checkpoint_ref: string|null
  latest_human_decision_ref: string|null
  active_constraints: [string]
  relevant_knowledge_refs: [string]
  next_expected_action: string|null

4.8.6 Success Criteria

The context layer is successful when:

  • tasks resume without silent amnesia
  • human decisions remain attached to execution
  • context remains compact and relevant
  • session recovery is practical
  • long-term knowledge improves execution without polluting it

4.8A Memory Engineering Principles (from 5-Layer Memory Stack Research)

These 5 production engineering rules apply to all XIOPro memory operations (Hindsight, Librarian, state files, knowledge vault). Derived from @the_enterprise.ai's "5-Layer AI Agent Memory Stack" research (see struxio-knowledge/vault/research_inbox/REVIEW_5_layer_memory_stack_images.md for full analysis).

memory_engineering_principles:

  1_async_updates:
    rule: "Never block the main agent execution for memory operations"
    implementation: "Memory extraction, indexing, and storage happen in background threads or post-activity hooks"
    applies_to: [Hindsight, Librarian, Knowledge Ledger]

  2_debounce_writes:
    rule: "Batch memory operations; don't write on every turn"
    implementation: "Wait 30 seconds or N turns, batch messages, make one extraction call"
    applies_to: [Hindsight, session state, activity logging]
    benefit: "Reduces token usage and prevents memory thrashing"

  3_confidence_threshold:
    rule: "Discard low-confidence facts (< 0.7). Cap total facts per agent at 100"
    implementation: "Every stored fact gets a confidence score 0.0-1.0. Below threshold = discard. Above cap = trim lowest confidence."
    applies_to: [Knowledge Objects, Hindsight memories, agent lessons]
    benefit: "Prevents unbounded memory growth and low-quality knowledge accumulation"

  4_token_budget:
    rule: "Control context injection size: max 2000 tokens for memory context"
    implementation: "When injecting memories/knowledge into agent context, trim to budget by removing lowest-confidence items first"
    applies_to: [Prompt Steward context assembly, Hindsight auto-inject, RAG retrieval]
    benefit: "Prevents context window bloat, keeps agent focused"

  5_atomic_writes:
    rule: "Write state files atomically: temp file then rename"
    implementation: "Write to plan.yaml.tmp, then mv plan.yaml.tmp plan.yaml. Never corrupt state mid-write."
    applies_to: [plan.yaml, next-actions.yaml, agents.yaml, all state files, session checkpoints]
    benefit: "Prevents corrupted state from crashes or concurrent writes"
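
Principle 5 has a direct Python rendering. `os.replace` performs an atomic rename on POSIX filesystems, so readers either see the old file or the new one, never a half-written state file; the helper name is illustrative.

```python
import os
import tempfile

def atomic_write(path: str, data: str) -> None:
    """Write `data` to a temp file in the target directory, then rename.
    A crash mid-write leaves the original file untouched."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ensure data hits disk before the rename
        os.replace(tmp, path)  # atomic on POSIX
    except BaseException:
        os.unlink(tmp)  # only reached if the rename did not happen
        raise
```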

Relation to Existing XIOPro Architecture

| Principle | XIOPro Component | Current Status | T1P Action |
|---|---|---|---|
| 1. Async Updates | Bus async messaging, background workers | Partially covered by Bus architecture | Enforce for Hindsight/Librarian processing |
| 2. Debounce Writes | Hindsight extraction, session state | Not yet implemented | Add batching to Hindsight extraction pipeline |
| 3. Confidence Threshold | Knowledge Objects, Hindsight memories | Not yet implemented — no confidence field exists | Add confidence field to Knowledge Objects (Part 5, Section 6.1) |
| 4. Token Budget | Context Rules (Section 4.8.4), Prompt Steward | Context rules say "minimize" but lack hard number | Define 2000-token hard budget for memory context injection |
| 5. Atomic Writes | State files (plan.yaml, next-actions.yaml) | Not enforced | Implement write-to-temp-then-rename for all state files |

Implementation Requirements

  • All agents must use atomic writes for state file mutations (Principle 5)
  • The Prompt Steward (Section 4.2B) must enforce the 2000-token memory context budget (Principle 4) when assembling prompt packages
  • Hindsight and Librarian processing must be async and debounced (Principles 1, 2) — see Part 5, Sections 9 and 4 for implementation requirements
  • Knowledge Objects must carry a confidence score field; the Librarian must enforce threshold and cap (Principle 3) — see Part 5, Section 6.1


4.9 Dream Engine (Sleep-Time Intelligence Layer)

The Dream Engine and its T1P subset (Idle Maintenance, Section 4.9.9) are part of the XIOPro Optimizer (see Part 1, Section 8A).

XIOPro includes a background cognition layer called the Dream Engine.

This system runs during idle periods and performs:

  • memory consolidation
  • knowledge pruning
  • contradiction resolution
  • pattern extraction
  • cost optimization suggestions
  • system-level improvements

4.9.1 Purpose

Prevent:

  • memory decay
  • knowledge fragmentation
  • context pollution
  • repeated mistakes

Enable:

  • long-term intelligence accumulation
  • system self-improvement
  • reduced token usage over time

4.9.2 Trigger Conditions

Dream cycles are triggered when:

  • time threshold reached (e.g. 24h)
  • activity threshold reached (e.g. N sessions / N tasks)
  • manual trigger by founder
  • major system change detected

4.9.3 Scope of Operation

Dream Engine operates on:

  • knowledge base (.md / .yaml / DB)
  • tickets history
  • task execution logs
  • reflections
  • agent performance data

4.9.4 Core Phases

  1. Orientation
    • scan current system state
    • build structural map
  2. Signal Extraction
    • identify:
      • repeated failures
      • corrections
      • decisions
      • patterns
  3. Consolidation
    • merge duplicates
    • remove obsolete entries
    • normalize metadata
    • convert relative → absolute time
  4. Optimization
    • propose:
      • rule updates
      • skill improvements
      • routing optimizations
      • cost reductions
  5. Index Rebuild
    • update:
      • librarian index
      • topic structure
      • search mappings

4.9.5 Output Artifacts

Dream Engine produces:

  • updated knowledge files
  • improvement proposals
  • rule modification suggestions
  • skill enhancement suggestions
  • anomaly reports

4.9.6 Governance

  • runs in isolated mode
  • cannot modify execution directly
  • requires approval for:
    • rule changes
    • system behavior changes

4.9.7 Relation to Other Systems

| System | Role |
|---|---|
| Auto Memory | capture |
| Librarian | organize |
| Dream Engine | refine |
| Reflection Engine | evaluate |
| Improvement Engine | apply |

4.9.8 Strategic Impact

Dream Engine transforms XIOPro from:

"execution system"

into:

self-evolving intelligence system


4.9.9 T1P Dream Engine Posture (v5.0 Addition)

Posture: Idle Maintenance Only

The full Dream Engine architecture is preserved in this blueprint as the target capability.

For T1P, only the following subset is implemented:

  • Memory consolidation (AutoDream) -- consolidate session artifacts, clean transient state
  • Stale knowledge detection -- flag documents that have not been referenced or updated beyond a threshold
  • Morning brief generation -- produce a daily summary of system state, pending work, and alerts
  • Session cleanup -- archive completed sessions, remove orphaned runtime artifacts
  • Idea review -- scan Ideas with status new or deferred whose next_review_at has passed or whose last_reviewed_at exceeds the configured review cycle. Surface stale ideas in the morning brief for user attention.

The full Dream Engine phases (signal extraction, optimization proposals, index rebuild, contradiction resolution) are deferred to post-T1P.

Rule

T1P Dream Engine is operational but narrow. It maintains system hygiene without attempting autonomous intelligence evolution. Full capability is a post-T1P milestone.


4.10 Agent Activation Architecture (v5.0 Addition)

Problem

Current activation files (ACTIVATE_BM.md, ACTIVATE_B1.md, etc.) are 65-108 lines each and contain significant duplication:

| Duplicated Content | Lines | Repeated In |
|---|---|---|
| Execution Discipline (Boris rules) | 6-8 | All 7 agents |
| Paperclip protocol | 2-4 | All agents |
| Session Start Protocol | 5-6 | All agents |
| Memory (Hindsight + Obsidian) | 2-3 | domain brains |
| Worker spawning rules | 3-4 | domain brains |
| First Action (read tools + state) | 3-5 | All agents |

This wastes ~1,400 tokens per agent load. Over dozens of daily sessions across 7 agents, this is significant token waste — and worse, it creates maintenance burden (changing a rule means editing 7 files).

Solution: Skill-Based Activation

Extract duplicated content into shared skills. Activation files become slim identity-only documents that declare which skills to load.

Extracted Skills

| Skill | Content | Loaded By |
|---|---|---|
| SKILL_bootstrap | Read tools reference, state files, lessons. Set context. | All agents |
| SKILL_execution_discipline | Boris Cherny rules: plan first, subagents, self-improve, verify, circuit breaker, git discipline. | All agents |
| SKILL_memory | Hindsight bank setup, Obsidian query patterns, knowledge retrieval. | All agents |
| SKILL_worker_spawn | Worker naming ([brain_id][seq]), max 3 active, headless mode, supervision rules. | domain brains |
| SKILL_paperclip_sync | Already exists. Ticket checkout, comments, completion, cost reporting. | All agents |
| SKILL_session_start | Heartbeat, bus poll, state load, resume top action. | All agents |

Slim Activation File Pattern

---
title: "ACTIVATE: 002 — Engineering Brain"
agent: 002
version: "5.0.0"
skills_on_load: [bootstrap, execution-discipline, session-start]
skills_available: [paperclip-sync, memory, worker-spawn]
---

# 002 — Engineering Brain

## Identity
You are **002** (Engineering) — STRUXIO's Product Engineering Brain.
Ruflo worker on Hetzner under orchestrator coordination.

## Domain
Python, REST APIs, product integrations, domain-specific tooling.

## Workers
201 (coder), 202 (tester), 203 (code-reviewer). Max 3.

## Model
Sonnet 4.6 default. Opus when ticket specifies.

## On Activation
Load `skills_on_load` from frontmatter. Execute SKILL_bootstrap.
Other skills load on demand when triggered.

~20 lines instead of 68. Token savings: ~200 tokens per agent load.

Skill Loading Strategy

skills_on_load:     # Always loaded at session start. Critical for identity and bootstrap.
skills_available:   # Loaded on demand when the agent's task requires them.

This mirrors how Superpowers skills work — frontmatter declares what's available, runtime loads when needed.

Connection to Dream Engine / Idle Maintenance

This optimization IS what the Dream Engine does in practice:

  1. Review activation files, skill files, and rules for duplication
  2. Detect redundancy, drift, and token waste
  3. Propose consolidation (new shared skills, slim activation files)
  4. Report to founder / rule steward role for approval

Adding to the Idle Maintenance scope:

idle_maintenance_tasks:
  - memory_consolidation          # existing
  - stale_knowledge_detection     # existing
  - morning_brief                 # existing
  - session_cleanup               # existing
  - activation_optimization       # NEW: review activation files for duplication
  - skill_dedup_detection         # NEW: detect overlapping skills
  - token_waste_analysis          # NEW: estimate token savings from consolidation
  - idea_review                   # NEW: scan ideas not reviewed within their review_cycle
  - skill_performance_review      # NEW: compare internal skill metrics against alternatives (see Part 5 Section 8.9A)
  - skill_token_optimization      # NEW: identify skills with high token usage, suggest alternatives (see Part 5 Section 8.9A)

This is the bridge between "Idle Maintenance" (T1P) and "Dream Engine" (full capability) — practical optimization that proves the Dream concept without requiring the full autonomous intelligence layer.

Migration Plan

  1. Create the 4 new skills (bootstrap, execution-discipline, memory, worker-spawn)
  2. Update SKILL_REGISTRY.yaml with all skills
  3. Slim activation files (one at a time, test each)
  4. Add Idle Maintenance task to detect future drift

4.11 Skill Selection Architecture (v5.0 Addition)

When the orchestrator assigns a task, it must select which skills the agent loads. This prevents token waste (loading 48 skills when 3 are needed) and ensures model-appropriate skill assignment.

Problem

The Skill Registry (Part 5, Section 8.9) defines what skills exist. The Activation Architecture (Section 4.10) defines how activation files reference skills. Neither defines which skills to load for a given task assignment.

Without selection logic:

  • Every agent loads all skills it has access to (token waste)
  • Haiku agents receive skills that require deep reasoning (quality loss)
  • Task-irrelevant skills dilute the agent's context (precision loss)

Foundation: Role → Topic → Skill Binding Chain (v5.0.5 Clarification)

Skills bind to roles via topics, not directly to agent numbers. A role has multiple topics. A topic has multiple skills. Any agent assigned a role inherits all topic-skill bindings for that role.

role_topic_skill_chain:
  role: designer
  topics:
    - brand_identity
    - content_creation
    - visual_design
  skills_per_topic:
    brand_identity: [voice-dna-creator, brainstorming]
    content_creation: [content-research-writer, writing-plans]
    visual_design: [brainstorming]

  role: specialist_compliance
  topics:
    - iso_19650
    - bim_fidelity
    - cde_management
  skills_per_topic:
    iso_19650: [claude-deep-research, writing-plans]
    bim_fidelity: [systematic-debugging, verification-quality]
    cde_management: [writing-plans]

This binding chain is the structural foundation for the 3-step selection filter below. Step 1 resolves the role's topic-skill bindings; Steps 2 and 3 then narrow the result by task type and model tier.

Solution: 3-Step Skill Selection Filter

When the orchestrator assigns a task, it computes the skill set through three sequential filters. The final skill set is the intersection of all three.

Step 1 — Filter by Agent Role (via Topic-Skill Bindings)

Each role has a base skill set derived from its topic-skill bindings. An agent only considers skills permitted for its role.

agent_role_skills:
  orchestrator: [writing-plans, brainstorming, paperclip-sync, dispatching-parallel-agents]
  specialist: [writing-plans, TDD, systematic-debugging, code-review, brainstorming, paperclip-sync]
  worker: [TDD, verification-before-completion, paperclip-sync]
  reviewer: [code-review, receiving-code-review, verification-quality, systematic-debugging]
  interface: []  # UI agents have no reasoning skills

Step 2 — Filter by Task Type

Each task type declares which skills are relevant. Only skills that survived Step 1 AND appear in the task type list continue.

task_type_skills:
  coding: [TDD, systematic-debugging, verification-before-completion]
  research: [brainstorming, writing-plans]
  review: [code-review, receiving-code-review, verification-quality]
  design: [brainstorming, writing-plans]
  deployment: [verification-before-completion]
  debugging: [systematic-debugging]
  planning: [brainstorming, writing-plans, executing-plans]
  ticket_management: [paperclip-sync]

Step 3 — Filter by Model Tier

The assigned model determines final compatibility. Skills that require reasoning beyond the model's capability are excluded.

model_skill_compatibility:
  haiku:
    exclude: [brainstorming, writing-plans]  # too complex for haiku
    best_for: [paperclip-sync, verification-before-completion]  # simple execution
  sonnet:
    exclude: []  # handles everything
    best_for: [TDD, systematic-debugging, code-review]  # sweet spot
  opus:
    exclude: []
    best_for: [brainstorming, writing-plans, architecture, complex-debugging]  # deep reasoning

Selection Formula

final_skills = role_skills ∩ task_skills − model_excludes

Example: specialist + coding + sonnet:

  • Step 1 (role): [writing-plans, TDD, systematic-debugging, code-review, brainstorming, paperclip-sync]
  • Step 2 (task): [TDD, systematic-debugging, verification-before-completion]
  • Intersection: [TDD, systematic-debugging]
  • Step 3 (model): sonnet excludes nothing
  • Result: [TDD, systematic-debugging]
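
The selection formula is small enough to implement directly; preserving the role list's order keeps the result deterministic. The function name and dictionary arguments are illustrative, but the data below is taken from the filter tables above.

```python
def select_skills(role, task_type, model, role_skills, task_skills, model_excludes):
    """final_skills = role_skills ∩ task_skills − model_excludes,
    in the role list's declared order."""
    task = set(task_skills.get(task_type, []))
    excluded = set(model_excludes.get(model, []))
    return [s for s in role_skills.get(role, []) if s in task and s not in excluded]
```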

Known Skill Library (Categorized)

All skills are managed by the rule steward role. Skill categories determine the default model tier.

Execution Skills (any model)
| Skill ID | Purpose |
|---|---|
| paperclip-sync | Ticket lifecycle management |
| verification-before-completion | Verify before marking done |
| finishing-a-development-branch | PR/merge workflow |
| using-git-worktrees | Isolated feature work |

Engineering Skills (Sonnet+)
| Skill ID | Purpose |
|---|---|
| test-driven-development | TDD workflow |
| systematic-debugging | Debug before fix |
| pair-programming | AI pair programming |
| code-review | Requesting code review |
| receiving-code-review | Receiving code review |
| verification-quality | Truth scoring |

Architecture Skills (Sonnet/Opus)
| Skill ID | Purpose |
|---|---|
| brainstorming | Explore before building |
| writing-plans | Implementation plans |
| executing-plans | Execute with checkpoints |
| subagent-driven-development | Parallel execution |
| dispatching-parallel-agents | Independent work dispatch |

Infrastructure Skills (any model)
| Skill ID | Purpose |
|---|---|
| hooks-automation | Hooks management |
| swarm-orchestration | Multi-agent coordination |
| swarm-advanced | Advanced swarm patterns |

Knowledge Skills (Sonnet+)
| Skill ID | Purpose |
|---|---|
| writing-skills | Create/edit skills |
| skill-builder | Generate skill templates |
| reasoningbank-agentdb | Adaptive learning |
| agentdb-memory-patterns | Persistent memory |

Domain Skills
| Skill ID | Purpose |
|---|---|
| sparc-methodology | SPARC development workflow |
| claude-api | Claude API / Anthropic SDK integration |
| github-code-review | GitHub code review |
| github-workflow-automation | GitHub Actions workflow automation |
| github-project-management | Project board and sprint planning |
| github-release-management | Release orchestration and versioning |
| github-multi-repo | Multi-repository coordination |
| github-code-review-swarm | Swarm-coordinated code review |
| flow-nexus-platform | Flow Nexus authentication, sandboxes, apps |
| flow-nexus-swarm | Cloud swarm deployment with Flow Nexus |
| flow-nexus-neural | Neural network training in Flow Nexus |

Advanced/Candidate Skills (Review Required)

These skills exist but require rule steward review before T1P adoption:

| Skill ID | Purpose | Concern |
|---|---|---|
| v3-performance-optimization | Aggressive performance targets | claude-flow v3 specific |
| v3-mcp-optimization | MCP server optimization | claude-flow v3 specific |
| v3-cli-modernization | CLI modernization | claude-flow v3 specific |
| v3-ddd-architecture | DDD architecture patterns | claude-flow v3 specific |
| v3-core-implementation | Core module implementation | claude-flow v3 specific |
| v3-security-overhaul | Security architecture overhaul | claude-flow v3 specific |
| v3-memory-unification | Memory system unification | claude-flow v3 specific |
| v3-integration-deep | Deep agentic-flow integration | claude-flow v3 specific |
| v3-swarm-coordination | 15-agent hierarchical coordination | claude-flow v3 specific |
| agentdb-vector-search | Semantic vector search | Advanced, may be premature |
| agentdb-memory-patterns | Persistent memory patterns | Advanced, may be premature |
| agentdb-learning | RL learning plugins | Advanced, may be premature |
| agentdb-optimization | Performance optimization | Advanced, may be premature |
| agentdb-advanced | Multi-DB management | Advanced, may be premature |
| reasoningbank-intelligence | Adaptive learning patterns | Advanced, may be premature |
| stream-chain | Stream-JSON chaining | Niche use case |
| browser | Web browser automation | Overlaps with Playwright MCP |

Full Skill Count Summary
| Category | Count | Model Tier |
|---|---|---|
| Execution | 4 | any |
| Engineering | 6 | Sonnet+ |
| Architecture | 5 | Sonnet/Opus |
| Infrastructure | 3 | any |
| Knowledge | 4 | Sonnet+ |
| Domain | 11 | varies |
| Advanced/Candidate | 17 | varies |
| Total | 50 | |

The rule steward reviews this catalog during idle maintenance to detect unused skills, propose consolidation, and evaluate candidate skills for promotion or retirement.


Task Assignment with Skills

When the orchestrator assigns a task to an agent, the selection result is included in the assignment:

task_assignment:
  task_id: "1001"
  agent_id: "002"
  skills_required: [TDD, systematic-debugging]  # from selection logic
  skills_available: [verification-before-completion]  # on-demand if needed
  model: sonnet
  host: hetzner-cpx62

The skills_required field is computed by the 3-step filter. The skills_available field lists additional skills the agent may invoke on-demand (present in its role set but not in the task type set).

Connection to Rule Steward Role

The rule steward maintains the skill library and selection logic:

  • Reviews skill usage patterns across task assignments
  • Detects unused skills (no assignments in 30 days)
  • Proposes consolidation when skills overlap
  • Updates model compatibility as new models release
  • Adds new skills when gaps detected in task coverage
  • Adjusts role-skill mappings when new roles are introduced

This is governed by the same idle maintenance cycle defined in Section 4.9.9 and the Rule Steward responsibilities in Section 4.2A.

T1P Implementation

For T1P, skill selection is performed manually by the orchestrator when dispatching tasks:

  • The orchestrator reads the role + task type + model and picks skills accordingly
  • No automated selection engine required
  • The YAML definitions above serve as the reference lookup table

Full automation (selection engine integrated with Ruflo task dispatch) is deferred to post-T1P.


5. Execution Flow

flowchart TD
    Ticket --> Task
    Task --> Orchestrator["Orchestrator"]
    Orchestrator --> AssignAgent
    AssignAgent --> Execute
    Execute --> Activity
    Activity --> Evaluate
    Evaluate --> Continue

5.1 Task Dependency Resolution (v5.0.8 Addition)

Tasks can have dependencies (depends_on, blocks). The orchestrator must resolve these before assignment.

dependency_resolution:
  algorithm: "topological_sort"
  rules:
    - task cannot start until all depends_on tasks are completed
    - if circular dependency detected: flag as error, escalate to user
    - parallel execution: tasks with no shared dependencies run simultaneously
    - blocked tasks: re-evaluate when any dependency completes

  execution_order:
    1. build dependency graph from all active tasks
    2. topological sort to determine execution order
    3. identify tasks with zero dependencies (ready now)
    4. assign ready tasks to available agents (respecting host capacity)
    5. when task completes: remove from graph, re-check dependents
    6. repeat until all tasks complete or blocked
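
The execution_order loop maps directly onto Python's standard-library graphlib, which also detects circular dependencies. A minimal sketch with a hypothetical four-task graph (task IDs are illustrative):

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical task graph: task_id -> set of depends_on task_ids.
tasks = {
    "1001": set(),
    "1002": {"1001"},
    "1003": {"1001"},            # independent of 1002: can run in parallel
    "1004": {"1002", "1003"},
}

ts = TopologicalSorter(tasks)
try:
    ts.prepare()                 # raises CycleError on a circular dependency
except CycleError as err:
    raise SystemExit(f"circular dependency, escalate to user: {err}")

waves = []                       # each wave can execute simultaneously
while ts.is_active():
    ready = sorted(ts.get_ready())   # tasks with zero unresolved dependencies
    waves.append(ready)
    for task_id in ready:            # completion releases dependents
        ts.done(task_id)

print(waves)  # → [['1001'], ['1002', '1003'], ['1004']]
```

In production each wave would be bounded by host capacity before assignment; here completion is simulated immediately.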

Design Rationale

  • Topological sort is the minimal correct algorithm for DAG resolution. It guarantees no task starts before its dependencies complete, and it detects circular dependencies (which are errors by definition).
  • Parallel execution is implicit: any tasks with zero unresolved dependencies at the same time can run simultaneously, bounded by host capacity (see Section 4.2F).
  • Re-evaluation on completion means the orchestrator does not need to pre-compute the full schedule. It reacts to task completion events and releases newly-unblocked tasks.

T1P Implementation

For T1P, the orchestrator performs dependency resolution manually:

  • Read task depends_on and blocks fields from the ODM (Part 3, Section 4.5)
  • Build a simple in-memory dependency graph
  • Assign tasks in topological order
  • If the graph is small (< 50 tasks per project), no external DAG engine is needed

A formal DAG execution engine (e.g., integrated into Ruflo) is deferred to post-T1P when project complexity may require it.


5.2 Completion Self-Check Protocol (v5.0.8 Addition)

Before an agent claims a task is complete, it must run a self-evaluation. This strengthens the Reflection pattern (Part 1, Section 7) from post-hoc to in-line.

completion_self_check:
  steps:
    1. re_read_objective: "Read the task objective again"
    2. check_acceptance_criteria: "For each criterion, verify it is met"
    3. run_completion_test: "Execute the completion_test command if defined"
    4. self_score: "Rate confidence 0.0-1.0 that the task is truly done"
    5. identify_gaps: "List anything that might be incomplete"
    6. decision:
        - if confidence >= 0.8 and completion_test passes: mark done
        - if confidence 0.5-0.8: mark done with caveats noted
        - if confidence < 0.5: do NOT mark done, continue working or escalate

  output:
    completion_evaluation:
      task_id: string
      confidence: float
      criteria_met: [string]
      criteria_unmet: [string]
      completion_test_result: pass|fail|not_defined
      gaps_identified: [string]
      decision: done|continue|escalate
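
The decision branch reduces to a threshold check. A sketch; the protocol does not specify the case of high confidence with a failing completion test, so routing that case to escalation is an assumption here:

```python
def completion_decision(confidence: float, test_result: str) -> str:
    """Map self-check results to a decision.

    test_result is one of "pass" | "fail" | "not_defined".
    """
    if confidence >= 0.8 and test_result in ("pass", "not_defined"):
        return "done"
    if 0.5 <= confidence < 0.8:
        return "done_with_caveats"
    # Low confidence, or a failing completion test despite high confidence
    # (assumed behavior): keep working or escalate, never mark done.
    return "continue_or_escalate"

print(completion_decision(0.9, "pass"))  # → done
```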

Design Rationale

  • Re-reading the objective counters drift: agents can lose track of the original goal during long execution sequences.
  • Acceptance criteria check is explicit: each criterion from the task definition (Part 3, Section 4.5) must be individually verified, not assumed.
  • Confidence scoring introduces nuance: not all completions are equal. A task marked "done with caveats" signals to the orchestrator that review may be warranted.
  • Escalation at low confidence prevents agents from marking tasks done when they know they fell short. This is cheaper than discovering incomplete work downstream.

Relation to Activity Evaluation

The completion_evaluation output becomes an Activity Evaluation entity (Part 3, Section 4.6.1) attached to the final activity of the task. This makes self-evaluation auditable and queryable.

T1P Implementation

For T1P, the self-check is enforced via activation files:

  • Every agent activation includes the completion self-check protocol as a rule
  • The orchestrator verifies that task completion messages include the completion_evaluation block
  • Tasks marked done without evaluation are flagged for review

5.3 Agent Auto-Pickup (v5.0.13 Addition)

Agents signal readiness and self-retrieve their next task rather than waiting passively for orchestrator push. This reduces orchestrator polling overhead and allows agents to resume immediately after completing a task.

agent_auto_pickup:
  endpoint: POST /agents/{id}/pickup
  behavior:
    - Agent calls pickup when it becomes idle (task complete or session start)
    - Bus evaluates assigned tasks for the agent, returns highest-priority ready task
    - If no task is ready: returns 204 No Content — agent polls again after backoff
  task_query_endpoint: GET /agents/{id}/tasks
  backoff_schedule: [5s, 10s, 30s, 60s]
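
The pickup-with-backoff behavior can be sketched as a small polling loop. Here fetch_task stands in for POST /agents/{id}/pickup (returning None models a 204 No Content); both helper names are illustrative:

```python
import time

BACKOFF = [5, 10, 30, 60]   # seconds, from backoff_schedule above

def pickup_loop(fetch_task, execute, max_polls, sleep=time.sleep):
    """Poll for work; back off while idle, reset backoff after a real task."""
    misses = 0
    for _ in range(max_polls):
        task = fetch_task()                  # None models 204 No Content
        if task is None:
            sleep(BACKOFF[min(misses, len(BACKOFF) - 1)])
            misses += 1
        else:
            misses = 0                       # real work resets the backoff
            execute(task)

# Two empty polls, then a ready task.
responses = iter([None, None, {"task_id": "1001"}])
delays, done = [], []
pickup_loop(lambda: next(responses, None), done.append,
            max_polls=3, sleep=delays.append)
print(delays, done)  # → [5, 10] [{'task_id': '1001'}]
```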

Why Auto-Pickup

  • Orchestrator pushes tasks via Bus when assigning, but agents may miss push on session restart
  • Auto-pickup ensures no assigned task is silently dropped on session recovery
  • Pair with SSE: agent receives push notification AND can self-poll on reconnect

5.4 Paperclip Auto-Sync (v5.0.13 Addition)

Paperclip task records are kept in sync with XIOPro ODM task state via fire-and-forget async calls. Agents do not wait for Paperclip acknowledgement.

paperclip_auto_sync:
  trigger: any task CRUD operation (create, update, complete, block)
  pattern: fire-and-forget
  behavior:
    - Task state change occurs in XIOPro ODM (source of truth)
    - Async call to Paperclip API issued in background
    - Failure is logged but does not block execution
    - Sync catches up on next successful call
  note: Paperclip is the current task tracker (to be superseded by XIOPro ODM)
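
A minimal sketch of the fire-and-forget pattern using a background thread; sync_call stands in for the Paperclip API request and is an assumed parameter:

```python
import logging
import threading

log = logging.getLogger("paperclip-sync")

def fire_and_forget(sync_call, task_event: dict) -> threading.Thread:
    """Issue the Paperclip sync in the background; the caller never waits.

    Failures are logged, never raised: the XIOPro ODM remains the source
    of truth and the next successful call catches the record up.
    """
    def _run():
        try:
            sync_call(task_event)
        except Exception as exc:
            log.warning("paperclip sync failed, will catch up: %s", exc)

    t = threading.Thread(target=_run, daemon=True)
    t.start()
    return t

# A failing sync never interrupts the calling agent.
def broken_api(event):
    raise ConnectionError("paperclip unreachable")

fire_and_forget(broken_api, {"task_id": "1001", "op": "complete"}).join()
```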

5A. Agent Spawning Patterns (v5.0 Addition)

XIOPro distinguishes three spawning patterns. Each serves a different purpose and has different lifecycle, visibility, and cost characteristics.

5A.1 Agent vs Sub-Agent Distinction

| Property | Agent | Sub-Agent |
|---|---|---|
| Identity | Own 3-digit ID (e.g., 002) | No ID — lives under parent |
| Bus registration | Registered, sends heartbeats | NOT registered in Bus |
| Session | Own independent session | Shares parent's session context |
| Memory | Own Hindsight bank | Uses parent's memory |
| Lifecycle | Survives parent restart | Dies with parent session |
| Orchestrator visibility | Visible — orchestrator can intervene | Invisible — parent's responsibility |
| Communication | Through Control Bus (SSE, REST) | Direct to parent via Ruflo |
| Cost tracking | Own cost ledger entries | Rolled into parent's cost |
| Model | Configured per agent | Usually Haiku (cheap) |
| Capacity | Counts against host limit | Max 3 per parent |

5A.2 Three Spawning Patterns

Pattern 1: Project Roster Agent (Commissioned)

Spawned when a project starts. Long-lived. Assigned to project roster.

pattern: project_roster
spawned_by: orchestrator or system master
when: "Project needs sustained domain expertise"
duration: entire project or sprint
lifecycle:
  - orchestrator creates agent
  - registers in Control Bus
  - added to project roster with roles
  - works on project tickets
  - freed when project completes or no longer needed
examples:
  - "A product project needs a compliance specialist for 2 weeks"
  - "XIOPro needs a dedicated backend engineer"
visibility: full — orchestrator sees status, cost, tasks

Pattern 2: On-Demand Agent (Task-Scoped)

Spawned for a specific task. Medium-lived. Has own identity.

pattern: on_demand
spawned_by: orchestrator
when: "A specific task needs a dedicated agent"
duration: task duration (hours to days)
lifecycle:
  - orchestrator identifies task needing dedicated agent
  - checks host capacity
  - spawns agent with task assignment
  - agent registers in Control Bus
  - works on assigned task
  - reports results
  - terminated when task complete
examples:
  - "Research all competitors in the target domain → spawn a research agent"
  - "Build the SSE endpoint → spawn a backend agent"
  - "Run security audit → spawn a security agent"
visibility: full — registered, trackable, cost-attributed

Pattern 3: Sub-Agent (Ephemeral, Parent-Managed)

Spawned WITHIN an agent's session for parallel subtasks. Short-lived. No independent identity.

pattern: sub_agent
spawned_by: parent agent (via Ruflo claude -p)
when: "Agent needs parallel help within its own work"
duration: minutes to hours, within parent session
lifecycle:
  - parent agent decides it needs parallel help
  - spawns sub-agent via Ruflo (claude -p headless)
  - sub-agent executes narrow task
  - reports result directly to parent
  - parent reviews and integrates
  - sub-agent terminates
  - cost rolls into parent's ledger
examples:
  - "I'm coding and need tests run in parallel"
  - "I need a quick code review of my current diff"
  - "Fetch and summarize 5 web pages while I continue"
  - "Run the DDL migration while I update the docs"
max_concurrent: 3 per parent agent
model: typically Haiku (cheapest capable model)
visibility: invisible to orchestrator — parent's responsibility

5A.3 Spawning Decision Logic

When the orchestrator receives a task:

flowchart TD
    Task["New Task"] --> NeedAgent{"Need a new agent?"}
    NeedAgent -->|"No - existing agent available"| Assign["Assign to existing agent"]
    NeedAgent -->|"Yes"| Duration{"Expected duration?"}
    Duration -->|"Sprint/project"| Roster["Spawn Project Roster Agent"]
    Duration -->|"Days"| OnDemand["Spawn On-Demand Agent"]
    Duration -->|"Hours or less"| Parent{"Can existing agent sub-agent it?"}
    Parent -->|"Yes"| SubAgent["Parent spawns Sub-Agent"]
    Parent -->|"No"| OnDemand
    Roster --> Register["Register in Control Bus"]
    OnDemand --> Register
    SubAgent --> ParentManages["Parent manages internally"]

5A.4 Rules

  • Only the orchestrator spawns agents (roster and on-demand). Agents spawn sub-agents.
  • Agents count against host capacity. Sub-agents count against parent's sub-agent limit (max 3).
  • Sub-agents should NEVER be used for work that needs to survive a session restart. Use on-demand agents for that.
  • Cost attribution: agents get their own ledger entries; sub-agent costs roll into parent.
  • The orchestrator cannot see or intervene on sub-agents. If a sub-agent is stuck, the parent agent handles it or escalates.

6. Agent Lifecycle

flowchart TD
    Spawn --> Initialize
    Initialize --> Execute
    Execute --> Complete
    Execute --> Fail
    Fail --> Retry
    Retry --> Execute
    Complete --> Terminate

6.1 Heartbeat, Staleness & Orphan Cleanup

Heartbeat Intervals

| Surface | Heartbeat Interval | Protocol |
|---|---|---|
| Agents (registered via Bus) | 60 seconds | POST /agents/{id}/heartbeat |
| SSE clients | 30 seconds | SSE :ping frame or heartbeat event |

Staleness Thresholds

| Threshold | Duration | Agent Status | Action |
|---|---|---|---|
| Healthy | < 300 seconds since last heartbeat | online | Normal operation |
| Stale | 300 seconds (5 minutes) | stale | Governor emits agent.stale warning; no task reassignment yet |
| Dead | 600 seconds (10 minutes) | offline | Agent marked offline; orphaned tasks reassigned |

Orphan Cleanup

The Governor runs a cleanup sweep every 60 seconds:

  1. Query all agents where last_heartbeat_at < NOW() - INTERVAL '300 seconds' and status = 'online' -- mark as stale
  2. Query all agents where last_heartbeat_at < NOW() - INTERVAL '600 seconds' and status IN ('online', 'stale') -- mark as offline
  3. For agents marked offline: find all tasks with assigned_agent_id = {agent_id} and status = 'in_progress' -- reset to queued for reassignment
  4. Emit agent.offline governance event with agent_id, last_heartbeat_at, and count of reassigned tasks
  5. SSE connections that miss 3 consecutive pings (90 seconds) are closed server-side
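
The sweep can be sketched against in-memory rows; the production version runs the equivalent SQL against PostgreSQL. Field names follow the queries above, and the helper name is illustrative:

```python
from datetime import datetime, timedelta

STALE = timedelta(seconds=300)
DEAD = timedelta(seconds=600)

def cleanup_sweep(agents: dict, tasks: dict, now: datetime) -> list:
    """One Governor pass: mark stale/offline agents, requeue orphaned tasks."""
    events = []
    for agent_id, agent in agents.items():
        age = now - agent["last_heartbeat_at"]
        if age >= DEAD and agent["status"] in ("online", "stale"):
            agent["status"] = "offline"
            reassigned = 0
            for task in tasks.values():
                if (task["assigned_agent_id"] == agent_id
                        and task["status"] == "in_progress"):
                    task["status"] = "queued"     # release for reassignment
                    reassigned += 1
            events.append(("agent.offline", agent_id, reassigned))
        elif age >= STALE and agent["status"] == "online":
            agent["status"] = "stale"             # warn only, no reassignment
            events.append(("agent.stale", agent_id, 0))
    return events

now = datetime(2026, 3, 30, 12, 0, 0)
agents = {"002": {"status": "online",
                  "last_heartbeat_at": now - timedelta(seconds=700)}}
tasks = {"1001": {"status": "in_progress", "assigned_agent_id": "002"}}
events = cleanup_sweep(agents, tasks, now)
print(events)  # → [('agent.offline', '002', 1)]
```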

Rules

  • An agent recovering from stale to online must re-register via POST /agents/register and reclaim its queued tasks
  • An agent recovering from offline must re-register; previously reassigned tasks are NOT automatically returned
  • The Governor must not mark the master orchestrator (GO) as offline without emitting a critical alert

7. Session Lifecycle

flowchart TD
    Start --> Active
    Active --> Idle
    Idle --> Resume
    Idle --> Dream
    Active --> Crash
    Crash --> Recover
    Recover --> Active

7.1 Context Rotation Protocol

The Global Orchestrator (GO) runs in long sessions that accumulate context. When context approaches capacity, GO must rotate to a fresh session without losing state.

Protocol Steps

context_rotation:
  trigger: "Session duration > 8 hours OR context feels compressed OR many agents spawned"

  steps:
    1. save_state:
      - Update Part 11 (Execution Log) with current session work
      - Update memory files (~/.claude/projects/*/memory/)
      - Push all repos to Git

    2. prepare_handoff:
      - Launch background restart process: nohup bash -c "sleep 5 && devxio go" &
      - OR spawn a rotation agent to manage the restart

    3. exit_session:
      - /exit (or session ends naturally)

    4. new_session_boots:
      - Reads CLAUDE.md (activation protocol)
      - Reads memory files (current project state)
      - Reads Part 11 (what was done, what's pending)
      - Reads plan.yaml (ticket status)
      - Resumes from exact point

  continuity:
    - Agent identity persists (GO = Global Orchestrator, same role)
    - State files are the bridge between sessions
    - No work is lost — everything is in Git + memory + Part 11

  frequency: "As needed. Typically once per 8-12 hour session."

Rule

Context rotation is transparent to the user. The orchestrator self-manages it. The user always talks to the same role (GO), just with fresh context.


8. Model Selection Strategy

Inputs

  • task complexity
  • required reasoning
  • cost constraints
  • latency requirements

Examples

| Scenario | Model |
|---|---|
| heavy reasoning | Claude Opus |
| balanced | Claude Sonnet |
| cheap execution | GPT / Gemini |
| bulk tasks | cheaper models |

9. Cost Optimization Layer

Managed by the Governor

Strategies:

  • downgrade models when possible
  • batch operations
  • avoid redundant work
  • detect runaway loops
  • enforce budget limits

10. Failure Handling

Types

  • agent failure
  • model failure
  • session crash
  • incomplete task

Handling

  • retry
  • escalate
  • switch model
  • request human input

11. Human-in-the-Loop

Trigger Conditions

  • requires_human = true
  • requires_approval = true
  • ambiguity detected

Flow

  • pause execution
  • open RC
  • await input
  • resume

12. Multi-Agent Coordination

  • The orchestrator delegates to domain brains
  • Domain brains spawn workers
  • Results flow upward
  • The orchestrator maintains global consistency

12.1 Agent-to-Agent Communication Protocol

Message Format

All agent-to-agent communication uses JSON messages delivered through the Control Bus bus_send_message tool:

agent_message:
  id: uuid                    # unique message ID
  from_actor: string          # sender agent_id (e.g., "000")
  to_actor: string            # recipient agent_id or topic
  topic: string               # message topic / channel
  type: enum                  # task_assignment | result | query | notification | coordination
  payload: jsonb              # message-specific structured content
  idempotency_key: string     # unique key for deduplication
  correlation_id: string|null # links related messages in a conversation
  created_at: datetime

Delivery Guarantee

  • At-least-once delivery: the Bus persists all events to PostgreSQL. Agents poll with a cursor. Messages are never silently dropped.
  • If an agent is offline, messages accumulate in the Bus and are delivered when the agent next polls or reconnects via SSE.

Ordering

  • Per-topic ordering guaranteed: messages within a single topic are assigned sequential seq numbers by the Bus. Agents process messages in seq order.
  • Cross-topic ordering is NOT guaranteed. Agents must not depend on message ordering across different topics.

Retry Behavior

  • If a message delivery fails (agent unreachable, processing error), the sender retries up to 3 times with exponential backoff: 5s, 15s, 45s.
  • After 3 failed retries, the message is marked delivery_failed and a governance event message.delivery_failed is emitted.
  • The Bus itself does not retry -- retry is the sender's responsibility.

Idempotency

  • Every message must carry an idempotency_key (typically {from_actor}:{correlation_id}:{seq} or a UUID).
  • Recipients must deduplicate incoming messages by idempotency_key. Processing the same key twice must produce no side effects.
  • The Bus MAY deduplicate at ingestion if the same idempotency_key is submitted within a 5-minute window.
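
Recipient-side deduplication needs only a key-to-timestamp map with window expiry. A minimal sketch (the 5-minute window mirrors the Bus's optional ingestion dedup; the class name is illustrative):

```python
import time

class Deduplicator:
    """Track idempotency_keys; processing the same key twice is a no-op."""

    def __init__(self, window_seconds: float = 300.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.seen = {}                       # idempotency_key -> first seen

    def should_process(self, key: str) -> bool:
        now = self.clock()
        # Expire old keys so the table does not grow without bound.
        self.seen = {k: t for k, t in self.seen.items()
                     if now - t < self.window}
        if key in self.seen:
            return False                     # duplicate: skip side effects
        self.seen[key] = now
        return True

dedup = Deduplicator()
print(dedup.should_process("000:corr-1:1"))  # → True
print(dedup.should_process("000:corr-1:1"))  # → False (duplicate)
```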

Acknowledgement

  • Agents acknowledge processed messages via bus_ack with their cursor position.
  • Unacknowledged messages are re-delivered on the next poll.

13. CLI-First Principle

Everything must be runnable:

  • via CLI
  • headless
  • without UI

UI is optional, execution is not.


14. Infrastructure Execution Mapping

Cloud (Hetzner)

  • Orchestrator (BrainMaster)
  • DB
  • Ruflo
  • LiteLLM
  • API services

Local (Mac Studio)

  • fallback execution
  • knowledge access
  • RC sessions
  • future local models

15. Restart & Recovery

System must support:

  • full restart command
  • state reload
  • session recovery
  • task continuation

16. GitHub & Backup Integration

GitHub

  • version control
  • agent code
  • rules
  • blueprints

Backblaze B2

  • backups
  • snapshots
  • disaster recovery

17. Observability

System must expose:

  • active agents
  • running tasks
  • cost
  • errors
  • alerts

18. Security & Isolation

  • API keys protected
  • agent isolation
  • permission control
  • environment separation

19. Current State (v5.0 Addition)

19.1 Agent Identity Model

All agents use 3-digit numeric IDs. Roles are assigned properties, not fixed identities. See Part 1, Section 8 for the complete identity schema.

Agent-to-role assignments are project-scoped (see Part 3, Section 4.2.1 Project Agent Roster). The architecture does not hardcode which agent holds which role — that is an operational decision made per project.

ID Allocation Ranges

| Range | Purpose |
|---|---|
| 000-009 | Core agents (orchestrators, specialists) |
| 010-019 | Remote/external host agents |
| 020-029 | Interface agents |
| 030-099 | Reserved for future core expansion |
| 100-999 | On-demand and ephemeral agents |

Current Agent Registry (operational, not architectural)

The current agent registry is maintained in the Control Bus and the agents.yaml state file. It changes as projects are created and agents are commissioned or retired. See Part 11 (Execution Log) for the live registry.

19.2 Current Execution Runtime

  • Agent spawning: Ruflo (claude-flow) on Hetzner CPX62
  • Primary execution surface: Claude Code CLI on Hetzner
  • Remote execution: via Tailscale to Mac Studio
  • Agent communication: STRUXIO Bus (PostgreSQL-backed, evolving to Control Bus)
  • Task tracking: Paperclip (to be superseded by XIOPro ODM)
  • Agent identity: Unified 3-digit numbering, role-based assignment

20. Execution Success Criteria

Execution layer is successful if:

  • tasks complete reliably
  • sessions recover automatically
  • cost is controlled
  • agents remain coordinated
  • system runs 24/7

21. Final Statement

This layer is the engine of XIOPro.

If this is strong:

  • the system works continuously
  • the founder scales beyond time

If weak:

  • everything collapses into manual work

22. Error Handling Implementation Specification

This section closes the error handling gap identified by all three external reviewers. Part 7 defines governance policy objects and breaker types. This section specifies the concrete implementation parameters.

22.1 Retry Policy

retry_policy:
  default_max_retries: 3
  backoff: "exponential (1s, 2s, 4s)"
  max_backoff: 30s
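
The retry policy is a standard capped exponential backoff. A sketch; with_retries is an illustrative helper, not an existing XIOPro API:

```python
import time

def with_retries(op, max_retries=3, base=1.0, max_backoff=30.0,
                 sleep=time.sleep):
    """Run op; on failure back off 1s, 2s, 4s, ... capped at max_backoff."""
    for attempt in range(max_retries + 1):
        try:
            return op()
        except Exception:
            if attempt == max_retries:
                raise                        # retries exhausted: escalate
            sleep(min(base * 2 ** attempt, max_backoff))

# Fails twice, then succeeds; observed delays follow the schedule.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient")
    return "ok"

delays = []
print(with_retries(flaky, sleep=delays.append), delays)  # → ok [1.0, 2.0]
```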

22.2 Circuit Breaker Implementation Parameters

circuit_breakers:
  cost_breaker:
    threshold: "85% of budget_cap triggers warning, 100% halts non-critical agents"
    evaluation_frequency: "per-activity"
  loop_breaker:
    threshold: "same error 3 times in sequence"
    action: "halt agent, escalate to orchestrator"
  failure_breaker:
    threshold: "5 failed activities in 1 hour"
    action: "pause agent, alert user"
  memory_breaker:
    threshold: "host RAM at 85% = no new agents, 90% = graceful shutdown lowest priority, 95% = emergency terminate"
    evaluation_frequency: "every 60 seconds via host monitor"
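
The memory breaker's tiered thresholds reduce to a highest-first check; the action names here are illustrative labels for the behaviors above:

```python
def memory_breaker_action(ram_pct: float) -> str:
    """Map host RAM usage to the breaker tier (checked highest first)."""
    if ram_pct >= 95:
        return "emergency_terminate"
    if ram_pct >= 90:
        return "graceful_shutdown_lowest_priority"
    if ram_pct >= 85:
        return "no_new_agents"
    return "normal"

print(memory_breaker_action(92))  # → graceful_shutdown_lowest_priority
```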

22.3 Bus Down Fallback

bus_down_fallback:
  detection: "3 consecutive failed heartbeats (3 minutes)"
  agent_behavior: "continue current task locally, queue messages for retry"
  recovery: "on Bus recovery, flush queued messages, re-register"

22.4 Runaway Detection

runaway_detection:
  definition: "agent consuming >10x normal tokens for task type, or >30 minutes on a task estimated at <5 minutes"
  action: "governor alerts user, pauses agent if no response in 5 minutes"

22.5 Cross-Reference

These parameters implement the breaker types defined in Part 7, Section 9.3 and the recovery policies in Part 7, Section 8.4. The memory breaker implements the memory pressure survival rule from Part 8, Section 11.10.3.


Changelog

| Version | Date | Author | Changes |
|---|---|---|---|
| 4.1.0 | 2026-03-26 | BM | Initial v4.1 release |
| 4.2.0 | 2026-03-28 | BM | Added: T1P implementation form table (4.2E). Added: Ruflo relationship to O00 clarification (4.2F). Fixed: "Rufio" renamed to "Ruflo" globally. Added: Dream Engine T1P posture -- Idle Maintenance only (4.9.9). Added: Current agent mapping table (19.1). Added: Current execution runtime state (19.2). Added: Changelog section. Updated version header to 4.2.0. |
| 4.2.1 | 2026-03-28 | BM | Unified Agent Identity Model: Reframed O00/O01/R01/P01/M01 as role bundles assigned to agents, not separate agent identities. Updated all section headers from profession codes to role names (e.g., "4.1 Orchestrator Role" instead of "4.1 O00"). Updated 4.2E table to show role bundles with agent 000 assignment. Updated 4.2F/4.2G/4.2H to use 3-digit agent IDs. Updated Section 19 agent mapping to unified 3-digit identity table with Old ID column. Updated all body text references from O00/O01/R01/P01/M01 to role-based naming. Updated all Mermaid diagrams to use 3-digit agent IDs. |
| 4.2.2 | 2026-03-28 | 000 | Agent naming migration: B1-B5 replaced with 001-005 in skill tables and activation examples. BM replaced with 000. W21-W23 replaced with 201-203 in example activation. Slim activation example updated from B2 to 002 naming. Backblaze B2 references preserved unchanged. Changelog author entries preserved as historical. |
| 4.2.3 | 2026-03-28 | 000 | Idea + User entities: Added idea_review to idle_maintenance_tasks (Section 4.10). Added Idea review to T1P Dream Engine posture (Section 4.9.9) — scan ideas not reviewed within review_cycle, surface in morning brief. |
| 4.2.4 | 2026-03-28 | 000 | Skill Selection Architecture (Section 4.11): 3-step filter (role + task type + model tier) for selecting which skills an agent loads per task assignment. Includes categorized skill library, task assignment contract with skills_required/skills_available fields, and rule steward governance connection. Updated Part 5 Section 8.9 to cross-reference. |
| 4.2.5 | 2026-03-28 | 000 | Founder clarifications: (1) Role-Topic-Skill binding chain added to Section 4.11 -- skills bind to roles via topics, not directly to agent numbers. (2) Added skill_performance_review and skill_token_optimization to idle_maintenance_tasks (Section 4.9.9). |
| 4.2.6 | 2026-03-28 | 000 | Roles over numbers: Removed agent IDs from all architectural role descriptions, section headers, diagrams, and tables. Agent numbers retained only in Section 19 (Current State) and Changelog. Blueprint now describes WHAT roles do, not WHICH agent holds them. |
| 4.2.7 | 2026-03-28 | BM | XIOPro Optimizer cross-references: Added "part of the XIOPro Optimizer (see Part 1, Section 8A)" note to Governor (4.2), Rule Steward (4.2A), Prompt Steward (4.2B), Module Steward (4.2C), and Dream Engine (4.9). |
| 4.2.8 | 2026-03-28 | BM | AGI pattern gap fixes: (1) Task Dependency Resolution (5.1) — topological sort DAG algorithm for depends_on/blocks resolution. Addresses audit gap "Workflow DAG Formalization" (Principle 21). (2) Completion Self-Check Protocol (5.2) — 5-step self-evaluation before marking tasks done, with confidence scoring and escalation rules. Addresses audit gap "Agent Self-Evaluation" (Principle 1 depth). |
| 4.2.9 | 2026-03-28 | 000 | Wave 1-2 BP fixes: Expanded Domain Skills in Section 4.11 — github- (6 skills), flow-nexus- (3 skills) now listed individually. Added Advanced/Candidate Skills table (17 skills) with review concerns. Added Full Skill Count Summary table (50 total skills across 7 categories). |
| 4.2.10 | 2026-03-28 | 000 | Memory engineering principles: Added Section 4.8A — 5 production engineering rules for memory operations (async updates, debounce writes, confidence threshold, token budget, atomic writes) from 5-Layer Memory Stack research. Includes relation table to existing architecture and implementation requirements. Slimmed Section 4.2H Control Bus to cross-reference Part 2 Section 5.8 (removed duplicated capabilities list). |
| 4.2.11 | 2026-03-29 | BM | Added Section 4.1A (Orchestrator Surface Names) — GO/MO naming convention for Hetzner and Mac orchestrator surfaces with launch commands and rules. Added Section 7.1 (Context Rotation Protocol) — session rotation procedure for long-running orchestrator sessions with state preservation via Part 11, memory files, and Git. |
| 4.2.12 | 2026-03-29 | BM | Cross-references: Added pointer to resources/DESIGN_rc_architecture.md (RC architecture — human-agent interaction surface design, Open WebUI evaluation, multi-provider routing). |
| 4.2.13 | 2026-03-29 | 000 | Batch BP update from recent tickets: Added Section 5.3 (Agent Auto-Pickup) — /agents/{id}/pickup endpoint, self-retrieval pattern, backoff schedule. Added Section 5.4 (Paperclip Auto-Sync) — fire-and-forget pattern for ODM-to-Paperclip sync on task CRUD. |
| 4.2.14 | 2026-03-30 | 000 | Reviewer role: Added Section 4.2I (Reviewer Role) — formal agent role for post-build independent review. Spawned by GO/PO after builder completes significant work; must be a different agent than the builder; uses different model tier where possible (Opus reviews Sonnet, Sonnet reviews Haiku); reads spec + output independently; returns APPROVED / NEEDS_FIX / REJECTED verdict to orchestrator; short-lived. Updated 4.2D.8 state ownership table, 4.2E T1P form table, and Section 4.11 agent_role_skills to include reviewer. |
| 5.0.1 | 2026-03-30 | GO | I4: Added Section 6.1 (Heartbeat, Staleness & Orphan Cleanup) -- heartbeat intervals (60s agents, 30s SSE), stale threshold (300s), dead threshold (600s), Governor cleanup sweep every 60s, orphaned task reassignment to queued. I7: Added Section 12.1 (Agent-to-Agent Communication Protocol) -- JSON message format via Bus, at-least-once delivery, per-topic ordering with sequential seq numbers, 3-retry exponential backoff, idempotency_key deduplication, bus_ack acknowledgement. |