XIOPro Production Blueprint v5.0¶
Part 4 — Execution & Agent System¶
1. Purpose of This Part¶
This document defines how XIOPro:
- executes work
- runs agents
- manages sessions
- integrates LLM providers
- supports Remote Control (RC)
- ensures continuity and recovery
- optimizes cost and performance
This is the layer that connects:
ODM (Part 3) → Real execution in the world
2. Execution Philosophy¶
XIOPro execution is:
- agent-driven
- ticket-based
- state-controlled
- cost-aware
- provider-agnostic
- continuously improving
Execution must:
- never depend on a single session
- survive crashes
- resume from state
- remain observable
3. Execution Stack Overview¶
```mermaid
flowchart TD
    Ticket --> Task
    Task --> A000
    Orchestrator["Orchestrator"] --> AgentSelection
    AgentSelection --> ModelRouter
    ModelRouter --> ExecutionEngine
    ExecutionEngine --> Activity
    Activity --> DB
    DB --> Governor["Governor"]
```
4. Core Components¶
4.1 Orchestrator Role¶
Formerly: O00 — Orchestrator
Role¶
Primary execution coordinator. In the unified agent identity model, the orchestrator role is one of several role bundles that can be assigned to an agent. See Part 1, Section 8 for the complete role bundle and agent identity definitions.
Responsibilities¶
- read work graph
- assign tasks
- select agents
- trigger execution
- handle failures
- maintain continuity
4.1A Orchestrator Surface Names¶
XIOPro uses named orchestrator surfaces for easy identification:
| Name | Full Name | Host | Launch Command | Role |
|---|---|---|---|---|
| GO | Global Orchestrator | Hetzner | `devxio go` or `GO` | Primary orchestrator. Runs 24x7. Manages all projects, agents, state. |
| MO | Mac Orchestrator | Mac Studio | `devxio mo` or `MO` | Mac-local orchestrator. Handles Mac tasks, browser testing, local experiments. |
Rules¶
- GO and MO are surface names, not agent IDs. The agent running GO might be 000 or any agent with the orchestrator role.
- Both launch via the `devxio` command with the surface as argument.
- GO is the primary -- MO reports to GO via the Control Bus.
- Both can run simultaneously on different hosts.
4.2 Governor Role¶
Formerly: O01 — Governor
This role is part of the XIOPro Optimizer (see Part 1, Section 8A).
Role¶
System optimization and protection. In the unified agent identity model, the governor role is a role bundle that can be assigned alongside the orchestrator role.
Responsibilities¶
- cost tracking
- anomaly detection
- performance analysis
- optimization recommendations
- circuit breaking
4.2A Rule Steward Role¶
Formerly: R01 — Rule & Skill Steward
This role is part of the XIOPro Optimizer (see Part 1, Section 8A).
Role¶
Role bundle responsible for the lifecycle quality of:
- `RULE_*` assets
- `SKILL_*` assets
- agent activation assets such as `claude.md`
- reusable operating patterns / templates
The rule steward role is not a runtime governor like the governor role. It is the steward of behavioral assets that shape how XIOPro thinks and executes.
Why It Exists¶
As XIOPro evolves, the system will continuously accumulate:
- new skills
- revised rules
- agent-specific activations
- overlapping procedures
- obsolete guidance
- conflicting operating patterns
Without a dedicated steward, these assets drift, duplicate, and eventually degrade execution quality.
The rule steward role exists to keep the rule/skill layer:
- coherent
- reusable
- discoverable
- conflict-minimized
- approval-governed
Primary Responsibilities¶
The rule steward must:
- search for existing rules/skills before new ones are created
- detect missing capabilities and propose new skill creation
- validate structure, metadata, and completeness of rule/skill assets
- detect overlap, contradiction, duplication, and drift
- evaluate whether an activation file like `claude.md` remains effective
- propose consolidation, supersession, deprecation, or promotion
- draft new skills using existing approved skills when appropriate
- open approval flows for protected changes
- maintain lineage across revisions
Non-Responsibilities¶
The rule steward must not:
- silently change live execution behavior
- bypass founder approval for protected changes
- replace governor runtime governance
- become uncontrolled self-modification
- commit rule/skill mutations directly into production without policy
Managed Asset Classes¶
Operating Modes¶
Core Inputs¶
The rule steward should consume:
- current `RULE_*` files
- current `SKILL_*` files
- activation files such as `claude.md`
- historical incidents and overrides
- task/result/reflection history
- Dream Engine proposals
- founder/operator requests
- performance and reuse signals
Core Outputs¶
The rule steward may emit:
- validation report
- conflict report
- redundancy report
- skill-gap report
- draft skill proposal
- draft rule proposal
- activation improvement proposal
- deprecation recommendation
- approval request
Technology Model¶
For T1P, rule and skill stewardship should use a dual representation:
- Human-readable source of truth
  - Markdown assets in Git
  - explicit metadata/front matter
  - examples, rationale, and scope
- Structured runtime mirror
  - normalized YAML/DB representation
  - queryable scope, precedence, owner, status, and approval requirements
  - machine-evaluable validation state
The rule steward operates across both layers.
Stewardship Flow¶
```mermaid
flowchart TD
    NeedOrProposal --> SearchExisting
    SearchExisting -->|found reusable asset| EvaluateFit
    SearchExisting -->|gap detected| DraftNewAsset
    EvaluateFit --> ValidateAsset
    DraftNewAsset --> ValidateAsset
    ValidateAsset --> DetectConflicts
    DetectConflicts --> ApprovalGate
    ApprovalGate --> PublishApprovedAsset
    PublishApprovedAsset --> UpdateIndex
    UpdateIndex --> AvailableForAgents
```
Relation to Other Components¶
| Component | Relation to Rule Steward |
|---|---|
| Orchestrator role | consumes approved rules/skills during execution |
| Governor role | governs runtime behavior using approved policy/rule outputs |
| Librarian | stores, indexes, versions, and retrieves managed assets |
| Dream Engine | may propose skill/rule improvements but does not approve them |
| Human Operator | approves protected changes and resolves high-impact conflicts |
Final Rule¶
The rule steward role is the custodian of execution behavior assets.
The governor role protects runtime. The rule steward role protects the quality and evolution of the rule/skill layer.
4.2B Prompt Steward Role¶
Formerly: P01 — ContextPrompting Orchestrator
This role is part of the XIOPro Optimizer (see Part 1, Section 8A).
See `resources/DESIGN_rc_architecture.md` for the Remote Control architecture design covering how human-agent interaction surfaces (Open WebUI, Prompt Composer) connect to the prompt steward role via the Control Bus.
Role¶
Role bundle responsible for transforming vague human intent and incomplete task context into execution-ready prompt packages.
The prompt steward does not replace context engineering.
It complements context engineering by deciding:
- whether enough information already exists
- whether questions should be asked
- which questions are worth asking
- how human answers should be converted into durable execution context
- how the final prompt package should be assembled for the active runtime
Why It Exists¶
XIOPro does not rely on a single giant "super prompt".
Prompt quality is not only a writing problem. It is also a questioning problem.
As topics, tickets, and issues evolve, the system must be able to:
- detect ambiguity
- detect missing constraints
- detect weak assumptions
- ask the minimum useful questions
- preserve the answers for future execution continuity
This applies to:
- XIOPro itself
- all STRUXIO products (see `MVP1_PRODUCT_SPEC.md` for the first product)
- future STRUXIO.ai product flows
Core Principle¶
XIOPro replaces static prompt engineering with:
- context engineering
- prompt orchestration
- interactive inquiry
When ambiguity materially affects quality, risk, relevance, or cost, the system should prefer targeted inquiry over silent assumption.
Primary Responsibilities¶
The prompt steward must:
- assess task readiness before execution
- identify missing intent, constraints, preferences, and assumptions
- select an appropriate prompting mode
- generate targeted clarifying questions
- classify questions as optional or blocking
- convert human answers into structured execution context
- assemble runtime-specific prompt packages for the orchestrator / execution agents
- maintain prompt lineage across revisions and retries
- support human collaboration during design/problem-shaping tasks
Non-Responsibilities¶
The prompt steward role must not:
- replace orchestrator execution orchestration
- replace governor governance
- replace rule steward rule/skill stewardship
- ask unbounded or low-value questions
- block execution when policy allows bounded assumptions
- silently mutate durable context without traceability
ContextPrompting Modes¶
| Mode | Meaning |
|---|---|
| `direct` | execute immediately with no inquiry unless required by policy |
| `governed` | ask only required approval/risk/policy questions |
| `clarify` | ask a small number of targeted questions before execution |
| `collaborate` | work interactively with the human to shape the problem |
Default Mode¶
Default user-facing mode should be:
Question Budget¶
The prompt steward should control how many questions are asked.
Typical guidance:
- `none` → direct execution utility task
- `light` → one or two material clarifications
- `normal` → bounded pre-execution shaping
- `deep` → collaborative framing for complex strategic/design work
Prompting Readiness Decision¶
Before execution, the prompt steward should determine one of:
```yaml
prompt_readiness_decision:
  - ready_now
  - ask_optional_questions
  - ask_blocking_questions
  - require_human_collaboration
  - require_governed_approval
```
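As an illustration, the readiness decision can be sketched as a classifier over a few context signals. The signal names and precedence order below are assumptions for clarity, not blueprint policy:

```python
def readiness_decision(has_goal: bool, has_constraints: bool,
                       policy_requires_approval: bool,
                       needs_problem_shaping: bool) -> str:
    """Map simple context signals onto a prompt_readiness_decision value.

    Precedence here (approval > collaboration > blocking > optional) is an
    illustrative assumption, not a defined policy order.
    """
    if policy_requires_approval:
        return "require_governed_approval"
    if needs_problem_shaping:
        return "require_human_collaboration"
    if not has_goal:
        return "ask_blocking_questions"    # cannot execute without intent
    if not has_constraints:
        return "ask_optional_questions"    # useful but not blocking
    return "ready_now"
```

In practice the prompt steward would derive these signals from the ticket, task, and context records rather than receiving them as booleans.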
Inquiry Output Classes¶
Human answers gathered by the prompt steward should be transformed into durable objects such as:
- clarified_intent
- assumptions
- constraints
- preferences
- unresolved_questions
- approval_inputs
- prompt_packet_inputs
These outputs must be attachable to:
- ticket
- task
- activity
- runtime
- session
- human decision history
Prompt Package Contract¶
The prompt steward should produce a bounded prompt package rather than a monolithic prompt blob.
```yaml
prompt_package:
  task_id: string|null
  runtime_id: string|null
  prompting_mode: enum
  readiness_decision: enum
  goal: string|null
  scope: [string]
  constraints: [string]
  assumptions: [string]
  unresolved_questions: [string]
  relevant_context_refs: [string]
  relevant_rule_refs: [string]
  relevant_skill_refs: [string]
  human_answer_refs: [string]
  recommended_next_step: string|null
```
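For ticketization, the contract above maps directly onto a typed structure. A minimal sketch using a Python dataclass follows; the field names mirror the contract, while the defaults are assumptions:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PromptPackage:
    """Typed form of the prompt_package contract (defaults are illustrative)."""
    task_id: Optional[str] = None
    runtime_id: Optional[str] = None
    prompting_mode: str = "clarify"        # one of: direct|governed|clarify|collaborate
    readiness_decision: str = "ready_now"  # one of the prompt_readiness_decision values
    goal: Optional[str] = None
    scope: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)
    assumptions: List[str] = field(default_factory=list)
    unresolved_questions: List[str] = field(default_factory=list)
    relevant_context_refs: List[str] = field(default_factory=list)
    relevant_rule_refs: List[str] = field(default_factory=list)
    relevant_skill_refs: List[str] = field(default_factory=list)
    human_answer_refs: List[str] = field(default_factory=list)
    recommended_next_step: Optional[str] = None
```

A structure like this keeps prompt packages bounded and serializable, which supports the lineage and traceability requirements above.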
Example Operating Logic¶
- If context is sufficient and risk is low → `direct`
- If policy or approval applies → `governed`
- If a few answers would materially improve quality → `clarify`
- If the problem itself needs shaping with the founder → `collaborate`
Collaboration Rule¶
For strategic design, architecture, product shaping, and other high-value ambiguous work, XIOPro should usually prefer `collaborate` mode.
This is especially relevant to blueprint creation, MVP definition, and early STRUXIO.ai product design.
Interaction with Other Components¶
| Component | Relation to Prompt Steward |
|---|---|
| Orchestrator role | consumes execution-ready prompt packages |
| Governor role | constrains prompting when governance/policy requires |
| Rule Steward role | supplies validated rules/skills/activations for prompt assembly |
| Librarian | supplies supporting knowledge/context assets |
| RC / UI | provides the human interaction surface for inquiry |
| Dream Engine | may identify missing recurring skills/questions patterns |
Success Criteria¶
The prompt steward is successful when:
- the system asks fewer but better questions
- execution starts with clearer intent
- assumptions become explicit rather than hidden
- human collaboration improves high-value tasks
- prompt packages remain compact, relevant, and traceable
- inquiry improves quality without becoming friction-heavy
Final Rule¶
Prompting in XIOPro is not a single artifact.
It is a governed interactive process for shaping execution quality.
4.2C Module Steward Role¶
Formerly: M01 — Module Portfolio Steward & Optimizer
This role is part of the XIOPro Optimizer (see Part 1, Section 8A).
Role¶
Role bundle responsible for governing, evaluating, optimizing, and evolving XIOPro's module portfolio across:
- subscription-backed access
- API-key access
- local/self-hosted runtimes
- cloud/server-hosted runtimes
- future hybrid execution paths
The module steward treats module choice as a governed optimization problem, not an ad hoc per-agent preference.
Why It Exists¶
XIOPro will use many modules across many surfaces, agents, and workflows.
That creates a portfolio problem, not just a cost problem.
The system must continuously optimize the use of modules and subscriptions across constrained resources such as:
- compute power
- memory
- bandwidth
- time / latency
- monetary cost
- quota / subscription utilization
while maximizing:
- quality
- stability
- trust
Without a dedicated steward, module usage drifts into:
- duplicated capability
- poor routing choices
- underused subscriptions
- wasteful cost
- weak fallback design
- hidden dependency on vendor-specific surfaces
- unmanaged self-hosted complexity
Primary Responsibilities¶
The module steward must:
- maintain the governed registry of available modules and access paths
- understand which modules are available by subscription, API, or self-hosting
- evaluate module fitness by task type, quality target, and environment
- optimize module selection across constrained resources
- recommend preferred and fallback modules
- detect waste, underuse, poor fit, and overlap
- detect deprecated or weakening module options
- recommend when self-hosting becomes justified
- scout and evaluate new modules, plans, and hosting options
- prepare adoption / upgrade / retirement proposals
- coordinate with governor role for runtime enforcement
- coordinate with Part 8 constraints for actual hosting feasibility
Non-Responsibilities¶
The module steward role must not:
- auto-purchase subscriptions
- auto-deploy new module stacks into production
- auto-switch the portfolio without approval where policy requires it
- replace governor runtime governance
- replace prompt steward prompt-package assembly
- replace rule steward rule/skill stewardship
Governed Asset Classes¶
```yaml
managed_module_assets:
  - MODULE
  - MODULE_POLICY
  - SUBSCRIPTION
  - HOSTING_PROFILE
  - MODULE_EVALUATION
  - MODULE_RECOMMENDATION
```
Optimization Objective¶
Core optimization principle:
Module choice is a governed optimization game.
Optimization is not only about lowering cost.
It is about achieving the best feasible balance of:
- quality
- stability
- trust
- speed
- resource efficiency
- operational resilience
Typical Dimensions Evaluated¶
The module steward should evaluate at least:
- task fit
- quality / output reliability
- latency
- token / usage cost
- subscription utilization
- memory and compute footprint
- bandwidth / network dependency
- privacy / exposure profile
- execution surface compatibility
- hosting feasibility
- fallback availability
- operational complexity
Example Questions the Module Steward Must Answer¶
- Which module should this class of task prefer by default?
- Which fallback should be used when the preferred module is unavailable?
- Which subscriptions are underused or strategically weak?
- Which self-hosted options are worth evaluating next?
- Which modules should be deprecated or constrained?
- Which environments can actually support a proposed new module?
Evidence Sources & Scouting Inputs¶
The module steward must optimize from evidence, not intuition alone.
Its scouting and evaluation inputs may include:
- provider documentation
- provider pricing and plan changes
- approved benchmark/evaluation reports
- internal task/module evaluation history
- research outputs from the Research Center
- approved web research results
- Hugging Face model and repository research
- local or remote CLI-based research tools
- self-hosting feasibility notes from infrastructure
Hugging Face Rule¶
Hugging Face may be used as a governed scouting source for:
- candidate module discovery
- repository discovery
- self-hosting research leads
- capability comparison
- surrounding ecosystem signals
But Hugging Face findings are not automatic approvals.
They are candidate inputs that must still flow through:
- module steward evaluation
- rule steward / prompt steward / governor constraints where relevant
- approval policy for adoption or strategic change
Runtime Feedback & Telemetry Requirement¶
The module steward must receive real usage evidence from execution.
At minimum, module usage should be traceable by:
- module/provider
- access path
- execution surface
- runtime
- session
- activity
- task
- ticket
- latency / retry profile
- estimated or billed cost where available
This feedback loop is required so module optimization can use:
- actual performance
- actual stability
- actual cost pressure
- actual subscription utilization
- actual fallback frequency
rather than only assumptions.
Optimization Rule¶
A module recommendation is incomplete unless it can be supported by at least one of:
- direct internal usage evidence
- credible external evaluation
- controlled comparison result
- explicit exploratory candidate status
This prevents portfolio decisions from becoming folklore.
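A minimal sketch of this completeness gate follows; the evidence-class labels are shortened forms of the four sources listed above:

```python
# Shortened labels for the four supporting evidence classes (illustrative names)
SUPPORTING_EVIDENCE = {
    "direct_internal_usage",
    "credible_external_evaluation",
    "controlled_comparison",
    "explicit_exploratory_candidate",
}

def recommendation_is_supported(evidence: set) -> bool:
    """A module recommendation is complete only when backed by at least
    one recognized evidence class."""
    return bool(evidence & SUPPORTING_EVIDENCE)
```

The module steward would attach the matching evidence references to the recommendation record, so the "why" of a portfolio choice stays auditable.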
Interaction with Other Components¶
| Component | Relation to Module Steward |
|---|---|
| Orchestrator role | executes within the governed module portfolio |
| Governor role | enforces module policy, constraints, and anomaly responses at runtime |
| Prompt Steward role | uses module portfolio guidance when assembling prompt packages |
| Rule Steward role | stewards rules/skills/activations that shape module usage |
| LiteLLM / routing layer | applies preferred/fallback routing decisions where applicable |
| Part 8 infrastructure | provides the actual hosting and resource envelope |
| Human operator | approves protected additions, removals, and strategic changes |
Success Criteria¶
The module steward is successful when:
- module usage is explainable rather than ad hoc
- portfolio choices improve quality, stability, and trust
- cost and resource use are optimized without hidden fragility
- subscriptions are used deliberately rather than accidentally
- self-hosting proposals are grounded in real need and real feasibility
- new modules are evaluated systematically before adoption
- fallback and retirement decisions are intentional
Final Rule¶
Module usage in XIOPro is not a side effect.
It is a governed optimization discipline.
4.2D T1P Implementation Form of Role Bundles¶
Purpose¶
The named XIOPro role bundles are architectural capabilities assigned to agents (see Part 1, Section 8).
For T1P, they must also be made concrete as implementation units.
This section defines what these role bundles are at code and deployment level, so the blueprint can be ticketized without pretending that every role is a separate distributed system.
Note: In the unified agent identity model, all five role bundles (orchestrator, governor, rule_steward, prompt_steward, module_steward) can be assigned to a single agent. They are implemented as separate code modules, not separate agents.
4.2D.1 Core T1P Deployables¶
T1P should begin with a deliberately small set of deployables:
- `web-ui`
  - widget-first web control center
- `api-service`
  - FastAPI-based control/API layer
  - owns core request/response interfaces
  - exposes founder/operator and UI-facing APIs
  - emits SSE streams for live updates
- `worker-service`
  - Python worker/runtime service
  - executes jobs, orchestration loops, research tasks, and background governance tasks
- `postgres`
  - authoritative operational state store
- `reverse-proxy`
  - Caddy
- observability stack
  - OpenTelemetry / Prometheus / Grafana

Optional in T1P where needed:

- `litellm-router`
  - only for API-backed module routing paths
  - not required for subscription-only human-operated surfaces
- Ruflo execution fabric
  - runtime integration layer for bounded multi-agent execution
Rule¶
The named professions do not require one deployable each in T1P.
They may begin as application services/modules inside a smaller number of processes.
4.2D.2 Orchestrator Role — Implementation Form¶
Formerly: O00.
T1P form:
- application service / orchestration module
- primarily hosted inside:
  - `api-service`
  - `worker-service` for longer-running execution and coordination loops
The orchestrator module should own:
- orchestration logic
- task assignment logic
- execution progression logic
- handoff into runtime fabric
- resume/recovery coordination at orchestration level
The orchestrator module should not be implemented as:
- only a prompt persona
- only a markdown convention
- only a UI abstraction
State Ownership¶
The orchestrator module does not own state authoritatively.
Authoritative state remains in:
- PostgreSQL-backed work graph / ODM
The orchestrator module reads and mutates that state through explicit services and records.
4.2D.3 Governor Role — Implementation Form¶
Formerly: O01.
T1P form:
- governance service / policy evaluation module
- implemented as:
  - synchronous policy checks in `api-service`
  - background anomaly / breaker / rollup evaluation in `worker-service`
The governor module should own:
- alert evaluation
- breaker logic
- approval gate checks
- runtime constraint decisions
- cost anomaly checks
- governance event emission
The governor module should not be:
- only a chat persona
- an invisible UI-side heuristic layer
State Ownership¶
The governor module does not own canonical operational objects.
It owns:
- governance logic
- governance records
- policy evaluation outputs
Authoritative governance events and related objects remain stored in PostgreSQL.
4.2D.4 Rule Steward Role — Implementation Form¶
Formerly: R01.
T1P form:
- application service / governed asset module
- implemented initially inside `api-service` and `worker-service`
- not required as a separate deployable in T1P
The rule steward module should own:
- search-before-create checks
- validation of rule/skill/activation assets
- conflict/overlap detection
- publication and approval routing
- asset-lifecycle support
State Ownership¶
The rule steward module does not own the source of truth for assets.
Sources of truth remain:
- Git-managed asset files
- structured mirror records in PostgreSQL where applicable
The rule steward module owns validation and stewardship behavior over those assets.
4.2D.5 Prompt Steward Role — Implementation Form¶
Formerly: P01.
T1P form:
- application service / prompt-package and inquiry module
- implemented initially inside `api-service`
- may use `worker-service` for longer-running preparation tasks if needed
The prompt steward module should own:
- prompting mode interpretation
- readiness decision
- question selection
- blocking vs optional inquiry classification
- prompt-package assembly
- promotion of meaningful answers into durable context
State Ownership¶
The prompt steward module does not own chat history as the source of truth.
It reads and writes through:
- discussion threads
- tasks
- sessions
- human decisions
- prompt package records / structured context refs
4.2D.6 Module Steward Role — Implementation Form¶
Formerly: M01.
T1P form:
- application service / module registry and optimization module
- implemented initially inside `api-service`
- background evidence aggregation and recommendation refresh may run in `worker-service`
The module steward module should own:
- module registry logic
- recommendation logic
- fallback logic
- evidence-backed comparison logic
- subscription/access-path awareness
- hosting-feasibility evaluation
- proposal preparation for adoption/deprecation
T1P Narrowing Rule¶
For T1P, the module steward may begin as a narrow module registry + recommendation layer rather than a large autonomous portfolio engine.
The architectural role remains module_steward, but its initial implementation scope may be deliberately narrow.
4.2D.7 Communication Model¶
T1P communication should stay simple.
UI <-> Backend¶
- REST/JSON over HTTPS
- SSE for live updates
- WebSocket only where true bidirectional streaming is justified
API <-> PostgreSQL¶
- ORM / explicit persistence layer
- no direct UI-to-DB path
API <-> Worker¶
- PostgreSQL-backed job dispatch / claim / update model
- no separate broker required in T1P
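The dispatch / claim / update pattern can be sketched as follows. This is a self-contained illustration using SQLite so it runs anywhere; the production model is PostgreSQL-backed (where the claim step would typically use `FOR UPDATE SKIP LOCKED` for concurrent workers), and the table and column names here are assumptions, not the actual ODM schema.

```python
import sqlite3

# In-memory stand-in for the PostgreSQL job table (illustrative schema)
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued',
    claimed_by TEXT,
    result TEXT)""")

def dispatch(kind: str) -> int:
    """api-service enqueues a job for worker-service."""
    cur = conn.execute("INSERT INTO jobs (kind) VALUES (?)", (kind,))
    conn.commit()
    return cur.lastrowid

def claim(worker_id: str):
    """worker-service claims the oldest queued job; the guarded UPDATE
    makes the claim atomic (a racing worker's rowcount would be 0)."""
    row = conn.execute(
        "SELECT id, kind FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    changed = conn.execute(
        "UPDATE jobs SET status = 'running', claimed_by = ? "
        "WHERE id = ? AND status = 'queued'",
        (worker_id, row[0])).rowcount
    conn.commit()
    return row if changed else None  # lost the race -> caller retries

def complete(job_id: int, result: str) -> None:
    """worker-service records the result; state survives the session."""
    conn.execute("UPDATE jobs SET status = 'done', result = ? WHERE id = ?",
                 (result, job_id))
    conn.commit()

job_id = dispatch("research")
job = claim("worker-1")
complete(job[0], "ok")
final_status = conn.execute(
    "SELECT status FROM jobs WHERE id = ?", (job[0],)).fetchone()[0]
```

Because the queue lives in the durable store rather than a broker, queued work survives crashes of both services, which matches the "resume from state" requirement in Section 2.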
Backend <-> Ruflo / runtime surfaces¶
- adapter/service boundary
- explicit execution records
- runtime/session IDs preserved
Backend <-> LiteLLM¶
- used only for API-backed module routes
- not required to mediate subscription-only human-operated paths
4.2D.8 State Ownership Summary¶
| Role / Component | Owns Logic | Owns Canonical State |
|---|---|---|
| Orchestrator | Yes | No |
| Governor | Yes | No |
| Rule Steward | Yes | No |
| Prompt Steward | Yes | No |
| Module Steward | Yes | No |
| Reviewer | Yes | No |
| PostgreSQL / ODM | No | Yes |
| Git-managed governed assets | No | Yes for source assets |
Rule¶
The professions own behavior. The system stores canonical state in explicit durable stores.
This prevents role descriptions from becoming state silos.
4.2D.9 T1P Implementation Constraint¶
T1P should prefer:
- fewer deployables
- more explicit modules/services inside those deployables
- strong contracts
- durable state
- clear event and job records
The blueprint may name many professions.
T1P should not force each profession into an independent distributed runtime prematurely.
4.2D.10 Final Rule¶
For T1P, XIOPro should be implemented as:
- a small number of deployables
- a larger number of explicit services/modules
- one canonical work graph/state layer
- one clear operator UI
- one recoverable orchestration and governance core
This preserves architectural clarity without over-distributing the system too early.
4.2E T1P Implementation Form Table (v5.0 Addition)¶
Each role bundle's concrete T1P implementation form is summarized below for quick reference during ticketization and implementation.
All role bundles are implemented as separate code modules, not separate agents. Current assignment: see Section 19.
| Role Bundle | T1P Implementation |
|---|---|
| orchestrator | Python module in api-service. Reads ODM, assigns tasks, triggers execution. Uses Ruflo for agent spawning. |
| governor | Python module in api-service. Policy evaluation, breaker logic, cost tracking. Thin initially. |
| rule_steward | Python module in worker-service. Validation, conflict detection, search-before-create. Thin initially -- most logic handled by orchestrator module inline. |
| prompt_steward | Python module in api-service. Readiness assessment, question generation, prompt assembly. Start as simple logic, not full orchestrator. |
| module_steward | Python module in worker-service. Module registry, usage tracking, recommendation. Start as config file + simple recommendation logic. |
| reviewer | On-demand agent spawned per review request. No persistent module -- spawned as a short-lived Claude Code session with reviewer role activation. Verdict stored as Activity Evaluation in PostgreSQL. |
4.2F Host Resource Awareness (v5.0 Addition)¶
The orchestrator must check host capacity before spawning any agent.
Pre-Spawn Check¶
Before spawning an agent, the orchestrator must:
- Query the Host Registry for the target host's current state
- Check `active_agents` against `max_concurrent_agents`
- Check current RAM usage against the 85% threshold
- If capacity insufficient: queue the task, try another host, or escalate
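The gate above can be sketched as a pure function. The Host Registry field names (`active_agents`, `max_concurrent_agents`, `ram_used_gb`, `ram_total_gb`) are illustrative assumptions:

```python
def pre_spawn_check(host: dict, ram_estimate_gb: float) -> str:
    """Slot-capacity check plus the 85% RAM threshold from Section 4.2F."""
    if host["active_agents"] >= host["max_concurrent_agents"]:
        return "queue_or_reroute"      # no free agent slots on this host
    projected = (host["ram_used_gb"] + ram_estimate_gb) / host["ram_total_gb"]
    if projected > 0.85:
        return "queue_or_reroute"      # would cross the no-new-agents line
    return "spawn"

# Illustrative host record, not the actual Host Registry schema
mac_studio = {"active_agents": 2, "max_concurrent_agents": 6,
              "ram_used_gb": 20.0, "ram_total_gb": 32.0}
```

The check uses the projected RAM after spawn (current usage plus the agent's `resource_estimate`), so an agent is never admitted if admitting it would itself breach the threshold.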
Agent-Host Binding¶
Every Agent Runtime must carry:
host_id: string # which host this agent runs on
host_name: string # human-readable reference
resource_estimate:
ram_gb: float # estimated RAM this agent will consume
cpu_cores: float # estimated CPU usage
Multi-Host Execution¶
XIOPro supports execution across multiple hosts:
| Host | Role | Typical Workloads |
|---|---|---|
| Hetzner CPX62 | control_plane | orchestrator (all roles), services, domain brains, workers |
| Mac Studio M1 (32GB) | hybrid | remote worker, local experiments, overflow agents |
| Future cloud nodes | worker / gpu | compute-intensive tasks, self-hosted models |
The orchestrator should prefer the control plane host for orchestration and distribute overflow to available hosts.
OOM Prevention¶
- 85% RAM threshold triggers "no new agents" gate
- 90% triggers graceful shutdown of lowest-priority agents
- 95% triggers emergency agent termination + alert to founder
- Host health is monitored by the governor with breaker policies
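The escalation ladder can be sketched as a single threshold function; the action labels are illustrative names for the responses listed above:

```python
def ram_gate(ram_used_gb: float, ram_total_gb: float) -> str:
    """Return the OOM-prevention action for the current RAM utilization,
    checking the most severe threshold first."""
    pct = ram_used_gb / ram_total_gb
    if pct >= 0.95:
        return "emergency_terminate_and_alert_founder"
    if pct >= 0.90:
        return "graceful_shutdown_lowest_priority"
    if pct >= 0.85:
        return "no_new_agents"
    return "normal"
```

In practice the governor would evaluate this against Host Registry telemetry and emit the corresponding governance event.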
4.2G Ruflo Relationship to Orchestrator Role (v5.0 Clarification)¶
Ruflo (claude-flow) is the agent execution runtime. The orchestrator is the orchestration logic that uses Ruflo.
The separation is:
- The orchestrator decides WHAT to execute. It reads the work graph, selects tasks, assigns agents, determines execution order, and manages progression.
- Ruflo decides HOW to spawn agents. Ruflo handles agent lifecycle, process spawning, sub-agent coordination, execution boundaries, and runtime fabric management.
The orchestrator invokes Ruflo as its execution fabric. Ruflo does not contain orchestration logic -- it provides the runtime machinery that the orchestrator directs.
This distinction prevents confusion between the orchestration role and the execution runtime (Ruflo). They are complementary, not interchangeable.
4.2H XIOPro Control Bus (v5.0 Addition)¶
The XIOPro Control Bus is the unified communication and coordination backbone. Full specification (architecture, capabilities table, intervention model, push delivery, data access rules, migration path): see Part 2, Section 5.8.
Agent Communication Flow¶
```text
Agent starts session
  → registers with Bus (POST /agents/register) using 3-digit agent_id
  → opens SSE channel (GET /events/{agent_id})
  → receives tasks, messages, interventions via push
  → reports activity results back to Bus
  → heartbeats every 60 seconds

Agent ends session
  → Bus marks agent offline
  → queued messages persist for next session
```
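The queue-while-offline semantics can be sketched in miniature. This in-memory model is illustrative only: the real Bus persists queues in PostgreSQL and pushes over SSE, and the class and method names here are assumptions.

```python
class ControlBusSketch:
    """Minimal model of push-when-online vs queue-when-offline delivery."""

    def __init__(self):
        self.online = set()
        self.pending = {}  # agent_id -> messages queued while offline

    def register(self, agent_id: str):
        """Agent session start: mark online and drain queued messages."""
        self.online.add(agent_id)
        return self.pending.pop(agent_id, [])

    def send(self, agent_id: str, message: str) -> str:
        if agent_id in self.online:
            return "pushed"    # would be delivered over the SSE channel
        self.pending.setdefault(agent_id, []).append(message)
        return "queued"        # persists until the agent's next session

    def deregister(self, agent_id: str):
        """Agent session end: Bus marks the agent offline."""
        self.online.discard(agent_id)

bus = ControlBusSketch()
first = bus.send("042", "review ticket T-17")  # agent offline -> queued
backlog = bus.register("042")                  # next session drains the queue
second = bus.send("042", "new task")           # now online -> pushed
```

This is the property that lets the Bus survive sessions while Ruflo does not: delivery state lives in the Bus's durable store, not in any agent's session memory.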
Relationship to Ruflo¶
| Layer | Scope | Persistence |
|---|---|---|
| Control Bus | Cross-session, cross-host | PostgreSQL — survives everything |
| Ruflo | Within-session, within-host | Session memory — dies with session |
Ruflo reports state to the Bus. The Bus does not depend on Ruflo.
4.2I Reviewer Role (v5.0 Addition)¶
Role¶
The Reviewer is a short-lived agent role spawned by an orchestrator (GO or PO) after a builder agent completes significant work. Its sole purpose is to independently evaluate the output against the original specification and return a verdict to the spawning orchestrator.
The Reviewer is not the builder. It is never the same agent that produced the work.
Why It Exists¶
Builders verify their own output via the Completion Self-Check Protocol (Section 5.2). That is insufficient for high-stakes deliverables. Self-evaluation has a structural blind spot: the builder shares the same context, assumptions, and potential misunderstandings that produced the work.
The Reviewer role closes this gap by introducing an independent perspective:
- reads the spec and the output independently, with no shared build context
- applies a different model tier where possible (Opus reviews Sonnet's work; Sonnet reviews Haiku's work)
- cannot be influenced by the builder's reasoning path
- reports a clean verdict with evidence
When to Spawn a Reviewer¶
A Reviewer should be spawned when:
- a ticket is marked `significant` (architectural change, public API, schema migration, security-sensitive work)
- the builder's Completion Self-Check confidence is in the 0.5–0.8 range
- the orchestrator's policy for the project requires mandatory review
- the builder explicitly requests independent review (rare but permitted)
A Reviewer is NOT spawned for:
- routine sub-hour tasks
- documentation edits without behavioral impact
- tasks already reviewed by a human via RC
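One way to sketch this spawn decision is as a boolean function. The parameter names are illustrative (not a defined interface), and the non-spawn cases are collapsed into two inputs:

```python
def should_spawn_reviewer(significant: bool, self_check_confidence: float,
                          policy_mandates_review: bool,
                          builder_requested: bool,
                          human_reviewed: bool, routine: bool) -> bool:
    """Spawn triggers from Section 4.2I; non-spawn cases take precedence."""
    if human_reviewed or routine:
        return False  # already reviewed via RC, or routine/doc-only work
    return (significant
            or policy_mandates_review
            or builder_requested
            or 0.5 <= self_check_confidence <= 0.8)
```

A real orchestrator would read these signals from the ticket record and the builder's Completion Self-Check output rather than receiving them as arguments.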
Spawning Rule¶
```yaml
reviewer_spawn_rule:
  spawned_by: orchestrator (GO or PO)
  trigger: builder marks task complete on a significant ticket
  constraint: reviewer_agent_id != builder_agent_id
  model_preference:
    - if builder used sonnet → prefer opus for reviewer
    - if builder used opus → prefer sonnet for reviewer
    - if builder used haiku → prefer sonnet for reviewer
    - if preferred model unavailable → use any different model tier
  lifecycle: short-lived — spawned for one review, terminates after verdict
  bus_registration: yes — registered in Control Bus for traceability
  cost_attribution: separate ledger entry, attributed to the ticket
```
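The model-preference rule can be sketched in Python. This is a minimal illustration, assuming model availability is known as a simple set; the function name and tier strings are illustrative, not a real XIOPro API.

```python
# Sketch of the reviewer model-preference rule. Tier names and the
# availability-set interface are assumptions for illustration.
PREFERRED_REVIEWER = {
    "sonnet": "opus",   # Opus reviews Sonnet's work
    "opus": "sonnet",   # Sonnet reviews Opus's work
    "haiku": "sonnet",  # Sonnet reviews Haiku's work
}

def pick_reviewer_model(builder_model: str, available: set[str]) -> str:
    """Return a reviewer model tier that differs from the builder's."""
    preferred = PREFERRED_REVIEWER.get(builder_model)
    if preferred in available:
        return preferred
    # Fallback per the spawn rule: any different model tier.
    for tier in ("opus", "sonnet", "haiku"):
        if tier != builder_model and tier in available:
            return tier
    raise RuntimeError("no distinct model tier available for review")
```

The constraint `reviewer_agent_id != builder_agent_id` is enforced separately at spawn time; this sketch only covers tier selection.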
Reviewer Responsibilities¶
The Reviewer must:
- read the original ticket specification (goal, scope, acceptance criteria)
- read the builder's output (code, document, artifact, or result)
- evaluate each acceptance criterion independently
- identify gaps, regressions, or spec deviations
- produce a structured verdict
The Reviewer must not:
- fix the output itself
- negotiate with the builder
- consult the builder about intent
- carry over context from a previous session on this ticket
Verdict Structure¶
```yaml
review_verdict:
  ticket_id: string
  task_id: string
  reviewer_agent_id: string
  builder_agent_id: string
  reviewer_model: string
  builder_model: string
  verdict: APPROVED | NEEDS_FIX | REJECTED
  criteria_results:
    - criterion: string
      result: pass | fail | partial
      evidence: string
  gaps_found: [string]
  fix_required: [string]      # populated when verdict = NEEDS_FIX
  rejection_reason: string    # populated when verdict = REJECTED
  recommendation: string
```
Verdict Meanings¶
- `APPROVED` — all acceptance criteria pass; orchestrator may close or promote the ticket
- `NEEDS_FIX` — one or more criteria are partial or failing; orchestrator re-assigns to builder with the fix list
- `REJECTED` — output does not meet the spec at a fundamental level; orchestrator decides whether to reassign or escalate
Orchestrator Response to Verdict¶
| Verdict | Orchestrator Action |
|---|---|
| APPROVED | Mark task complete, proceed with ticket progression |
| NEEDS_FIX | Re-open task, assign fix list to original builder, re-trigger review on completion |
| REJECTED | Escalate to human (RC) or reassign entire task to a different agent |
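The verdict table reduces to a small dispatch. A hedged sketch, assuming verdicts arrive as plain strings; the returned action names are illustrative placeholders, not real XIOPro identifiers.

```python
# Sketch of the orchestrator's response to a review verdict.
# Action strings are placeholders for the real orchestrator operations.
def handle_verdict(verdict: str) -> str:
    if verdict == "APPROVED":
        return "close_ticket"          # mark task complete, progress ticket
    if verdict == "NEEDS_FIX":
        return "reassign_to_builder"   # re-open with fix list, re-review on completion
    if verdict == "REJECTED":
        return "escalate_or_reassign"  # human (RC) decides, or a different agent takes over
    raise ValueError(f"unknown verdict: {verdict}")
```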
Relation to Completion Self-Check¶
The Completion Self-Check (Section 5.2) is the builder's internal gate. The Reviewer is the external gate.
Both must pass before a significant ticket is closed.
```text
Builder self-check passes → task marked complete (builder)
→ orchestrator spawns Reviewer
→ Reviewer returns verdict
   → APPROVED → ticket closed
   → NEEDS_FIX / REJECTED → ticket re-opened
```
T1P Implementation Form¶
- Reviewer is spawned as an on-demand agent (Pattern 2, Section 5A.2) with the reviewer role assigned
- For T1P, review is triggered manually by the orchestrator after builder completion on significant tickets
- Automated spawn-on-completion is a post-T1P enhancement
- The `review_verdict` output is stored as an Activity Evaluation entity (Part 3, Section 4.6.1) attached to the reviewed task
Interaction with Other Components¶
| Component | Relation to Reviewer |
|---|---|
| Orchestrator role | spawns Reviewer, receives verdict, acts on it |
| Builder (Specialist/Worker) | produces the work being reviewed; cannot interact with active Reviewer |
| Completion Self-Check | builder-side gate that precedes Review spawn |
| Activity Evaluation (Part 3) | verdict stored as evaluation record |
| Governor role | tracks review cost; may require review for high-cost tickets |
| RC | receives REJECTED verdicts requiring human judgment |
Final Rule¶
The Reviewer exists to catch what self-evaluation misses.
It is not a bureaucratic gate. It is a targeted quality signal for work that matters.
4.3 Ruflo — Agent Swarm Engine¶
Role¶
Agent orchestration runtime.
Responsibilities¶
- spawn agents
- manage sub-agents
- control execution lifecycle
- enforce boundaries
Notes¶
- acts as execution fabric
- integrates with Claude Code / other agents
- supports multi-agent collaboration
4.4 LiteLLM — Model Router¶
Role¶
Provider abstraction layer.
Responsibilities¶
- route requests to:
  - Claude
  - OpenAI
  - Gemini
  - local models
- optimize cost vs performance
- fallback handling
- unify API interface
Key Feature¶
Enables provider independence
4.5 Execution Engine¶
Role¶
Actual execution runtime.
Can include:¶
- Claude Code (primary)
- RooCode
- custom Python agents
- CLI-based execution
- future local models
4.6 Remote Control (RC)¶
4.6.1 Purpose¶
RC enables the human operator to interact with live execution in a controlled and auditable way.
RC exists to:
- attach to a running execution context
- respond to escalation requests
- approve or reject protected actions
- inject bounded guidance
- redirect or constrain execution
- recover decision continuity during ambiguity or failure
RC is the primary human interaction surface for live XIOPro brains.
It supports:
- exploratory conversation
- execution-bound discussion
- approval and escalation handling
- recovery intervention
- bounded guidance and redirection
When RC interaction materially affects execution, it must be converted into durable operational state.
4.6.2 Principle¶
RC transforms XIOPro from:
autonomous system → governed autonomous system
The key principle is:
RC is not the system of record.
The system of record is the durable operational state held in:
- Agent Runtime
- Session
- Escalation Request
- Human Decision
- Activity / Ticket / Task lineage
- Transcript and context references
RC is the human interaction surface over those objects.
4.6.3 Canonical Objects Used by RC¶
RC must operate on the canonical runtime objects defined in the ODM.
Agent Runtime¶
Represents the live actor doing work.
RC may:
- attach to it
- pause it
- constrain it
- redirect it
- resume it
Session¶
Represents the durable execution session.
RC must attach to a session, not merely to an abstract agent name.
A session may be:
- active
- idle
- paused
- waiting
- blocked
- crashed
- recovering
- closed
- archived
Escalation Request¶
Represents a durable request for human discussion, clarification, or approval.
RC should open or respond to an escalation request rather than relying on ad hoc chat state.
Human Decision¶
Represents the durable answer or approval outcome recorded by the founder/operator.
RC is one way to create a Human Decision, but the decision must persist beyond the UI.
Execution Surface¶
Represents where the runtime is actually executing, such as:
- Claude Code
- Codex
- Gemini CLI
- custom CLI
- API worker
- future local model runtime
RC must be aware of execution surface constraints.
4.6.4 RC Interaction Modes¶
RC should support at least these modes:
Attach Mode¶
Used when the founder wants to connect to an already-running session.
Escalation Response Mode¶
Used when a task or runtime has opened a durable Escalation Request.
Approval Mode¶
Used when a protected action requires formal go / no-go input.
Redirect Mode¶
Used when the founder changes goal, scope, constraints, provider, or path.
Recovery Mode¶
Used when a session crashed, degraded, or became blocked and a recovery decision is needed.
4.6.5 RC Architecture¶
```mermaid
flowchart TD
    Human --> RCInterface
    RCInterface --> RCManager
    RCManager --> Session
    RCManager --> AgentRuntime
    RCManager --> EscalationRequest
    EscalationRequest --> HumanDecision
    HumanDecision --> RCManager
    RCManager --> Orchestrator["Orchestrator"]
    RCManager --> Governor["Governor"]
    Orchestrator --> ExecutionSurface
    ExecutionSurface --> Session
    Session --> TranscriptStore
    Session --> ContextBundle
```
4.6.6 RC Manager Responsibilities¶
RC Manager is the backend control layer for RC.
It must:
- locate attachable sessions
- bind human interaction to the correct runtime scope
- assemble the required context bundle
- persist interaction history
- route decisions back into runtime execution
- preserve ticket/task/activity lineage
- support multi-brain switching
- prevent uncontrolled cross-session contamination
It must not:
- silently overwrite durable system state
- bypass approval requirements
- become a generic freeform chat relay without structure
4.6.7 Context Bundle Contract¶
Before attaching or escalating, RC should assemble a bounded context bundle.
Minimum bundle contents:
```yaml
context_bundle:
  runtime_id: string
  session_id: string
  execution_surface_id: string|null
  ticket_id: string|null
  task_id: string|null
  activity_id: string|null
  current_goal: string|null
  current_state: string
  blocker_summary: string|null
  recent_actions_ref: string|null
  relevant_knowledge_refs: [string]
  transcript_ref: string|null
  checkpoint_ref: string|null
  escalation_request_id: string|null
  pending_approval: boolean
  recommended_next_step: string|null
```
This keeps human intervention compact, explicit, and resumable.
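As a sketch, a bundle assembled as a plain dict can be checked against the minimum contract before RC attaches. The required-field set below is an assumption: only the fields the contract marks as always non-null are treated as hard requirements.

```python
# Hedged sketch: validate a context bundle against the minimum contract.
# Field names follow the contract above; the check itself is illustrative.
REQUIRED_FIELDS = {"runtime_id", "session_id", "current_state", "pending_approval"}

def validate_context_bundle(bundle: dict) -> list[str]:
    """Return a sorted list of missing required fields (empty list = valid)."""
    return sorted(REQUIRED_FIELDS - bundle.keys())
```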
4.6.8 RC Triggers¶
RC may be invoked by:
- `requires_human = true`
- `requires_approval = true`
- ambiguity detected
- recovery tradeoff required
- quality failure requires judgment
- runtime blocked
- founder manual intervention
- governance escalation from the governor
4.6.9 RC Flow¶
```mermaid
flowchart TD
    RuntimeActive --> TriggerDetected
    TriggerDetected --> EscalationOrAttach
    EscalationOrAttach --> ContextBundleBuilt
    ContextBundleBuilt --> HumanInteraction
    HumanInteraction --> HumanDecisionRecorded
    HumanDecisionRecorded --> RuntimeResume
    RuntimeResume --> SessionUpdated
    SessionUpdated --> AuditTrail
```
4.6.10 RC Mode Durability¶
| Mode | Purpose | Durability Requirement |
|---|---|---|
| exploratory conversation | think with a brain without immediate execution change | optional unless promoted |
| execution-bound discussion | guide or clarify active work | must persist if it affects work |
| approval / escalation | formal human gate | durable by default |
| recovery intervention | unblock or redirect after failure/degradation | durable by default |
Rule¶
RC is the unified human interaction surface for XIOPro brains.
Not every RC conversation must mutate execution state.
But any RC conversation that changes execution, constraints, approvals, direction, or recovery must be recorded as durable operational state.
4.6.11 RC Success Criteria¶
RC is successful when:
- the founder can attach to the correct live execution context
- discussion and approval become durable system state
- context injection is bounded and traceable
- session continuity is preserved after intervention
- multiple brains can be switched without confusion
- recovery decisions are captured and replayable
4.6.12 Final Statement¶
RC is not a convenience chat layer.
It is the controlled human intervention surface for live execution.
4.7 Session Manager¶
4.7.1 Role¶
Session Manager owns session lifecycle control.
It ensures that execution continuity survives:
- normal pause/resume
- human escalation
- provider instability
- runtime crash
- surface switch
- controlled recovery
4.7.2 Responsibilities¶
Session Manager must:
- open and close sessions
- monitor session health
- persist session checkpoints
- transfer or rebuild context
- coordinate session recovery
- track attachment eligibility
- support resume semantics
- preserve transcript references
- prevent orphaned runtimes
4.7.3 Session State Model¶
Session Manager must honor the canonical session states:
- active
- idle
- paused
- waiting
- blocked
- crashed
- recovering
- closed
- archived
Interpretation¶
- `active` = currently executing
- `idle` = no immediate work, resumable
- `paused` = intentionally halted
- `waiting` = waiting on human/dependency/event
- `blocked` = cannot continue without intervention
- `crashed` = abnormal interruption
- `recovering` = recovery path underway
- `closed` = ended and no longer active
- `archived` = retained for history
4.7.4 Recovery Paths¶
Session recovery should support at least:
- retry same session
- resume with new session on same surface
- switch execution surface
- switch model/provider path
- escalate to human for recovery decision
- terminal close
Recovery must preserve lineage to:
- runtime
- ticket
- task
- activity
- escalation request
- human decision
4.7.5 Attachment Eligibility¶
A session is attachable when:
- it has not been terminally closed
- it is still relevant to live or recoverable work
- required context is available
- ownership/lock conditions permit intervention
Attachable states usually include:
- active
- idle
- paused
- waiting
- recovering
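The eligibility rules above can be sketched as a small predicate. The state names are the canonical session states from Section 4.7.3; the function shape and the lock parameter are assumptions for illustration.

```python
# Sketch of attachment eligibility. Canonical states per Section 4.7.3;
# the single-attachment lock rule comes from Section 4.7.6.
ATTACHABLE_STATES = {"active", "idle", "paused", "waiting", "recovering"}
TERMINAL_STATES = {"closed", "archived"}

def is_attachable(state: str, lock_held_by_other: bool = False) -> bool:
    if state in TERMINAL_STATES:
        return False  # terminally closed sessions never attach
    if lock_held_by_other:
        return False  # one human control attachment at a time
    return state in ATTACHABLE_STATES
```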
4.7.6 Ownership & Locking¶
Session Manager should prevent unsafe simultaneous human/control collisions.
Minimum rules:
- one human control attachment at a time
- explicit lock release on detach or timeout
- emergency override allowed with audit log
- session ownership visible to the orchestrator/governor and the control surface
4.7.7 Session Success Criteria¶
Session management is successful when:
- sessions survive ordinary interruptions
- recoverable failures remain recoverable
- no important context is silently lost
- human intervention can resume the right work reliably
- execution surfaces can be switched without breaking lineage
4.8 Memory / Context Layer¶
4.8.1 Role¶
This layer preserves the working context needed for execution continuity.
It is not identical to long-term knowledge storage.
It provides the bounded memory bridge between:
- live runtime execution
- ticket/task state
- durable knowledge
- human intervention
- recovery
4.8.2 Context Horizons¶
Short-Term¶
Immediate live execution state:
- current step
- recent tool calls
- latest outputs
- transient working memory
Mid-Term¶
Execution continuity state:
- ticket/task context
- session checkpoints
- escalation context
- current constraints
- recent decisions
Long-Term¶
Persistent knowledge:
- rules
- skills
- activations
- documents
- reflections
- prior decisions
- knowledge graph / ledger references
4.8.3 Context Sources¶
The context layer may assemble context from:
- Session transcript
- checkpoint artifacts
- task/activity state
- Librarian / knowledge layer
- Knowledge Ledger
- human decisions
- governance decisions
- Dream-derived improvements where approved
4.8.4 Context Rules¶
The context layer must:
- minimize irrelevant context
- preserve critical continuity
- avoid cross-ticket contamination
- keep human intervention bounded
- allow reconstruction after crash or restart
It must not:
- blindly dump all history into runtime prompts
- let one brain inherit another brain's state without justification
- treat chat history as the only memory source
4.8.5 Resume Bundle¶
A resumed runtime should receive a structured resume bundle, not just raw transcript replay.
Minimum resume bundle:
```yaml
resume_bundle:
  session_id: string
  runtime_id: string
  current_goal: string|null
  latest_valid_checkpoint_ref: string|null
  latest_human_decision_ref: string|null
  active_constraints: [string]
  relevant_knowledge_refs: [string]
  next_expected_action: string|null
```
4.8.6 Success Criteria¶
The context layer is successful when:
- tasks resume without silent amnesia
- human decisions remain attached to execution
- context remains compact and relevant
- session recovery is practical
- long-term knowledge improves execution without polluting it
4.8A Memory Engineering Principles (from 5-Layer Memory Stack Research)¶
These 5 production engineering rules apply to all XIOPro memory operations (Hindsight, Librarian, state files, knowledge vault). Derived from @the_enterprise.ai's "5-Layer AI Agent Memory Stack" research (see struxio-knowledge/vault/research_inbox/REVIEW_5_layer_memory_stack_images.md for full analysis).
```yaml
memory_engineering_principles:
  1_async_updates:
    rule: "Never block the main agent execution for memory operations"
    implementation: "Memory extraction, indexing, and storage happen in background threads or post-activity hooks"
    applies_to: [Hindsight, Librarian, Knowledge Ledger]
  2_debounce_writes:
    rule: "Batch memory operations — don't write on every turn"
    implementation: "Wait 30 seconds or N turns, batch messages, make one extraction call"
    applies_to: [Hindsight, session state, activity logging]
    benefit: "Reduces token usage and prevents memory thrashing"
  3_confidence_threshold:
    rule: "Discard low-confidence facts (< 0.7). Cap total facts per agent at 100"
    implementation: "Every stored fact gets a confidence score 0.0-1.0. Below threshold = discard. Above cap = trim lowest confidence."
    applies_to: [Knowledge Objects, Hindsight memories, agent lessons]
    benefit: "Prevents unbounded memory growth and low-quality knowledge accumulation"
  4_token_budget:
    rule: "Control context injection size — max 2000 tokens for memory context"
    implementation: "When injecting memories/knowledge into agent context, trim to budget by removing lowest-confidence items first"
    applies_to: [Prompt Steward context assembly, Hindsight auto-inject, RAG retrieval]
    benefit: "Prevents context window bloat, keeps agent focused"
  5_atomic_writes:
    rule: "Write state files atomically — temp file then rename"
    implementation: "Write to plan.yaml.tmp, then mv plan.yaml.tmp plan.yaml. Never corrupt state mid-write."
    applies_to: [plan.yaml, next-actions.yaml, agents.yaml, all state files, session checkpoints]
    benefit: "Prevents corrupted state from crashes or concurrent writes"
```
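Principle 5 can be sketched directly in Python. This is a minimal illustration: write to a temp file in the same directory, fsync, then rename. `os.replace` is atomic on POSIX when source and target live on the same filesystem, so readers see either the old or the new file, never a partial write.

```python
# Sketch of Principle 5 (atomic writes) for state files like plan.yaml.
import os
import tempfile

def atomic_write(path: str, data: str) -> None:
    """Write data to path atomically: temp file in same dir, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # ensure bytes reach disk before the swap
        os.replace(tmp, path)      # atomic: old or new content, never partial
    except BaseException:
        os.unlink(tmp)             # clean up the temp file on failure
        raise
```

Using a temp file in the *same* directory matters: `os.replace` across filesystems would fall back to a non-atomic copy.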
Relation to Existing XIOPro Architecture¶
| Principle | XIOPro Component | Current Status | T1P Action |
|---|---|---|---|
| 1. Async Updates | Bus async messaging, background workers | Partially covered by Bus architecture | Enforce for Hindsight/Librarian processing |
| 2. Debounce Writes | Hindsight extraction, session state | Not yet implemented | Add batching to Hindsight extraction pipeline |
| 3. Confidence Threshold | Knowledge Objects, Hindsight memories | Not yet implemented — no confidence field exists | Add confidence field to Knowledge Objects (Part 5, Section 6.1) |
| 4. Token Budget | Context Rules (Section 4.8.4), Prompt Steward | Context rules say "minimize" but lack hard number | Define 2000-token hard budget for memory context injection |
| 5. Atomic Writes | State files (plan.yaml, next-actions.yaml) | Not enforced | Implement write-to-temp-then-rename for all state files |
Implementation Requirements¶
- All agents must use atomic writes for state file mutations (Principle 5)
- The Prompt Steward (Section 4.2B) must enforce the 2000-token memory context budget (Principle 4) when assembling prompt packages
- Hindsight and Librarian processing must be async and debounced (Principles 1, 2) — see Part 5, Sections 9 and 4 for implementation requirements
- Knowledge Objects must carry a confidence score field; the Librarian must enforce threshold and cap (Principle 3) — see Part 5, Section 6.1
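The 2000-token budget (Principle 4) can be sketched as a trim function that drops lowest-confidence items first. The 4-characters-per-token estimate is a rough assumption for illustration, not a real tokenizer; production code would use the provider's tokenizer.

```python
# Illustrative sketch of Principle 4: trim memory items to a hard token
# budget, keeping highest-confidence items first.
def trim_to_budget(items: list[dict], max_tokens: int = 2000) -> list[dict]:
    """items: [{"text": str, "confidence": float}, ...] -> trimmed list."""
    ranked = sorted(items, key=lambda i: i["confidence"], reverse=True)
    kept, used = [], 0
    for item in ranked:
        cost = len(item["text"]) // 4 + 1  # crude token estimate (assumption)
        if used + cost > max_tokens:
            break
        kept.append(item)
        used += cost
    return kept
```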
4.9. Dream Engine (Sleep-Time Intelligence Layer)¶
The Dream Engine and its T1P subset (Idle Maintenance, Section 4.9.9) are part of the XIOPro Optimizer (see Part 1, Section 8A).
XIOPro includes a background cognition layer called the Dream Engine.
This system runs during idle periods and performs:
- memory consolidation
- knowledge pruning
- contradiction resolution
- pattern extraction
- cost optimization suggestions
- system-level improvements
4.9.1 Purpose¶
Prevent:
- memory decay
- knowledge fragmentation
- context pollution
- repeated mistakes
Enable:
- long-term intelligence accumulation
- system self-improvement
- reduced token usage over time
4.9.2 Trigger Conditions¶
Dream cycles are triggered when:
- time threshold reached (e.g. 24h)
- activity threshold reached (e.g. N sessions / N tasks)
- manual trigger by founder
- major system change detected
4.9.3 Scope of Operation¶
Dream Engine operates on:
- knowledge base (.md / .yaml / DB)
- tickets history
- task execution logs
- reflections
- agent performance data
4.9.4 Core Phases¶
1. Orientation
   - scan current system state
   - build structural map
2. Signal Extraction
   - identify:
     - repeated failures
     - corrections
     - decisions
     - patterns
3. Consolidation
   - merge duplicates
   - remove obsolete entries
   - normalize metadata
   - convert relative → absolute time
4. Optimization
   - propose:
     - rule updates
     - skill improvements
     - routing optimizations
     - cost reductions
5. Index Rebuild
   - update:
     - librarian index
     - topic structure
     - search mappings
4.9.5 Output Artifacts¶
Dream Engine produces:
- updated knowledge files
- improvement proposals
- rule modification suggestions
- skill enhancement suggestions
- anomaly reports
4.9.6 Governance¶
- runs in isolated mode
- cannot modify execution directly
- requires approval for:
  - rule changes
  - system behavior changes
4.9.7 Relation to Other Systems¶
| System | Role |
|---|---|
| Auto Memory | capture |
| Librarian | organize |
| Dream Engine | refine |
| Reflection Engine | evaluate |
| Improvement Engine | apply |
4.9.8 Strategic Impact¶
Dream Engine transforms XIOPro from:
"execution system"
into:
self-evolving intelligence system
4.9.9 T1P Dream Engine Posture (v5.0 Addition)¶
Posture: Idle Maintenance Only
The full Dream Engine architecture is preserved in this blueprint as the target capability.
For T1P, only the following subset is implemented:
- Memory consolidation (AutoDream) -- consolidate session artifacts, clean transient state
- Stale knowledge detection -- flag documents that have not been referenced or updated beyond a threshold
- Morning brief generation -- produce a daily summary of system state, pending work, and alerts
- Session cleanup -- archive completed sessions, remove orphaned runtime artifacts
- Idea review -- scan Ideas with status `new` or `deferred` whose `next_review_at` has passed or whose `last_reviewed_at` exceeds the configured review cycle. Surface stale ideas in the morning brief for user attention.
The full Dream Engine phases (signal extraction, optimization proposals, index rebuild, contradiction resolution) are deferred to post-T1P.
Rule¶
T1P Dream Engine is operational but narrow. It maintains system hygiene without attempting autonomous intelligence evolution. Full capability is a post-T1P milestone.
4.10 Agent Activation Architecture (v5.0 Addition)¶
Problem¶
Current activation files (ACTIVATE_BM.md, ACTIVATE_B1.md, etc.) are 65-108 lines each and contain significant duplication:
| Duplicated Content | Lines | Repeated In |
|---|---|---|
| Execution Discipline (Boris rules) | 6-8 | All 7 agents |
| Paperclip protocol | 2-4 | All agents |
| Session Start Protocol | 5-6 | All agents |
| Memory (Hindsight + Obsidian) | 2-3 | domain brains |
| Worker spawning rules | 3-4 | domain brains |
| First Action (read tools + state) | 3-5 | All agents |
This wastes ~1,400 tokens per agent load. Over dozens of daily sessions across 7 agents, this is significant token waste — and worse, it creates maintenance burden (changing a rule means editing 7 files).
Solution: Skill-Based Activation¶
Extract duplicated content into shared skills. Activation files become slim identity-only documents that declare which skills to load.
Extracted Skills¶
| Skill | Content | Loaded By |
|---|---|---|
| `SKILL_bootstrap` | Read tools reference, state files, lessons. Set context. | All agents |
| `SKILL_execution_discipline` | Boris Cherny rules: plan first, subagents, self-improve, verify, circuit breaker, git discipline. | All agents |
| `SKILL_memory` | Hindsight bank setup, Obsidian query patterns, knowledge retrieval. | All agents |
| `SKILL_worker_spawn` | Worker naming (`[brain_id][seq]`), max 3 active, headless mode, supervision rules. | domain brains |
| `SKILL_paperclip_sync` | Already exists. Ticket checkout, comments, completion, cost reporting. | All agents |
| `SKILL_session_start` | Heartbeat, bus poll, state load, resume top action. | All agents |
Slim Activation File Pattern¶
```markdown
---
title: "ACTIVATE: 002 — Engineering Brain"
agent: 002
version: "5.0.0"
skills_on_load: [bootstrap, execution-discipline, session-start]
skills_available: [paperclip-sync, memory, worker-spawn]
---

# 002 — Engineering Brain

## Identity
You are **002** (Engineering) — STRUXIO's Product Engineering Brain.
Ruflo worker on Hetzner under orchestrator coordination.

## Domain
Python, REST APIs, product integrations, domain-specific tooling.

## Workers
201 (coder), 202 (tester), 203 (code-reviewer). Max 3.

## Model
Sonnet 4.6 default. Opus when ticket specifies.

## On Activation
Load `skills_on_load` from frontmatter. Execute SKILL_bootstrap.
Other skills load on demand when triggered.
```
~20 lines instead of 68. Token savings: ~200 tokens per agent load.
Skill Loading Strategy¶
```yaml
skills_on_load:    # Always loaded at session start. Critical for identity and bootstrap.
skills_available:  # Loaded on demand when the agent's task requires them.
```
This mirrors how Superpowers skills work — frontmatter declares what's available, runtime loads when needed.
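The on-load step can be sketched as a frontmatter parse. This is a hand-rolled, dependency-free illustration assuming the flow-style list syntax shown in the slim activation pattern; real code would use a YAML library.

```python
# Hedged sketch: extract skills_on_load from an activation file's YAML
# frontmatter. Only handles the flow-style list form used above.
def parse_skills_on_load(activation_text: str) -> list[str]:
    in_frontmatter = False
    for line in activation_text.splitlines():
        if line.strip() == "---":
            if in_frontmatter:
                break          # second '---' closes the frontmatter
            in_frontmatter = True
            continue
        if in_frontmatter and line.startswith("skills_on_load:"):
            raw = line.split(":", 1)[1].strip().strip("[]")
            return [s.strip() for s in raw.split(",") if s.strip()]
    return []                  # no frontmatter or no skills_on_load key
```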
Connection to Dream Engine / Idle Maintenance¶
This optimization IS what the Dream Engine does in practice:
- Review activation files, skill files, and rules for duplication
- Detect redundancy, drift, and token waste
- Propose consolidation (new shared skills, slim activation files)
- Report to founder / rule steward role for approval
Adding to the Idle Maintenance scope:
```yaml
idle_maintenance_tasks:
  - memory_consolidation       # existing
  - stale_knowledge_detection  # existing
  - morning_brief              # existing
  - session_cleanup            # existing
  - activation_optimization    # NEW: review activation files for duplication
  - skill_dedup_detection      # NEW: detect overlapping skills
  - token_waste_analysis       # NEW: estimate token savings from consolidation
  - idea_review                # NEW: scan ideas not reviewed within their review_cycle
  - skill_performance_review   # NEW: compare internal skill metrics against alternatives (see Part 5 Section 8.9A)
  - skill_token_optimization   # NEW: identify skills with high token usage, suggest alternatives (see Part 5 Section 8.9A)
```
This is the bridge between "Idle Maintenance" (T1P) and "Dream Engine" (full capability) — practical optimization that proves the Dream concept without requiring the full autonomous intelligence layer.
Migration Plan¶
- Create the 4 new skills (bootstrap, execution-discipline, memory, worker-spawn)
- Update SKILL_REGISTRY.yaml with all skills
- Slim activation files (one at a time, test each)
- Add Idle Maintenance task to detect future drift
4.11 Skill Selection Architecture (v5.0 Addition)¶
When the orchestrator assigns a task, it must select which skills the agent loads. This prevents token waste (loading 48 skills when 3 are needed) and ensures model-appropriate skill assignment.
Problem¶
The Skill Registry (Part 5, Section 8.9) defines what skills exist. The Activation Architecture (Section 4.10) defines how activation files reference skills. Neither defines which skills to load for a given task assignment.
Without selection logic:
- Every agent loads all skills it has access to (token waste)
- Haiku agents receive skills that require deep reasoning (quality loss)
- Task-irrelevant skills dilute the agent's context (precision loss)
Foundation: Role → Topic → Skill Binding Chain (v5.0.5 Clarification)¶
Skills bind to roles via topics, not directly to agent numbers. A role has multiple topics. A topic has multiple skills. Any agent assigned a role inherits all topic-skill bindings for that role.
```yaml
role_topic_skill_chain:
  - role: designer
    topics:
      - brand_identity
      - content_creation
      - visual_design
    skills_per_topic:
      brand_identity: [voice-dna-creator, brainstorming]
      content_creation: [content-research-writer, writing-plans]
      visual_design: [brainstorming]
  - role: specialist_compliance
    topics:
      - iso_19650
      - bim_fidelity
      - cde_management
    skills_per_topic:
      iso_19650: [claude-deep-research, writing-plans]
      bim_fidelity: [systematic-debugging, verification-quality]
      cde_management: [writing-plans]
```
This binding chain is the structural foundation for the 3-step selection filter below. Step 1 resolves the role's topic-skill bindings; Steps 2 and 3 then narrow the result by task type and model tier.
Solution: 3-Step Skill Selection Filter¶
When the orchestrator assigns a task, it computes the skill set through three sequential filters. The final skill set is the intersection of all three.
Step 1 — Filter by Agent Role (via Topic-Skill Bindings)¶
Each role has a base skill set derived from its topic-skill bindings. An agent only considers skills permitted for its role.
```yaml
agent_role_skills:
  orchestrator: [writing-plans, brainstorming, paperclip-sync, dispatching-parallel-agents]
  specialist: [writing-plans, TDD, systematic-debugging, code-review, brainstorming, paperclip-sync]
  worker: [TDD, verification-before-completion, paperclip-sync]
  reviewer: [code-review, receiving-code-review, verification-quality, systematic-debugging]
  interface: []  # UI agents have no reasoning skills
```
Step 2 — Filter by Task Type¶
Each task type declares which skills are relevant. Only skills that survived Step 1 AND appear in the task type list continue.
```yaml
task_type_skills:
  coding: [TDD, systematic-debugging, verification-before-completion]
  research: [brainstorming, writing-plans]
  review: [code-review, receiving-code-review, verification-quality]
  design: [brainstorming, writing-plans]
  deployment: [verification-before-completion]
  debugging: [systematic-debugging]
  planning: [brainstorming, writing-plans, executing-plans]
  ticket_management: [paperclip-sync]
```
Step 3 — Filter by Model Tier¶
The assigned model determines final compatibility. Skills that require reasoning beyond the model's capability are excluded.
```yaml
model_skill_compatibility:
  haiku:
    exclude: [brainstorming, writing-plans]                     # too complex for haiku
    best_for: [paperclip-sync, verification-before-completion]  # simple execution
  sonnet:
    exclude: []                                                 # handles everything
    best_for: [TDD, systematic-debugging, code-review]          # sweet spot
  opus:
    exclude: []
    best_for: [brainstorming, writing-plans, architecture, complex-debugging]  # deep reasoning
```
Selection Formula¶
Example: specialist + coding + sonnet:
- Step 1 (role): [writing-plans, TDD, systematic-debugging, code-review, brainstorming, paperclip-sync]
- Step 2 (task): [TDD, systematic-debugging, verification-before-completion]
- Intersection: [TDD, systematic-debugging]
- Step 3 (model): sonnet excludes nothing
- Result: [TDD, systematic-debugging]
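The three filters can be sketched as set operations. This is a minimal illustration using abbreviated versions of the tables above; the function shape and dict-based lookup are assumptions, not the real orchestrator interface.

```python
# Sketch of the 3-step skill selection filter:
# (role skills ∩ task-type skills) minus model exclusions.
ROLE_SKILLS = {
    "specialist": {"writing-plans", "TDD", "systematic-debugging",
                   "code-review", "brainstorming", "paperclip-sync"},
    "worker": {"TDD", "verification-before-completion", "paperclip-sync"},
}
TASK_SKILLS = {
    "coding": {"TDD", "systematic-debugging", "verification-before-completion"},
    "review": {"code-review", "receiving-code-review", "verification-quality"},
}
MODEL_EXCLUDE = {
    "haiku": {"brainstorming", "writing-plans"},
    "sonnet": set(),
    "opus": set(),
}

def select_skills(role: str, task_type: str, model: str) -> set[str]:
    step1 = ROLE_SKILLS.get(role, set())               # Step 1: role filter
    step2 = step1 & TASK_SKILLS.get(task_type, set())  # Step 2: task-type filter
    return step2 - MODEL_EXCLUDE.get(model, set())     # Step 3: model exclusions
```

Running the worked example (`specialist`, `coding`, `sonnet`) through this sketch yields `{TDD, systematic-debugging}`, matching the selection formula above.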
Known Skill Library (Categorized)¶
All skills managed by the rule steward role. Categories determine default model tier.
Execution Skills (any model)¶
| Skill ID | Purpose |
|---|---|
| `paperclip-sync` | Ticket lifecycle management |
| `verification-before-completion` | Verify before marking done |
| `finishing-a-development-branch` | PR/merge workflow |
| `using-git-worktrees` | Isolated feature work |
Engineering Skills (Sonnet+)¶
| Skill ID | Purpose |
|---|---|
| `test-driven-development` | TDD workflow |
| `systematic-debugging` | Debug before fix |
| `pair-programming` | AI pair programming |
| `code-review` | Requesting code review |
| `receiving-code-review` | Receiving code review |
| `verification-quality` | Truth scoring |
Architecture Skills (Sonnet/Opus)¶
| Skill ID | Purpose |
|---|---|
| `brainstorming` | Explore before building |
| `writing-plans` | Implementation plans |
| `executing-plans` | Execute with checkpoints |
| `subagent-driven-development` | Parallel execution |
| `dispatching-parallel-agents` | Independent work dispatch |
Infrastructure Skills (any model)¶
| Skill ID | Purpose |
|---|---|
| `hooks-automation` | Hooks management |
| `swarm-orchestration` | Multi-agent coordination |
| `swarm-advanced` | Advanced swarm patterns |
Knowledge Skills (Sonnet+)¶
| Skill ID | Purpose |
|---|---|
| `writing-skills` | Create/edit skills |
| `skill-builder` | Generate skill templates |
| `reasoningbank-agentdb` | Adaptive learning |
| `agentdb-memory-patterns` | Persistent memory |
Domain Skills¶
| Skill ID | Purpose |
|---|---|
| `sparc-methodology` | SPARC development workflow |
| `claude-api` | Claude API / Anthropic SDK integration |
| `github-code-review` | GitHub code review |
| `github-workflow-automation` | GitHub Actions workflow automation |
| `github-project-management` | Project board and sprint planning |
| `github-release-management` | Release orchestration and versioning |
| `github-multi-repo` | Multi-repository coordination |
| `github-code-review-swarm` | Swarm-coordinated code review |
| `flow-nexus-platform` | Flow Nexus authentication, sandboxes, apps |
| `flow-nexus-swarm` | Cloud swarm deployment with Flow Nexus |
| `flow-nexus-neural` | Neural network training in Flow Nexus |
Advanced/Candidate Skills (Review Required)¶
These skills exist but require rule steward review before T1P adoption:
| Skill ID | Purpose | Concern |
|---|---|---|
| v3-performance-optimization | Aggressive performance targets | claude-flow v3 specific |
| v3-mcp-optimization | MCP server optimization | claude-flow v3 specific |
| v3-cli-modernization | CLI modernization | claude-flow v3 specific |
| v3-ddd-architecture | DDD architecture patterns | claude-flow v3 specific |
| v3-core-implementation | Core module implementation | claude-flow v3 specific |
| v3-security-overhaul | Security architecture overhaul | claude-flow v3 specific |
| v3-memory-unification | Memory system unification | claude-flow v3 specific |
| v3-integration-deep | Deep agentic-flow integration | claude-flow v3 specific |
| v3-swarm-coordination | 15-agent hierarchical coordination | claude-flow v3 specific |
| agentdb-vector-search | Semantic vector search | Advanced, may be premature |
| agentdb-memory-patterns | Persistent memory patterns | Advanced, may be premature |
| agentdb-learning | RL learning plugins | Advanced, may be premature |
| agentdb-optimization | Performance optimization | Advanced, may be premature |
| agentdb-advanced | Multi-DB management | Advanced, may be premature |
| reasoningbank-intelligence | Adaptive learning patterns | Advanced, may be premature |
| stream-chain | Stream-JSON chaining | Niche use case |
| browser | Web browser automation | Overlaps with Playwright MCP |
Full Skill Count Summary¶
| Category | Count | Model Tier |
|---|---|---|
| Execution | 4 | any |
| Engineering | 6 | Sonnet+ |
| Architecture | 5 | Sonnet/Opus |
| Infrastructure | 3 | any |
| Knowledge | 4 | Sonnet+ |
| Domain | 11 | varies |
| Advanced/Candidate | 17 | varies |
| Total | 50 | -- |
The rule steward reviews this catalog during idle maintenance to detect unused skills, propose consolidation, and evaluate candidate skills for promotion or retirement.
Task Assignment with Skills¶
When the orchestrator assigns a task to an agent, the selection result is included in the assignment:
task_assignment:
task_id: "1001"
agent_id: "002"
skills_required: [test-driven-development, systematic-debugging] # from selection logic
skills_available: [verification-before-completion] # on-demand if needed
model: sonnet
host: hetzner-cpx62
The skills_required field is computed by the 3-step filter. The skills_available field lists additional skills the agent may invoke on-demand (present in its role set but not in the task type set).
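The 3-step filter can be sketched in Python. The skill records, role names, and tier numbers below are illustrative placeholders, not the canonical catalog:

```python
# Sketch of the 3-step skill filter: role -> task type -> model tier.
# Skill metadata here is an illustrative subset, not the full library.
SKILLS = {
    "test-driven-development":        {"roles": {"engineer"}, "task_types": {"build"}, "min_tier": 2},
    "systematic-debugging":           {"roles": {"engineer"}, "task_types": {"build", "fix"}, "min_tier": 2},
    "verification-before-completion": {"roles": {"engineer"}, "task_types": {"build", "fix", "review"}, "min_tier": 1},
    "brainstorming":                  {"roles": {"architect"}, "task_types": {"design"}, "min_tier": 2},
}

TIERS = {"haiku": 1, "sonnet": 2, "opus": 3}  # assumed tier ordering

def select_skills(role: str, task_type: str, model: str):
    """Return (skills_required, skills_available) for a task assignment."""
    tier = TIERS[model]
    # Steps 1 and 3: keep skills in the agent's role set that the model tier supports.
    in_role = {sid for sid, s in SKILLS.items()
               if role in s["roles"] and tier >= s["min_tier"]}
    # Step 2: skills matching the task type are required; the rest of the
    # role set stays available for on-demand invocation.
    required = sorted(sid for sid in in_role
                      if task_type in SKILLS[sid]["task_types"])
    available = sorted(in_role - set(required))
    return required, available
```

For example, an engineer on a fix task with Sonnet would get the debugging and verification skills as required, with TDD available on demand.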
Connection to Rule Steward Role¶
The rule steward maintains the skill library and selection logic:
- Reviews skill usage patterns across task assignments
- Detects unused skills (no assignments in 30 days)
- Proposes consolidation when skills overlap
- Updates model compatibility as new models release
- Adds new skills when gaps detected in task coverage
- Adjusts role-skill mappings when new roles are introduced
This is governed by the same idle maintenance cycle defined in Section 4.9.9 and the Rule Steward responsibilities in Section 4.2A.
T1P Implementation¶
For T1P, skill selection is performed manually by the orchestrator when dispatching tasks:
- The orchestrator reads the role + task type + model and picks skills accordingly
- No automated selection engine required
- The YAML definitions above serve as the reference lookup table
Full automation (selection engine integrated with Ruflo task dispatch) is deferred to post-T1P.
5. Execution Flow¶
flowchart TD
Ticket --> Task
Task --> Orchestrator["Orchestrator"]
Orchestrator --> AssignAgent
AssignAgent --> Execute
Execute --> Activity
Activity --> Evaluate
Evaluate --> Continue
5.1 Task Dependency Resolution (v5.0.8 Addition)¶
Tasks can have dependencies (depends_on, blocks). The orchestrator must resolve these before assignment.
dependency_resolution:
algorithm: "topological_sort"
rules:
- task cannot start until all depends_on tasks are completed
- if circular dependency detected: flag as error, escalate to user
- parallel execution: tasks with no shared dependencies run simultaneously
- blocked tasks: re-evaluate when any dependency completes
execution_order:
1. build dependency graph from all active tasks
2. topological sort to determine execution order
3. identify tasks with zero dependencies (ready now)
4. assign ready tasks to available agents (respecting host capacity)
5. when task completes: remove from graph, re-check dependents
6. repeat until all tasks complete or blocked
Design Rationale¶
- Topological sort is the minimal correct algorithm for DAG resolution. It guarantees no task starts before its dependencies complete, and it detects circular dependencies (which are errors by definition).
- Parallel execution is implicit: any tasks with zero unresolved dependencies at the same time can run simultaneously, bounded by host capacity (see Section 4.2F).
- Re-evaluation on completion means the orchestrator does not need to pre-compute the full schedule. It reacts to task completion events and releases newly-unblocked tasks.
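The resolution steps above can be sketched as Kahn's algorithm, assuming tasks are given as a map from task ID to its depends_on set:

```python
from collections import deque

def topological_order(tasks):
    """tasks: {task_id: set of depends_on task_ids}.
    Returns an execution order; raises on circular dependencies."""
    indegree = {t: len(deps) for t, deps in tasks.items()}
    dependents = {t: [] for t in tasks}
    for t, deps in tasks.items():
        for d in deps:
            dependents[d].append(t)
    # Tasks with zero unresolved dependencies are ready now.
    ready = deque(sorted(t for t, n in indegree.items() if n == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for nxt in sorted(dependents[t]):  # completion releases dependents
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(tasks):
        raise ValueError("circular dependency detected")  # flag as error, escalate
    return order
```

Tasks sharing a position in the ready queue can be dispatched in parallel, bounded by host capacity.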
T1P Implementation¶
For T1P, the orchestrator performs dependency resolution manually:
- Read task depends_on and blocks fields from the ODM (Part 3, Section 4.5)
- Build a simple in-memory dependency graph
- Assign tasks in topological order
- If the graph is small (< 50 tasks per project), no external DAG engine is needed
A formal DAG execution engine (e.g., integrated into Ruflo) is deferred to post-T1P when project complexity may require it.
5.2 Completion Self-Check Protocol (v5.0.8 Addition)¶
Before an agent claims a task is complete, it must run a self-evaluation. This strengthens the Reflection pattern (Part 1, Section 7) from post-hoc to in-line.
completion_self_check:
steps:
1. re_read_objective: "Read the task objective again"
2. check_acceptance_criteria: "For each criterion, verify it is met"
3. run_completion_test: "Execute the completion_test command if defined"
4. self_score: "Rate confidence 0.0-1.0 that the task is truly done"
5. identify_gaps: "List anything that might be incomplete"
6. decision:
- if confidence >= 0.8 and completion_test passes: mark done
- if confidence 0.5-0.8: mark done with caveats noted
- if confidence < 0.5: do NOT mark done, continue working or escalate
output:
completion_evaluation:
task_id: string
confidence: float
criteria_met: [string]
criteria_unmet: [string]
completion_test_result: pass|fail|not_defined
gaps_identified: [string]
decision: done|continue|escalate
Design Rationale¶
- Re-reading the objective counters drift: agents can lose track of the original goal during long execution sequences.
- Acceptance criteria check is explicit: each criterion from the task definition (Part 3, Section 4.5) must be individually verified, not assumed.
- Confidence scoring introduces nuance: not all completions are equal. A task marked "done with caveats" signals to the orchestrator that review may be warranted.
- Escalation at low confidence prevents agents from marking tasks done when they know they fell short. This is cheaper than discovering incomplete work downstream.
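The decision step reduces to a small function. One conservative reading is assumed here: a failing completion_test always blocks "done", and an undefined test caps the outcome at done-with-caveats:

```python
def completion_decision(confidence: float, test_result: str) -> str:
    """Map self-check confidence + completion_test result to a decision.
    test_result: 'pass' | 'fail' | 'not_defined'.
    Assumption: a failing test blocks 'done' regardless of confidence."""
    if test_result == "fail":
        return "continue"                 # do NOT mark done
    if confidence >= 0.8 and test_result == "pass":
        return "done"
    if confidence >= 0.5:
        return "done_with_caveats"        # orchestrator may want to review
    return "escalate"                     # agent knows it fell short
```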
Relation to Activity Evaluation¶
The completion_evaluation output becomes an Activity Evaluation entity (Part 3, Section 4.6.1) attached to the final activity of the task. This makes self-evaluation auditable and queryable.
T1P Implementation¶
For T1P, the self-check is enforced via activation files:
- Every agent activation includes the completion self-check protocol as a rule
- The orchestrator verifies that task completion messages include the completion_evaluation block
- Tasks marked done without evaluation are flagged for review
5.3 Agent Auto-Pickup (v5.0.13 Addition)¶
Agents signal readiness and self-retrieve their next task rather than waiting passively for orchestrator push. This reduces orchestrator polling overhead and allows agents to resume immediately after completing a task.
agent_auto_pickup:
endpoint: POST /agents/{id}/pickup
behavior:
- Agent calls pickup when it becomes idle (task complete or session start)
- Bus evaluates assigned tasks for the agent, returns highest-priority ready task
- If no task is ready: returns 204 No Content — agent polls again after backoff
task_query_endpoint: GET /agents/{id}/tasks
backoff_schedule: [5s, 10s, 30s, 60s]
Why Auto-Pickup¶
- Orchestrator pushes tasks via Bus when assigning, but agents may miss push on session restart
- Auto-pickup ensures no assigned task is silently dropped on session recovery
- Pair with SSE: agent receives push notification AND can self-poll on reconnect
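The pickup-with-backoff behavior can be sketched as follows; pickup_once and execute stand in for the real HTTP calls (assumptions), and sleep is injectable for testing:

```python
import time

BACKOFF = [5, 10, 30, 60]  # seconds, per the backoff_schedule above

def pickup_loop(pickup_once, execute, sleep=time.sleep, max_polls=None):
    """Poll the pickup endpoint; back off on empty responses,
    reset the backoff once a task arrives.
    `pickup_once` returns a task dict, or None for 204 No Content."""
    misses, polls = 0, 0
    while max_polls is None or polls < max_polls:
        polls += 1
        task = pickup_once()
        if task is None:
            sleep(BACKOFF[min(misses, len(BACKOFF) - 1)])
            misses += 1                    # delay grows: 5s, 10s, 30s, 60s, 60s...
        else:
            misses = 0                     # work arrived: backoff resets
            execute(task)
```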
5.4 Paperclip Auto-Sync (v5.0.13 Addition)¶
Paperclip task records are kept in sync with XIOPro ODM task state via fire-and-forget async calls. Agents do not wait for Paperclip acknowledgement.
paperclip_auto_sync:
trigger: any task CRUD operation (create, update, complete, block)
pattern: fire-and-forget
behavior:
- Task state change occurs in XIOPro ODM (source of truth)
- Async call to Paperclip API issued in background
- Failure is logged but does not block execution
- Sync catches up on next successful call
note: Paperclip is the current task tracker (to be superseded by XIOPro ODM)
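A minimal sketch of the fire-and-forget pattern, with post_fn standing in for the real Paperclip API client (an assumption):

```python
import logging
import threading

log = logging.getLogger("paperclip_sync")

def sync_to_paperclip(task_event, post_fn):
    """Issue the Paperclip call on a background thread. Failures are
    logged, never raised into the caller: the XIOPro ODM stays the
    source of truth and sync catches up on the next successful call."""
    def _run():
        try:
            post_fn(task_event)
        except Exception as exc:
            log.warning("paperclip sync failed, will catch up: %s", exc)
    t = threading.Thread(target=_run, daemon=True)
    t.start()
    return t  # returned only so callers/tests can join; safe to ignore
```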
5A. Agent Spawning Patterns (v5.0 Addition)¶
XIOPro distinguishes three spawning patterns. Each serves a different purpose and has different lifecycle, visibility, and cost characteristics.
5A.1 Agent vs Sub-Agent Distinction¶
| Property | Agent | Sub-Agent |
|---|---|---|
| Identity | Own 3-digit ID (e.g., 002) | No ID — lives under parent |
| Bus registration | Registered, sends heartbeats | NOT registered in Bus |
| Session | Own independent session | Shares parent's session context |
| Memory | Own Hindsight bank | Uses parent's memory |
| Lifecycle | Survives parent restart | Dies with parent session |
| Orchestrator visibility | Visible — orchestrator can intervene | Invisible — parent's responsibility |
| Communication | Through Control Bus (SSE, REST) | Direct to parent via Ruflo |
| Cost tracking | Own cost ledger entries | Rolled into parent's cost |
| Model | Configured per agent | Usually Haiku (cheap) |
| Capacity | Counts against host limit | Max 3 per parent |
5A.2 Three Spawning Patterns¶
Pattern 1: Project Roster Agent (Commissioned)¶
Spawned when a project starts. Long-lived. Assigned to project roster.
pattern: project_roster
spawned_by: orchestrator or system master
when: "Project needs sustained domain expertise"
duration: entire project or sprint
lifecycle:
- orchestrator creates agent
- registers in Control Bus
- added to project roster with roles
- works on project tickets
- freed when project completes or no longer needed
examples:
- "A product project needs a compliance specialist for 2 weeks"
- "XIOPro needs a dedicated backend engineer"
visibility: full — orchestrator sees status, cost, tasks
Pattern 2: On-Demand Agent (Task-Scoped)¶
Spawned for a specific task. Medium-lived. Has own identity.
pattern: on_demand
spawned_by: orchestrator
when: "A specific task needs a dedicated agent"
duration: task duration (hours to days)
lifecycle:
- orchestrator identifies task needing dedicated agent
- checks host capacity
- spawns agent with task assignment
- agent registers in Control Bus
- works on assigned task
- reports results
- terminated when task complete
examples:
- "Research all competitors in the target domain — spawn a research agent"
- "Build the SSE endpoint — spawn a backend agent"
- "Run security audit — spawn a security agent"
visibility: full — registered, trackable, cost-attributed
Pattern 3: Sub-Agent (Ephemeral, Parent-Managed)¶
Spawned WITHIN an agent's session for parallel subtasks. Short-lived. No independent identity.
pattern: sub_agent
spawned_by: parent agent (via Ruflo claude -p)
when: "Agent needs parallel help within its own work"
duration: minutes to hours, within parent session
lifecycle:
- parent agent decides it needs parallel help
- spawns sub-agent via Ruflo (claude -p headless)
- sub-agent executes narrow task
- reports result directly to parent
- parent reviews and integrates
- sub-agent terminates
- cost rolls into parent's ledger
examples:
- "I'm coding and need tests run in parallel"
- "I need a quick code review of my current diff"
- "Fetch and summarize 5 web pages while I continue"
- "Run the DDL migration while I update the docs"
max_concurrent: 3 per parent agent
model: typically Haiku (cheapest capable model)
visibility: invisible to orchestrator — parent's responsibility
5A.3 Spawning Decision Logic¶
When the orchestrator receives a task:
flowchart TD
Task["New Task"] --> NeedAgent{"Need a new agent?"}
NeedAgent -->|"No - existing agent available"| Assign["Assign to existing agent"]
NeedAgent -->|"Yes"| Duration{"Expected duration?"}
Duration -->|"Sprint/project"| Roster["Spawn Project Roster Agent"]
Duration -->|"Days"| OnDemand["Spawn On-Demand Agent"]
Duration -->|"Hours or less"| Parent{"Can existing agent sub-agent it?"}
Parent -->|"Yes"| SubAgent["Parent spawns Sub-Agent"]
Parent -->|"No"| OnDemand
Roster --> Register["Register in Control Bus"]
OnDemand --> Register
SubAgent --> ParentManages["Parent manages internally"]
5A.4 Rules¶
- Only the orchestrator spawns agents (roster and on-demand). Agents spawn sub-agents.
- Agents count against host capacity. Sub-agents count against parent's sub-agent limit (max 3).
- Sub-agents should NEVER be used for work that needs to survive a session restart. Use on-demand agents for that.
- Cost attribution: agents get their own ledger entries; sub-agent costs roll into parent.
- The orchestrator cannot see or intervene on sub-agents. If a sub-agent is stuck, the parent agent handles it or escalates.
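The spawning decision flow reduces to a small function; the return values are illustrative labels, not canonical identifiers:

```python
def spawning_decision(existing_agent_free: bool, duration: str,
                      parent_can_subagent: bool) -> str:
    """Mirror of the spawning flowchart.
    duration: 'sprint' | 'days' | 'hours'."""
    if existing_agent_free:
        return "assign_existing"
    if duration == "sprint":
        return "project_roster_agent"   # long-lived, registered in Bus
    if duration == "days":
        return "on_demand_agent"        # task-scoped, registered in Bus
    # Hours or less: prefer a parent-managed sub-agent when possible.
    return "sub_agent" if parent_can_subagent else "on_demand_agent"
```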
6. Agent Lifecycle¶
flowchart TD
Spawn --> Initialize
Initialize --> Execute
Execute --> Complete
Execute --> Fail
Fail --> Retry
Retry --> Execute
Complete --> Terminate
6.1 Heartbeat, Staleness & Orphan Cleanup¶
Heartbeat Intervals¶
| Surface | Heartbeat Interval | Protocol |
|---|---|---|
| Agents (registered via Bus) | 60 seconds | POST /agents/{id}/heartbeat |
| SSE clients | 30 seconds | SSE :ping frame or heartbeat event |
Staleness Thresholds¶
| Threshold | Duration | Agent Status | Action |
|---|---|---|---|
| Healthy | < 300 seconds since last heartbeat | online | Normal operation |
| Stale | 300 seconds (5 minutes) | stale | Governor emits agent.stale warning; no task reassignment yet |
| Dead | 600 seconds (10 minutes) | offline | Agent marked offline; orphaned tasks reassigned |
Orphan Cleanup¶
The Governor runs a cleanup sweep every 60 seconds:
- Query all agents where `last_heartbeat_at < NOW() - INTERVAL '300 seconds'` and `status = 'online'` -- mark as `stale`
- Query all agents where `last_heartbeat_at < NOW() - INTERVAL '600 seconds'` and `status IN ('online', 'stale')` -- mark as `offline`
- For agents marked `offline`: find all tasks with `assigned_agent_id = {agent_id}` and `status = 'in_progress'` -- reset to `queued` for reassignment
- Emit `agent.offline` governance event with `agent_id`, `last_heartbeat_at`, and count of reassigned tasks
- SSE connections that miss 3 consecutive pings (90 seconds) are closed server-side
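The staleness classification the sweep applies can be sketched directly from the thresholds table:

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(seconds=300)   # 5 minutes
DEAD_AFTER = timedelta(seconds=600)    # 10 minutes

def classify_agent(last_heartbeat_at: datetime, now: datetime) -> str:
    """Map heartbeat age to online / stale / offline."""
    age = now - last_heartbeat_at
    if age >= DEAD_AFTER:
        return "offline"   # orphaned in_progress tasks reset to queued
    if age >= STALE_AFTER:
        return "stale"     # agent.stale warning only, no reassignment yet
    return "online"
```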
Rules¶
- An agent recovering from `stale` to `online` must re-register via `POST /agents/register` and reclaim its queued tasks
- An agent recovering from `offline` must re-register; previously reassigned tasks are NOT automatically returned
- The Governor must not mark the master orchestrator (GO) as offline without emitting a `critical` alert
7. Session Lifecycle¶
flowchart TD
Start --> Active
Active --> Idle
Idle --> Resume
Idle --> Dream
Active --> Crash
Crash --> Recover
Recover --> Active
7.1 Context Rotation Protocol¶
The Global Orchestrator (GO) runs in long sessions that accumulate context. When context approaches capacity, GO must rotate to a fresh session without losing state.
Protocol Steps¶
context_rotation:
trigger: "Session duration > 8 hours OR context feels compressed OR many agents spawned"
steps:
1. save_state:
- Update Part 11 (Execution Log) with current session work
- Update memory files (~/.claude/projects/*/memory/)
- Push all repos to Git
2. prepare_handoff:
- Launch background restart process: nohup bash -c "sleep 5 && devxio go" &
- OR spawn a rotation agent to manage the restart
3. exit_session:
- /exit (or session ends naturally)
4. new_session_boots:
- Reads CLAUDE.md (activation protocol)
- Reads memory files (current project state)
- Reads Part 11 (what was done, what's pending)
- Reads plan.yaml (ticket status)
- Resumes from exact point
continuity:
- Agent identity persists (GO = Global Orchestrator, same role)
- State files are the bridge between sessions
- No work is lost — everything is in Git + memory + Part 11
frequency: "As needed. Typically once per 8-12 hour session."
Rule¶
Context rotation is transparent to the user. The orchestrator self-manages it. The user always talks to the same role (GO), just with fresh context.
8. Model Selection Strategy¶
Inputs¶
- task complexity
- required reasoning
- cost constraints
- latency requirements
Examples¶
| Scenario | Model |
|---|---|
| heavy reasoning | Claude Opus |
| balanced | Claude Sonnet |
| cheap execution | GPT / Gemini |
| bulk tasks | cheaper models |
9. Cost Optimization Layer¶
Managed by the Governor¶
Strategies:
- downgrade models when possible
- batch operations
- avoid redundant work
- detect runaway loops
- enforce budget limits
10. Failure Handling¶
Types¶
- agent failure
- model failure
- session crash
- incomplete task
Handling¶
- retry
- escalate
- switch model
- request human input
11. Human-in-the-Loop¶
Trigger Conditions¶
- requires_human = true
- requires_approval = true
- ambiguity detected
Flow¶
- pause execution
- open RC
- await input
- resume
12. Multi-Agent Coordination¶
- The orchestrator delegates to domain brains
- Domain brains spawn workers
- Results flow upward
- The orchestrator maintains global consistency
12.1 Agent-to-Agent Communication Protocol¶
Message Format¶
All agent-to-agent communication uses JSON messages delivered through the Control Bus bus_send_message tool:
agent_message:
id: uuid # unique message ID
from_actor: string # sender agent_id (e.g., "000")
to_actor: string # recipient agent_id or topic
topic: string # message topic / channel
type: enum # task_assignment | result | query | notification | coordination
payload: jsonb # message-specific structured content
idempotency_key: string # unique key for deduplication
correlation_id: string|null # links related messages in a conversation
created_at: datetime
Delivery Guarantee¶
- At-least-once delivery: the Bus persists all events to PostgreSQL. Agents poll with a cursor. Messages are never silently dropped.
- If an agent is offline, messages accumulate in the Bus and are delivered when the agent next polls or reconnects via SSE.
Ordering¶
- Per-topic ordering guaranteed: messages within a single topic are assigned sequential `seq` numbers by the Bus. Agents process messages in `seq` order.
- Cross-topic ordering is NOT guaranteed. Agents must not depend on message ordering across different topics.
Retry Behavior¶
- If a message delivery fails (agent unreachable, processing error), the sender retries up to 3 times with exponential backoff: 5s, 15s, 45s.
- After 3 failed retries, the message is marked `delivery_failed` and a governance event `message.delivery_failed` is emitted.
- The Bus itself does not retry -- retry is the sender's responsibility.
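The sender-side backoff schedule stated above is a plain geometric series:

```python
def retry_delays(base: int = 5, factor: int = 3, attempts: int = 3):
    """Exponential backoff for failed deliveries: 5s, 15s, 45s."""
    return [base * factor ** i for i in range(attempts)]
```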
Idempotency¶
- Every message must carry an `idempotency_key` (typically `{from_actor}:{correlation_id}:{seq}` or a UUID).
- Recipients must deduplicate incoming messages by `idempotency_key`. Processing the same key twice must produce no side effects.
- The Bus MAY deduplicate at ingestion if the same `idempotency_key` is submitted within a 5-minute window.
Acknowledgement¶
- Agents acknowledge processed messages via `bus_ack` with their cursor position.
- Unacknowledged messages are re-delivered on the next poll.
13. CLI-First Principle¶
Everything must be runnable:
- via CLI
- headless
- without UI
UI is optional, execution is not.
14. Infrastructure Execution Mapping¶
Cloud (Hetzner)¶
- Orchestrator (BrainMaster)
- DB
- Ruflo
- LiteLLM
- API services
Local (Mac Studio)¶
- fallback execution
- knowledge access
- RC sessions
- future local models
15. Restart & Recovery¶
System must support:
- full restart command
- state reload
- session recovery
- task continuation
16. GitHub & Backup Integration¶
GitHub¶
- version control
- agent code
- rules
- blueprints
Backblaze B2¶
- backups
- snapshots
- disaster recovery
17. Observability¶
System must expose:
- active agents
- running tasks
- cost
- errors
- alerts
18. Security & Isolation¶
- API keys protected
- agent isolation
- permission control
- environment separation
19. Current State (v5.0 Addition)¶
19.1 Agent Identity Model¶
All agents use 3-digit numeric IDs. Roles are assigned properties, not fixed identities. See Part 1, Section 8 for the complete identity schema.
Agent-to-role assignments are project-scoped (see Part 3, Section 4.2.1 Project Agent Roster). The architecture does not hardcode which agent holds which role — that is an operational decision made per project.
ID Allocation Ranges¶
| Range | Purpose |
|---|---|
| 000-009 | Core agents (orchestrators, specialists) |
| 010-019 | Remote/external host agents |
| 020-029 | Interface agents |
| 030-099 | Reserved for future core expansion |
| 100-999 | On-demand and ephemeral agents |
Current Agent Registry (operational, not architectural)¶
The current agent registry is maintained in the Control Bus and the agents.yaml state file. It changes as projects are created and agents are commissioned or retired. See Part 11 (Execution Log) for the live registry.
19.2 Current Execution Runtime¶
- Agent spawning: Ruflo (claude-flow) on Hetzner CPX62
- Primary execution surface: Claude Code CLI on Hetzner
- Remote execution: via Tailscale to Mac Studio
- Agent communication: STRUXIO Bus (PostgreSQL-backed, evolving to Control Bus)
- Task tracking: Paperclip (to be superseded by XIOPro ODM)
- Agent identity: Unified 3-digit numbering, role-based assignment
20. Execution Success Criteria¶
Execution layer is successful if:
- tasks complete reliably
- sessions recover automatically
- cost is controlled
- agents remain coordinated
- system runs 24/7
21. Final Statement¶
This layer is the engine of XIOPro.
If this is strong:
- the system works continuously
- the founder scales beyond time
If weak:
- everything collapses into manual work
22. Error Handling Implementation Specification¶
This section closes the error handling gap identified by all three external reviewers. Part 7 defines governance policy objects and breaker types. This section specifies the concrete implementation parameters.
22.1 Retry Policy¶
22.2 Circuit Breaker Implementation Parameters¶
circuit_breakers:
cost_breaker:
threshold: "85% of budget_cap triggers warning, 100% halts non-critical agents"
evaluation_frequency: "per-activity"
loop_breaker:
threshold: "same error 3 times in sequence"
action: "halt agent, escalate to orchestrator"
failure_breaker:
threshold: "5 failed activities in 1 hour"
action: "pause agent, alert user"
memory_breaker:
threshold: "host RAM at 85% = no new agents, 90% = graceful shutdown lowest priority, 95% = emergency terminate"
evaluation_frequency: "every 60 seconds via host monitor"
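The memory breaker thresholds translate directly into a lookup; a minimal sketch using the percentages stated above:

```python
def memory_breaker_action(ram_pct: float) -> str:
    """Map host RAM utilization (percent) to the memory breaker action."""
    if ram_pct >= 95:
        return "emergency_terminate"
    if ram_pct >= 90:
        return "graceful_shutdown_lowest_priority"
    if ram_pct >= 85:
        return "no_new_agents"
    return "normal"
```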
22.3 Bus Down Fallback¶
bus_down_fallback:
detection: "3 consecutive failed heartbeats (3 minutes)"
agent_behavior: "continue current task locally, queue messages for retry"
recovery: "on Bus recovery, flush queued messages, re-register"
22.4 Runaway Detection¶
runaway_detection:
definition: "agent consuming >10x normal tokens for task type, or >30 minutes on a task estimated at <5 minutes"
action: "governor alerts user, pauses agent if no response in 5 minutes"
22.5 Cross-Reference¶
These parameters implement the breaker types defined in Part 7, Section 9.3 and the recovery policies in Part 7, Section 8.4. The memory breaker implements the memory pressure survival rule from Part 8, Section 11.10.3.
Changelog¶
| Version | Date | Author | Changes |
|---|---|---|---|
| 4.1.0 | 2026-03-26 | BM | Initial v4.1 release |
| 4.2.0 | 2026-03-28 | BM | Added: T1P implementation form table (4.2E). Added: Ruflo relationship to O00 clarification (4.2F). Fixed: "Rufio" renamed to "Ruflo" globally. Added: Dream Engine T1P posture -- Idle Maintenance only (4.9.9). Added: Current agent mapping table (19.1). Added: Current execution runtime state (19.2). Added: Changelog section. Updated version header to 4.2.0. |
| 4.2.1 | 2026-03-28 | BM | Unified Agent Identity Model: Reframed O00/O01/R01/P01/M01 as role bundles assigned to agents, not separate agent identities. Updated all section headers from profession codes to role names (e.g., "4.1 Orchestrator Role" instead of "4.1 O00"). Updated 4.2E table to show role bundles with agent 000 assignment. Updated 4.2F/4.2G/4.2H to use 3-digit agent IDs. Updated Section 19 agent mapping to unified 3-digit identity table with Old ID column. Updated all body text references from O00/O01/R01/P01/M01 to role-based naming. Updated all Mermaid diagrams to use 3-digit agent IDs. |
| 4.2.2 | 2026-03-28 | 000 | Agent naming migration: B1-B5 replaced with 001-005 in skill tables and activation examples. BM replaced with 000. W21-W23 replaced with 201-203 in example activation. Slim activation example updated from B2 to 002 naming. Backblaze B2 references preserved unchanged. Changelog author entries preserved as historical. |
| 4.2.3 | 2026-03-28 | 000 | Idea + User entities: Added idea_review to idle_maintenance_tasks (Section 4.10). Added Idea review to T1P Dream Engine posture (Section 4.9.9) — scan ideas not reviewed within review_cycle, surface in morning brief. |
| 4.2.4 | 2026-03-28 | 000 | Skill Selection Architecture (Section 4.11): 3-step filter (role + task type + model tier) for selecting which skills an agent loads per task assignment. Includes categorized skill library, task assignment contract with skills_required/skills_available fields, and rule steward governance connection. Updated Part 5 Section 8.9 to cross-reference. |
| 4.2.5 | 2026-03-28 | 000 | Founder clarifications: (1) Role-Topic-Skill binding chain added to Section 4.11 -- skills bind to roles via topics, not directly to agent numbers. (2) Added skill_performance_review and skill_token_optimization to idle_maintenance_tasks (Section 4.9.9). |
| 4.2.6 | 2026-03-28 | 000 | Roles over numbers: Removed agent IDs from all architectural role descriptions, section headers, diagrams, and tables. Agent numbers retained only in Section 19 (Current State) and Changelog. Blueprint now describes WHAT roles do, not WHICH agent holds them. |
| 4.2.7 | 2026-03-28 | BM | XIOPro Optimizer cross-references: Added "part of the XIOPro Optimizer (see Part 1, Section 8A)" note to Governor (4.2), Rule Steward (4.2A), Prompt Steward (4.2B), Module Steward (4.2C), and Dream Engine (4.9). |
| 4.2.8 | 2026-03-28 | BM | AGI pattern gap fixes: (1) Task Dependency Resolution (5.1) — topological sort DAG algorithm for depends_on/blocks resolution. Addresses audit gap "Workflow DAG Formalization" (Principle 21). (2) Completion Self-Check Protocol (5.2) — 5-step self-evaluation before marking tasks done, with confidence scoring and escalation rules. Addresses audit gap "Agent Self-Evaluation" (Principle 1 depth). |
| 4.2.9 | 2026-03-28 | 000 | Wave 1-2 BP fixes: Expanded Domain Skills in Section 4.11 — github- (6 skills), flow-nexus- (3 skills) now listed individually. Added Advanced/Candidate Skills table (17 skills) with review concerns. Added Full Skill Count Summary table (50 total skills across 7 categories). |
| 4.2.10 | 2026-03-28 | 000 | Memory engineering principles: Added Section 4.8A — 5 production engineering rules for memory operations (async updates, debounce writes, confidence threshold, token budget, atomic writes) from 5-Layer Memory Stack research. Includes relation table to existing architecture and implementation requirements. Slimmed Section 4.2H Control Bus to cross-reference Part 2 Section 5.8 (removed duplicated capabilities list). |
| 4.2.11 | 2026-03-29 | BM | Added Section 4.1A (Orchestrator Surface Names) — GO/MO naming convention for Hetzner and Mac orchestrator surfaces with launch commands and rules. Added Section 7.1 (Context Rotation Protocol) — session rotation procedure for long-running orchestrator sessions with state preservation via Part 11, memory files, and Git. |
| 4.2.12 | 2026-03-29 | BM | Cross-references: Added pointer to resources/DESIGN_rc_architecture.md (RC architecture — human-agent interaction surface design, Open WebUI evaluation, multi-provider routing). |
| 4.2.13 | 2026-03-29 | 000 | Batch BP update from recent tickets: Added Section 5.3 (Agent Auto-Pickup) — /agents/{id}/pickup endpoint, self-retrieval pattern, backoff schedule. Added Section 5.4 (Paperclip Auto-Sync) — fire-and-forget pattern for ODM-to-Paperclip sync on task CRUD. |
| 4.2.14 | 2026-03-30 | 000 | Reviewer role: Added Section 4.2I (Reviewer Role) — formal agent role for post-build independent review. Spawned by GO/PO after builder completes significant work; must be a different agent than the builder; uses different model tier where possible (Opus reviews Sonnet, Sonnet reviews Haiku); reads spec + output independently; returns APPROVED / NEEDS_FIX / REJECTED verdict to orchestrator; short-lived. Updated 4.2D.8 state ownership table, 4.2E T1P form table, and Section 4.11 agent_role_skills to include reviewer. |
| 5.0.1 | 2026-03-30 | GO | I4: Added Section 6.1 (Heartbeat, Staleness & Orphan Cleanup) -- heartbeat intervals (60s agents, 30s SSE), stale threshold (300s), dead threshold (600s), Governor cleanup sweep every 60s, orphaned task reassignment to queued. I7: Added Section 12.1 (Agent-to-Agent Communication Protocol) -- JSON message format via Bus, at-least-once delivery, per-topic ordering with sequential seq numbers, 3-retry exponential backoff, idempotency_key deduplication, bus_ack acknowledgement. |