XIOPro Production Blueprint v5.0¶

Part 7 — Governance, Control & Safety Layer¶

1. Purpose¶

This document defines the governance layer that keeps XIOPro:

safe
measurable
explainable
recoverable
cost-aware
human-governed when required

Governance is not a dashboard feature. It is the operational discipline that prevents XIOPro from degrading into uncontrolled autonomous activity.

Primary governance actors:

the governor
rule/policy system
breaker system
alerting system
audit/event ledger
human approval and intervention layer

2. Core Principle¶

Execution without governance becomes expensive chaos.

Governance exists to ensure that XIOPro:

stays within declared limits
reacts predictably to anomalies
preserves founder control at critical points
produces auditable decisions
improves without mutating recklessly

3. Governance Philosophy¶

XIOPro governance follows these rules:

3.1 Observe Before Acting¶

The governor should prefer:

detect
explain
recommend
constrain
intervene

before it escalates to harsher controls.

3.2 Prefer Reversible Controls¶

When possible, governance should choose actions that are reversible:

pause
throttle
reroute
constrain tool/model choice
require approval

before terminal actions such as:

cancel
quarantine
hard-stop execution surface
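
The preference for reversible controls can be sketched as a simple selection rule. This is a minimal illustration, not a specified algorithm: the `choose_control` helper is hypothetical, the action identifiers are shortened forms of the items listed above, and the ordering inside each group is arbitrary.

```python
# Illustrative only: the helper and the grouping are assumptions for this sketch.
REVERSIBLE = ["pause", "throttle", "reroute", "constrain", "require_approval"]
TERMINAL = ["cancel", "quarantine", "hard_stop_surface"]


def choose_control(sufficient_actions):
    """Return a reversible action if any would suffice, else a terminal one.

    `sufficient_actions` is the set of actions that policy evaluation judged
    adequate for the current anomaly.
    """
    for action in REVERSIBLE + TERMINAL:
        if action in sufficient_actions:
            return action
    raise ValueError("no known governance action is sufficient")
```

A terminal action is returned only when no reversible action appears in the sufficient set.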

3.3 Human Gates Are Real State¶

Discussion and approval are not UI side effects.

They are persistent operational states that must be visible in the work graph and audit trail.

3.3A Immediate Start vs Stage Gates (Clarification)¶

XIOPro follows two rules that appear contradictory but are not:

"Never ask permission to START work" -- when an agent receives a task assignment, it begins execution immediately. Assignment IS the start signal. Agents do not wait for a second confirmation before starting.
"Require approval at stage gates" -- within a task's execution, certain checkpoints require explicit human or governance approval before proceeding. These include: protected deployments, policy mutations, cost threshold breaches, recovery decisions with business tradeoffs, and any action flagged by an escalation policy.

These rules are complementary:

Start freely: the agent begins work the moment it is assigned
Gate at checkpoints: the agent pauses at defined approval gates within the work and waits for the required decision before proceeding past the gate

An agent that refuses to start without permission is broken. An agent that skips required approval gates is dangerous.
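
The two rules can be illustrated with a small task model. This is a sketch under stated assumptions: the `Task` class, its state names, and the gate name used in any call are hypothetical and are not part of the ODM.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task model: assignment starts work, gates pause it."""
    gates: list                      # ordered checkpoint names requiring approval
    approved: set = field(default_factory=set)
    state: str = "unassigned"
    next_gate: int = 0

    def assign(self):
        # Assignment IS the start signal: no second confirmation is requested.
        self.state = "running"

    def advance(self):
        """Move to the next checkpoint; wait there if approval is missing."""
        if self.state not in ("running", "waiting_approval"):
            return self.state
        if self.next_gate >= len(self.gates):
            self.state = "done"
        elif self.gates[self.next_gate] in self.approved:
            self.next_gate += 1
            self.state = "running"
        else:
            self.state = "waiting_approval"
        return self.state
```

The task never waits before starting, and it never passes a gate whose approval has not been recorded.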

See also Part 1, Section 4.5A for the foundational statement of this principle.

3.4 Policy Beats Mood¶

Governance actions must be based on:

policy objects
thresholds
observed state
explicit override records

not on hidden heuristics alone.

3.5 Local Optimization Must Not Break System Integrity¶

A cheaper or faster path is invalid if it violates:

safety
auditability
human gate requirements
execution continuity
architectural boundaries

4. Governance Architecture¶

flowchart TD
    Signals[Runtime Signals / Events / Metrics] --> Observe[Observe]
    Observe --> Analyze[Analyze]
    Analyze --> Policy[Evaluate Policies]
    Policy --> Decide[Decision Engine]
    Decide --> Action[Governance Action]
    Action --> Runtime[Execution Runtime]
    Action --> Alerts[Alerts / Escalation]
    Action --> Audit[Audit / Event Ledger]
    Runtime --> Signals

5. Governor Role¶

5.1 Role¶

The governor is the system-wide protection, optimization, and intervention engine.

It does not replace the orchestrator role.

The orchestrator role drives execution. The governor role protects and shapes execution.

5.2 Responsibilities¶

The governor role must:

monitor cost, health, latency, retries, failures, and drift
detect abnormal or unsafe execution patterns
enforce policy actions
open escalation requests when human input is required
recommend optimizations
record all meaningful governance actions
support recovery decisions
preserve system stability during partial failure

5.3 Non-Responsibilities¶

The governor role must not:

become the only orchestration brain
mutate core behavior silently
bypass required human approval
become a hidden routing black box
rewrite rules without governance process

5.4 Minimum Inputs¶

The governor role must consume at least:

activity events
task state changes
session state changes
runtime health signals
cost estimates / actuals
retry counts
alert-worthy anomalies
breaker hits
human decisions
rule/policy versions

5.5 Minimum Outputs¶

The governor role must be able to emit:

recommendation
warning
throttle
pause
reroute
require_approval
require_discussion
quarantine_runtime
cancel_execution
recovery_plan
policy_violation_event

6. Governance Domains¶

6.1 Cost Governance¶

Purpose:

attribute spend
detect abnormal cost growth
enforce budget rules
prevent wasteful loops and overpowered model use

Minimum scope:

activity
task
ticket
project
runtime surface
environment
provider/model
node/service where practical

6.2 Runtime & Health Governance¶

Purpose:

detect degraded execution
detect repeated retry failure
detect session instability
prevent cascading outages
support recovery selection

Minimum monitored conditions:

service down
degraded performance
repeated crash/restart
repeated session recovery failure
queue buildup
stale runtime heartbeat

6.3 Behavior & Policy Governance¶

Purpose:

enforce architecture and rule boundaries
prevent agent drift
prevent forbidden actions or tool use
ensure execution remains bounded and explainable

Examples:

invalid tool attempted
forbidden surface selected
missing approval on protected action
excessive recursion
repeated non-productive loop
unauthorized write target

6.4 Quality Governance¶

Purpose:

detect unusable or low-confidence outputs
force rework when quality gates fail
prevent low-quality outputs from being treated as "done"

Examples:

output missing required artifact
test/validation failure
confidence below threshold
contradictory result detected
evaluation score below policy

6.5 Human Control Governance¶

Purpose:

preserve founder control over high-risk decisions
turn critical discussions into durable system state

Cases that should open a human gate:

policy says approval required
ambiguity blocks progress
cost/risk exceeds delegated limit
recovery path has business tradeoffs
rule/system mutation is proposed

6.6 Security & Access Governance¶

Purpose:

prevent privilege misuse
limit secret exposure
constrain runtime access paths
record security-sensitive intervention

Examples:

unexpected secret access attempt
runtime requests forbidden infrastructure action
public exposure drift
operator path policy violation

7. Governance Signal Model¶

7.1 Core Signal Classes¶

XIOPro governance should reason over at least these signal classes:

event signals
metric signals
health signals
policy signals
human signals
cost signals
quality signals
recovery signals

7.2 Event Examples¶

Examples:

task.started
task.blocked
session.crashed
session.recovered
runtime.heartbeat_missed
cost.spike_detected
breaker.triggered
approval.required
human.decision.recorded

7.3 Correlation Keys¶

Governance events should be correlatable by:

project_id
ticket_id
task_id
activity_id
agent_runtime_id
session_id
execution_surface_id
policy_id
event_id

8. Policy Objects¶

Governance must not live only in prose. It requires policy objects.

8.1 Budget Policy¶

Defines financial limits and allowed responses.

id: string
scope_type: enum
# system | project | ticket | task | runtime_surface

scope_ref: string|null
period: enum
# run | hour | day | week | month

warning_threshold: float|null
hard_threshold: float|null

preferred_action: enum
# warn | throttle | reroute | pause | escalate

created_at: datetime
updated_at: datetime
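
A budget policy of this shape could be evaluated roughly as follows. This is a sketch, not the governor's actual logic: the `evaluate_budget` function is hypothetical, and the mapping (warn at the warning threshold, apply `preferred_action` at the hard threshold) is an assumption.

```python
def evaluate_budget(spend, warning_threshold, hard_threshold, preferred_action):
    """Map observed spend for one period to a governance response.

    Thresholds may be None, mirroring the nullable fields in the schema.
    """
    if hard_threshold is not None and spend >= hard_threshold:
        return preferred_action
    if warning_threshold is not None and spend >= warning_threshold:
        return "warn"
    return "continue"
```

For example, with a warning threshold of 10.0 and a hard threshold of 20.0, a spend of 12.0 yields `warn`, and a spend of 20.0 yields the policy's preferred action.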

8.2 Breaker Policy¶

Defines when breakers should trigger.

id: string
breaker_type: enum
# cost | loop | failure | recursion | quality | security | latency | recovery

scope_type: enum
scope_ref: string|null

trigger_condition: string
cooldown_seconds: int|null
auto_resume_allowed: boolean

warning_action: enum|null
hard_action: enum

created_at: datetime
updated_at: datetime
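
The interaction between `cooldown_seconds` and `auto_resume_allowed` might look like the sketch below. The `may_auto_resume` helper is hypothetical, and treating a null cooldown as "no waiting period" is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def may_auto_resume(triggered_at: datetime, now: datetime,
                    cooldown_seconds: Optional[int],
                    auto_resume_allowed: bool) -> bool:
    """Return True if a triggered breaker may resume without a human decision."""
    if not auto_resume_allowed:
        return False
    if cooldown_seconds is None:
        # Assumption: no cooldown configured means no waiting period.
        return True
    return now - triggered_at >= timedelta(seconds=cooldown_seconds)
```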

8.3 Escalation Policy¶

Defines when a human gate must open.

id: string
trigger_type: enum
# approval | discussion | anomaly | recovery_tradeoff | policy_violation

scope_type: enum
scope_ref: string|null

urgency: enum
# low | normal | high | critical

default_owner: string|null
sla_minutes: int|null

created_at: datetime
updated_at: datetime
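
One plausible reading of `sla_minutes` is a response deadline measured from the moment the human gate opens. The helpers below are hypothetical and only illustrate that reading.

```python
from datetime import datetime, timedelta
from typing import Optional


def sla_deadline(opened_at: datetime, sla_minutes: Optional[int]) -> Optional[datetime]:
    """Return the response deadline, or None when the policy sets no SLA."""
    if sla_minutes is None:
        return None
    return opened_at + timedelta(minutes=sla_minutes)


def is_overdue(opened_at: datetime, sla_minutes: Optional[int], now: datetime) -> bool:
    """True when an SLA exists and the deadline has passed."""
    deadline = sla_deadline(opened_at, sla_minutes)
    return deadline is not None and now > deadline
```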

8.4 Recovery Policy¶

Defines preferred recovery responses.

id: string
failure_class: enum
# provider | runtime | session | quality | infra | security

preferred_path: enum
# retry_same_session | resume_new_session | switch_surface | switch_model | escalate_human | terminal_fail

max_attempts: int|null
cooldown_seconds: int|null
approval_required: boolean

created_at: datetime
updated_at: datetime
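
A recovery policy could drive path selection along these lines. This is a sketch: `next_recovery_step` is a hypothetical helper, and falling back to `escalate_human` once attempts are exhausted is an assumption, since the schema does not name a fallback.

```python
def next_recovery_step(preferred_path, attempts_so_far, max_attempts, approval_required):
    """Pick the next recovery step for one failure under one recovery policy."""
    if max_attempts is not None and attempts_so_far >= max_attempts:
        # Assumed fallback when the attempt budget is spent.
        return "escalate_human"
    if approval_required:
        # Reuses the governor output name from Section 5.5.
        return "require_approval"
    return preferred_path
```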

8.5 Routing Constraint Policy¶

Defines allowed execution choices under governance pressure.

id: string
scope_type: enum
scope_ref: string|null

allowed_surfaces: [string]
allowed_models: [string]
forbidden_tools: [string]
cost_tier_cap: string|null

created_at: datetime
updated_at: datetime
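
Applying a routing constraint policy amounts to filtering candidate routes. The sketch below assumes a hypothetical candidate shape (a dict with `surface`, `model`, and `tools` keys) and leaves out `cost_tier_cap`, whose tier ordering is not defined in this part.

```python
def allowed_routes(candidates, allowed_surfaces, allowed_models, forbidden_tools):
    """Filter candidate routes against one routing constraint policy."""
    forbidden = set(forbidden_tools)
    return [
        c for c in candidates
        if c["surface"] in allowed_surfaces
        and c["model"] in allowed_models
        and not forbidden.intersection(c["tools"])
    ]
```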

9. Circuit Breakers¶

9.1 Purpose¶

Circuit breakers prevent XIOPro from continuing harmful execution merely because execution is technically still possible.

9.2 Breaker Principles¶

A breaker must have:

owner
scope
trigger condition
action
cooldown/recovery rule
audit event
override rule

9.3 Breaker Types¶

Breaker Type	Trigger Pattern	Typical Action
cost breaker	spend crosses policy threshold	throttle / pause / escalate
loop breaker	repeated low-progress iteration	pause / reroute / escalate
failure breaker	repeated execution failure	recover / quarantine / escalate
recursion breaker	unbounded delegation depth	stop child spawning / pause
latency breaker	task/session exceeds runtime expectation	warn / reroute / escalate
quality breaker	repeated low-quality or invalid output	rework / approval / block release
recovery breaker	too many failed recoveries	require human decision
security breaker	forbidden or suspicious access/action	quarantine / hard stop / alert

9.4 Breaker State Model¶

breaker_state:
  - monitoring
  - warning
  - triggered
  - cooldown
  - resumed
  - overridden
  - retired

9.5 Breaker Action Set¶

Allowed actions:

warn
throttle
constrain
pause_task
pause_runtime
pause_surface
reroute_model
reroute_surface
require_discussion
require_approval
quarantine_runtime
cancel_task
hard_stop_surface

9.6 Breaker Flow¶

flowchart TD
    Monitor --> ThresholdBreached
    ThresholdBreached --> Warning
    Warning --> Triggered
    Triggered --> GovernanceAction
    GovernanceAction --> Cooldown
    Cooldown --> ResumeCheck
    ResumeCheck --> Resumed
    ResumeCheck --> EscalateHuman
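
The state model in 9.4 and the flow in 9.6 can be combined into a small transition table. This is a sketch: the edges for `overridden` and `retired` are assumptions, since the flow does not show them, and human escalation is not modeled because it is not a breaker state.

```python
# Transitions inferred from the 9.6 flow; overridden/retired edges are assumed.
TRANSITIONS = {
    "monitoring": {"warning", "retired"},
    "warning": {"triggered", "monitoring"},
    "triggered": {"cooldown", "overridden"},
    "cooldown": {"resumed", "overridden"},
    "resumed": {"monitoring"},
    "overridden": {"monitoring", "retired"},
    "retired": set(),
}


def transition(current, target):
    """Return the new breaker state, or raise if the move is not allowed."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal breaker transition: {current} -> {target}")
    return target
```

Rejecting undeclared transitions keeps a breaker from jumping, for example, from `monitoring` straight to `cooldown` without a recorded trigger.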

9.7 T1P Breaker Baseline¶

T1P must include at least:

one cost breaker
one loop breaker
one repeated-failure breaker
one session-recovery breaker
one approval-required breaker for protected actions

10. Alerts & Intervention¶

10.1 Severity Levels¶

Use at least:

critical
warning
info

10.2 Severity Meaning¶

Critical¶

Used when immediate intervention or automatic protection is required.

Examples:

database unavailable
orchestrator/governor unavailable
repeated failed recovery
security-sensitive violation
runaway cost event

Warning¶

Used when the system remains functional but risk is rising.

Examples:

high retry rate
queue backlog
degraded runtime health
failed backup job
unstable execution surface

Info¶

Used for state visibility without urgent action.

Examples:

deployment complete
breaker resumed
scheduled Dream cycle complete
low-risk optimization suggestion

10.3 Alert Routing¶

Alerts should be routable to:

dashboard/control center
operator inbox / queue
chat/notification channel
incident/audit record

10.4 Multi-Project Alert Routing¶

In multi-project deployments, alerts must carry a project_id correlation key so that routing is project-scoped.

Rules:

Every alert event must include project_id where the alert originates from project-scoped execution (ticket, task, agent runtime, PO)
IO filters alerts presented to a human user by that user's project access list — a user with access to Project A must not see alerts from Project B
System-level alerts (infrastructure, security, governance) carry project_id: null and are routed to all operators with system-level access
Alert queries to the Bus must accept an optional project_id filter parameter
The Dashboard alert feed must respect per-user project access when rendering the alert list

These rules apply to all alert severity levels (critical, warning, info) and all delivery surfaces (dashboard feed, operator inbox, notification channel, incident record).
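
Project-scoped filtering can be sketched as below. The `visible_alerts` helper and the alert dictionary shape are assumptions for illustration; only the `project_id` semantics come from the rules in this section.

```python
def visible_alerts(alerts, user_projects, has_system_access):
    """Return the alerts one user may see.

    `alerts` are dicts with a "project_id" key (None for system-level alerts).
    `user_projects` is the user's project access list.
    """
    visible = []
    for alert in alerts:
        project_id = alert["project_id"]
        if project_id is None:
            # System-level alerts go only to operators with system-level access.
            if has_system_access:
                visible.append(alert)
        elif project_id in user_projects:
            visible.append(alert)
    return visible
```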

10.5 Intervention Ownership¶

Every actionable alert should identify:

system owner
affected scope
recommended action
whether auto-action already occurred
whether human decision is still required

10.6 Acknowledgement Rule¶

Critical and warning alerts should support acknowledgement and closure state.

alert_state:
  - open
  - acknowledged
  - silenced
  - resolved
  - archived

11. Decision Engine¶

11.1 Inputs¶

The decision engine evaluates:

policy objects
current runtime state
recent event history
health state
cost state
human decision state
recovery eligibility
architectural constraints

11.2 Outputs¶

The decision engine may return:

continue
continue_with_constraints
pause
reroute
retry
recover
escalate_discussion
escalate_approval
quarantine
cancel

11.3 Decision Requirements¶

Every non-trivial governance decision should be:

attributable
explainable
logged
correlated to triggering signals
replayable from stored evidence

11.4 Explainability Contract¶

A governance decision record should include:

decision_id: string
decision_type: string
scope_type: string
scope_ref: string
trigger_signals: [string]
policy_refs: [string]
recommended_action: string
executed_action: string
requires_human: boolean
created_at: datetime
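
The record above maps naturally onto an immutable data class. The sketch below adds a hypothetical `explainability_gaps` check; treating empty `trigger_signals` or `policy_refs` as a gap is an assumption derived from the requirements in 11.3, not a stated validation rule.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class GovernanceDecision:
    decision_id: str
    decision_type: str
    scope_type: str
    scope_ref: str
    trigger_signals: list
    policy_refs: list
    recommended_action: str
    executed_action: str
    requires_human: bool
    created_at: datetime

    def explainability_gaps(self):
        """List reasons this record could not be explained or replayed."""
        gaps = []
        if not self.trigger_signals:
            gaps.append("no trigger_signals recorded")
        if not self.policy_refs:
            gaps.append("no policy_refs recorded")
        return gaps
```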

12. Rule, Skill, Activation, ContextPrompting & Module Portfolio Governance¶

12.1 Purpose¶

This section governs the behavior-shaping and module-selection layer of XIOPro.

It covers:

rules
skills
activations
protocols
patterns
ContextPrompting behavior
assumption and inquiry discipline
module portfolio policy
subscription and hosting governance
module optimization policy

These assets materially shape:

how agents think
how tasks are framed
which modules are used
how assumptions are made
how execution quality evolves over time
how constrained resources are allocated

12.2 Governed Asset Classes¶

The following asset families must be governed.

These asset classes are aligned with the ODM entity model defined in Part 3. Any addition or change to governed asset classes must be reflected in both Part 3 (ODM schema) and Part 7 (governance policy).

Core ODM-Aligned Asset Classes¶

These correspond directly to ODM entities and must use the same naming, lifecycle, and schema definitions established in Part 3:

DiscussionThread
Ticket
Task
AgentRuntime
Session
Activity
EscalationRequest
HumanDecision
OverrideRecord
ResearchTask
CostLedger
TimeLedger

Behavior-Shaping Asset Classes¶

These govern how agents think and operate:

RULE_*
SKILL_*
ACTIVATION_*
PATTERN_*
PROTOCOL_*
ContextPrompting mode policies
inquiry policies
assumption policies
prompt package templates

Module Portfolio Asset Classes¶

These govern module selection and optimization. See Part 5, Section 7.14 for the full schema definitions of each module asset class (MODULE, MODULE_POLICY, SUBSCRIPTION, HOSTING_PROFILE, MODULE_EVALUATION, MODULE_RECOMMENDATION).

Taxonomy Alignment Rule¶

The governed asset taxonomy in Part 7 must match the ODM definitions in Part 3 exactly. If a new entity is added to the ODM, it must be evaluated for governance coverage. If a governed asset class references an ODM entity, the names, lifecycle states, and schema properties must be consistent across both parts.

12.3 Governance Roles¶

The full specification of each governance role is defined in Part 4:

Rule Steward Role: See Part 4, Section 4.2A for the full specification (responsibilities, non-responsibilities, managed asset classes, operating modes, stewardship flow).
Prompt Steward Role: See Part 4, Section 4.2B for the full specification (ContextPrompting modes, question budget, readiness decisions, prompt package contract).
Module Steward Role: See Part 4, Section 4.2C for the full specification (governed asset classes, optimization objective, evidence sources, telemetry requirements).
Governor Role: See Part 4, Section 4.2 and Part 7, Section 5 for the full specification.

Within the governance layer, these roles interact as follows:

The governor role constrains runtime behavior when governance policy requires approval, discussion, question budget restrictions, module usage constraints, or portfolio enforcement under anomaly or cost pressure.
The governor role does not replace the rule steward, prompt steward, or module steward roles, but it may constrain them at runtime.

12.4 Source of Truth Model¶

See Part 5, Section 7.5 for the full dual-representation technology model (human-readable Markdown source of truth + structured runtime mirror in YAML/DB).

The governance layer operates across both forms: the Markdown assets for human review and approval, the structured mirror for machine-evaluable validation, conflict detection, and policy enforcement.

12.5 Rule Priority & Precedence¶

When governed assets conflict, priority should normally be:

security / safety
human approval requirement
hard governance breaker
architectural boundary rules
execution continuity / recovery requirements
module portfolio constraints
ContextPrompting inquiry requirements
rule / activation constraints
skill guidance
convenience / preference rules

No lower-priority asset may silently override a higher-priority one.
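
The precedence order can be expressed as a ranked list. In this sketch the class identifiers are hypothetical shorthand for the ten levels above, and `winning_asset` is an illustrative helper rather than a specified resolver.

```python
# Highest priority first; identifiers are shorthand invented for this sketch.
PRECEDENCE = [
    "security_safety",
    "human_approval_requirement",
    "hard_governance_breaker",
    "architectural_boundary",
    "execution_continuity_recovery",
    "module_portfolio_constraint",
    "contextprompting_inquiry",
    "rule_activation_constraint",
    "skill_guidance",
    "convenience_preference",
]


def winning_asset(conflicting):
    """Return the asset whose class ranks highest in PRECEDENCE.

    `conflicting` is a list of (asset_name, asset_class) pairs.
    """
    return min(conflicting, key=lambda pair: PRECEDENCE.index(pair[1]))[0]
```

An unknown asset class raises `ValueError` from `list.index`, so a conflict involving an unranked asset fails loudly instead of being resolved silently.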

12.6 Change Governance¶

Any change to a governed asset that can alter execution behavior should support:

versioning
explanation
structured diff visibility
approval determination
rollback path
supersession lineage
audit event generation

Protected changes must not be auto-published.

Typical protected changes include:

core runtime rules
security-sensitive rules
core agent activations
ContextPrompting policy changes
changes that materially alter approval/inquiry behavior
changes that alter default assumptions
new module adoption into approved portfolio
subscription additions or changes
self-hosting adoption proposals
retirement of strategically important modules

12.7 ContextPrompting Governance¶

ContextPrompting modes, defaults, question budgets, and the prompt package contract are fully specified in Part 4, Section 4.2B (Prompt Steward Role).

This section defines the governance constraints on ContextPrompting behavior.

Governance Rule¶

The system should ask the fewest questions that materially improve the work.

It must avoid two failures:

asking too little and making damaging assumptions
asking too much and creating friction/noise

12.8 Assumption & Inquiry Policy¶

The readiness decision taxonomy and question classification (blocking vs optional) are specified in Part 4, Section 4.2B (Prompt Steward Role, "Prompting Readiness Decision" and "Inquiry Output Classes").

Governance constraint: if an answer materially affects execution, it must be converted into durable operational state.

12.9 Module Portfolio Governance¶

The Module Steward's full specification (optimization objective, constrained resources, portfolio decisions, evidence sources, telemetry requirements) is defined in Part 4, Section 4.2C.

This section defines the governance constraints on module portfolio decisions.

12.10 Module Usage Constraints¶

Governance may constrain module usage when:

policy restricts certain task classes
subscription limits are reached or near exhaustion
cost pressure requires lower tier selection
latency or stability degradation is observed
privacy policy forbids a provider path
hosting profile cannot safely support a proposed runtime
fallback path must be activated

Module constraints may include:

allowed module list
forbidden module list
required fallback
required approval
hosting-only restrictions
environment-specific restrictions

12.11 Module Evidence, Telemetry & Attribution Governance¶

The Module Steward's telemetry and attribution requirements are fully specified in Part 4, Section 4.2C ("Runtime Feedback & Telemetry Requirement"). The infrastructure telemetry pipeline is specified in Part 8, Section 8.14.

Governance constraint: if a task used a module and XIOPro cannot answer which module, through which access path, by which runtime, for which task/ticket, with what cost signal -- then module optimization and cost governance are incomplete.

Governance must distinguish between API-backed, subscription-backed, self-hosted, and hybrid/fallback use because each carries different evidence quality and optimization implications.

12.12 Search-Before-Create / Search-Before-Adopt Rules¶

The search-before-create discipline for rules, skills, and activations is specified in Part 4, Section 4.2A (Rule Steward Role) and Part 5, Section 7.8 (Skill Discovery & Creation Loop).

Before adopting a new:

module
subscription
self-hosted runtime
hosting profile

the system must first search the existing portfolio and recommendation history.

Allowed outcomes:

reuse existing
extend existing
supersede existing
compare further
create/adopt new because the gap is real

External Scouting Sources¶

The system may use governed external scouting sources such as:

provider documentation
approved benchmark material
approved web research results
Hugging Face model and repository research
local or remote CLI research tools
internal research-center outputs

These sources may inform comparison and recommendation, but they do not bypass approval or portfolio evaluation.

12.13 Validation, Publication & Enforcement Cases¶

The validation pipeline for governed assets (existence search, schema validation, metadata validation, conflict detection, effectiveness review, approval, publication, indexing) is fully specified in Part 5, Section 7.6 with flow diagram.

For governance purposes, the pipeline adds one step, optimization review (step 6 below), which applies only to module-related assets.

Extended governance pipeline: existence search -> schema validation -> metadata validation -> conflict detection -> effectiveness review -> optimization review (module assets) -> approval determination -> publication -> indexing / runtime mirror refresh.
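
The step ordering can be sketched as a list that gains the optimization review only for module assets. The step identifiers and the `pipeline_steps` helper are hypothetical names for the stages listed above.

```python
BASE_STEPS = [
    "existence_search", "schema_validation", "metadata_validation",
    "conflict_detection", "effectiveness_review",
    "approval_determination", "publication", "indexing",
]


def pipeline_steps(is_module_asset):
    """Return the ordered validation steps for one governed asset."""
    steps = list(BASE_STEPS)
    if is_module_asset:
        # Optimization review sits between effectiveness review and approval.
        steps.insert(steps.index("approval_determination"), "optimization_review")
    return steps
```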

Governance Enforcement Cases¶

Governance may require ContextPrompting behavior in cases such as:

high-risk task
approval-sensitive task
repeated failure caused by ambiguity
repeated rework caused by weak assumptions
founder-design collaboration mode
recovery decision with business tradeoffs

Governance may require module portfolio enforcement in cases such as:

runaway cost event
subscription exhaustion
provider instability
self-hosted environment overload
forbidden provider path
repeated poor quality from a module choice
strong evidence that a preferred module is no longer optimal
discovery of a strategically relevant new candidate that merits comparison

Final Enforcement Rule¶

External discovery may trigger evaluation.

Only governed evidence and approval may trigger adoption.

12.14 Override Record Schema¶

Override records are a core governance mechanism that must be durable and auditable.

The override record schema in Part 7 must match the ODM definition in Part 3 and the executable DDL in resources/SCHEMA_walking_skeleton_v4_2.sql.

Override Record Properties¶

override_record:
  id: uuid
  scope_type: enum
    # task | agent_runtime | session | escalation_request | module_route | policy
  scope_ref: uuid
  trigger_ref: string|null

  override_type: enum
    # pause | resume | constrain | reroute | force_module | force_surface | bypass | cancel

  reason: text           # Why the override was issued
  issued_by: text        # Actor who issued the override
  approved_by: text|null # Actor who approved (if separate from issuer)

  prior_state_ref: text|null     # Reference to state before override
  applied_change_ref: text|null  # Reference to change applied

  expires_at: datetime|null
  resolved_at: datetime|null

  # Metadata contract (standard across all ODM entities)
  tags: [string]
  labels: [string]
  source_system: string|null
  source_ref: string|null
  correlation_id: string|null
  idempotency_key: string|null
  notes: text|null

  created_by: text
  created_at: datetime

Override Governance Rules¶

Override records are append-only. They must never be deleted or silently mutated.
Every override must have a reason and an issued_by actor.
An override with approved_by set indicates that two-party approval was obtained.
The prior_state_ref should capture enough context to understand what changed.
Overrides on security-sensitive scopes must generate a governance audit event.
Expired overrides (expires_at in the past) must not remain silently active.
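
Two of these rules lend themselves to simple checks. The helpers below are a sketch that assumes override records are available as plain dictionaries keyed by the schema's field names; they are not taken from the DDL.

```python
from datetime import datetime


def is_active(override: dict, now: datetime) -> bool:
    """An override is active only if unresolved and not past expires_at."""
    if override.get("resolved_at") is not None:
        return False
    expires_at = override.get("expires_at")
    return expires_at is None or now < expires_at


def validate_override(override: dict) -> None:
    """Enforce the rule that every override has a reason and an issuer."""
    missing = [f for f in ("reason", "issued_by") if not override.get(f)]
    if missing:
        raise ValueError(f"override missing required fields: {missing}")
```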

12.15 Audit Requirements¶

Meaningful changes or enforcement actions in this section should create auditable events such as:

rule.proposed
skill.proposed
activation.review_required
contextprompting.mode_selected
contextprompting.blocking_question_opened
contextprompting.answer_promoted
module.evaluated
module.recommendation.created
module.policy.changed
subscription.proposal.opened
hosting_profile.assessed
portfolio.optimization_decision.recorded
governed_asset.publication_blocked
override.issued
override.approved
override.expired
override.resolved

12.16 Document Update Governance¶

Documents (BP parts, design docs, specs) may change at three stages:

1. Before ticketing: during design and research
2. During ticketing: as details are finalized
3. After execution: as implementation reveals reality

Rule¶

Executing agents record document impacts in their ticket notes (BP_IMPACT: Part N Section M — description). GO batch-updates documents after ticket completion. This prevents merge conflicts and ensures consistency.

Document Update Flow¶

agent_executes_ticket:
  records: "BP_IMPACT: Part 6 Section 10.1 — alerts now have priority filters"

go_batch_update:
  trigger: "Ticket batch complete (e.g., end of sprint, end of session)"
  actions:
    - Collect all BP_IMPACT notes from completed tickets
    - Update each referenced BP section
    - Single commit per batch
    - Push to Git
    - Update Part 12 (Ticket Register) if new capabilities added
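
Collecting the BP_IMPACT notes could be done with a small parser. This is a sketch: the regular expression is an assumption based on the single example note above, and it accepts either a long dash (written as the escape `\u2014`) or a plain hyphen as the separator.

```python
import re

# Pattern inferred from the example note; not a specified grammar.
BP_IMPACT = re.compile(
    r"BP_IMPACT:\s*Part\s+(?P<part>\d+)\s+Section\s+(?P<section>[\w.]+)"
    r"\s*[\u2014-]\s*(?P<description>.+)"
)


def collect_impacts(ticket_notes):
    """Extract sorted (part, section, description) tuples from note lines."""
    impacts = []
    for note in ticket_notes:
        match = BP_IMPACT.search(note)
        if match:
            impacts.append(
                (int(match["part"]), match["section"], match["description"].strip())
            )
    return sorted(impacts)
```

Sorting by part and section lets the batch update walk the blueprint in document order and apply all changes in a single commit.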

12.17 Review Gates for Non-Code Outputs¶

Review and testing must not be limited to code. All significant XIOPro outputs pass through review gates appropriate to their type.

review_gates:
  blueprint_change:
    reviewer: human or senior agent
    validation: consistency check
    gate: approval for protected sections
  knowledge_vault_entry:
    reviewer: Librarian
    validation: metadata schema, duplicate check
    gate: auto-approve if valid, flag if duplicate
  rule_or_skill_change:
    reviewer: rule steward
    validation: conflict detection
    gate: approval required
  research_output:
    reviewer: Research Center
    validation: source lineage
    gate: auto-approve if lineage complete
  configuration_change:
    reviewer: governor
    validation: diff review, dry-run
    gate: health check after apply

This section implements the "Review and Testing Everywhere" constraint defined in Part 1, Section 4.14.

12.18 Agent Review and Test Regime¶

XIOPro enforces a mandatory 3-layer review and test regime for all agent-produced outputs. The full specification is in rules/RULE_review_test_regime.md.

Overview¶

Every non-trivial output passes through three layers before being marked complete:

Layer 1 — Self-Verification (every agent)

The producing agent verifies its own output before declaring it done:

Output matches the ticket requirements
No obvious errors, missing artifacts, or broken references
Self-verification result is recorded in the task notes

Layer 2 — Peer Review (separate agent, different model)

A second agent — distinct from the builder — reviews the output:

Builder and reviewer must always be different agents
Where practical, builder and reviewer should use different models (e.g., Sonnet builder, Haiku reviewer)
Reviewer checks correctness, completeness, and adherence to architecture rules
Reviewer records a pass/fail verdict and any required rework items
Output must not advance to integration until Layer 2 passes

Layer 3 — Integration Test (every 5 completions)

After every 5 task completions within a session or sprint, an integration check runs:

Verifies that recently completed outputs remain coherent with each other
Runs any available automated tests covering the changed scope
Records the result as a governance audit event
If integration fails, the failing task(s) are reopened and Layers 1 and 2 are re-run

Enforcement Rules¶

No agent may self-approve its own work at Layer 2. Builder and reviewer are always separate.
Layer 2 peer review is not optional. It applies to all ticket-driven outputs, including documents, schema changes, rules, and code.
Layer 3 integration cadence (every 5 completions) is a minimum. GO may trigger it more frequently under high-risk conditions.
Governance breakers may force a Layer 3 integration test on demand.
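
The two mechanical parts of these rules, reviewer separation and integration cadence, can be sketched as follows. The function names are hypothetical; the operative definitions live in rules/RULE_review_test_regime.md.

```python
INTEGRATION_EVERY = 5  # minimum cadence from Layer 3


def check_peer_review(builder_id, reviewer_id):
    """Layer 2 rule: builder and reviewer must be different agents."""
    if builder_id == reviewer_id:
        raise ValueError("an agent may not review its own output")


def integration_due(completed_count, forced=False):
    """True when a Layer 3 integration test should run.

    `forced` models an on-demand trigger from GO or a governance breaker.
    """
    return forced or (completed_count > 0 and completed_count % INTEGRATION_EVERY == 0)
```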

Cross-Reference¶

See rules/RULE_review_test_regime.md for the full operative specification, including agent assignment rules, model routing guidance, cadence configuration, and audit event definitions.

12.19 Final Rule¶

Behavior-shaping assets and module portfolio assets are operational assets.

They must not drift unmanaged, and ContextPrompting must not become a hidden prompt trick.

XIOPro should treat:

rules
skills
activations
inquiry behavior
prompting mode selection
module choice
subscription choice
self-hosting proposals

as governed, explainable, approval-aware system behavior.

13. Application Security Specification¶

This section closes the application-level security gap identified by all three external reviewers. Infrastructure-level security (Tailscale, UFW, SOPS) is covered in Part 8. This section specifies application-level authentication, authorization, and inter-agent security.

13.1 API Authentication¶

api_authentication:
  method: "Bearer token (JWT or opaque)"
  token_source: "Bus OAuth 2.1 token endpoint"
  token_lifetime: "24 hours"
  refresh: "via refresh_token grant"

13.2 Control Center Authentication¶

control_center_auth:
  method: "Session cookie from Bus OAuth flow"
  flow: "User -> Bus /oauth/authorize -> redirect -> session cookie"
  session_lifetime: "24 hours"

13.3 SSE Channel Authentication¶

sse_channel_auth:
  method: "Session-bound SSE via HTTP handshake"
  flow:
    1_handshake: "POST /events/sessions with Bearer token in Authorization header"
    2_response: "Returns opaque session_id (UUID, not the token) with TTL"
    3_connect: "GET /events/{agent_id}?session={session_id}"
    4_validate: "Bus validates session_id maps to agent_id and is not expired"
  security_rules:
    - "Tokens MUST NOT appear in query parameters (they leak into server logs, browser history, and referrer headers)"
    - "Session IDs are opaque, short-lived (1 hour), and bound to the originating agent_id"
    - "Session IDs are single-use per SSE connection — reconnection requires a new handshake"
    - "Bearer token is only transmitted in the Authorization header of the handshake POST"
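
The handshake and connect checks can be sketched with an in-memory store. This is an illustration under stated assumptions: `SseSessionStore` is hypothetical, Bearer token verification is assumed to have happened before `handshake` is called, and a real Bus would persist sessions rather than hold them in a dictionary.

```python
import time
import uuid

SESSION_TTL_SECONDS = 3600  # "short-lived (1 hour)"


class SseSessionStore:
    """In-memory sketch of the handshake and connect checks."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._sessions = {}  # session_id -> (agent_id, expires_at)

    def handshake(self, agent_id):
        """Steps 1-2: issue an opaque session_id bound to the agent."""
        session_id = str(uuid.uuid4())
        self._sessions[session_id] = (agent_id, self._clock() + SESSION_TTL_SECONDS)
        return session_id

    def connect(self, agent_id, session_id):
        """Steps 3-4: validate and consume the session (single use)."""
        entry = self._sessions.pop(session_id, None)
        if entry is None:
            return False
        bound_agent, expires_at = entry
        return bound_agent == agent_id and self._clock() < expires_at
```

Because `connect` removes the session on every attempt, a reconnect always needs a new handshake, and the token itself never appears in a URL.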

13.4 RBAC Enforcement¶

rbac_enforcement:
  roles: [founder, operator, reviewer, observer]
  rules:
    founder: "all operations"
    operator: "read/write tasks, agents, tickets. No project deletion."
    reviewer: "read all, write comments/evaluations only"
    observer: "read only"
  enforcement: "Middleware checks user role on every API request"

13.4.1 Per-Project Access Control¶

RBAC is enforced at the project level. Each user-project assignment carries a role:

Founder: full access across all projects and system-level operations
Operator: can manage tasks, agents, tickets, and sprints within their assigned projects only. Cannot access or modify resources in projects they are not assigned to. Cannot delete projects.
Reviewer: read access to all resources within assigned projects. Can write comments, evaluations, and review verdicts only.
Observer: read-only access to assigned projects. No write operations.

user_project_access:
  user_id: string
  project_id: string
  role: enum  # founder | operator | reviewer | observer
  granted_by: string
  granted_at: datetime

13.4.2 Agent Role Scoping

Agents inherit permissions from the project they operate within:

An agent spawned for a project operates with the permission set of the role assigned to the spawning user or orchestrator for that project
Agents cannot access resources outside their assigned project scope
The orchestrator (GO) operates with founder-level access across all projects
Project Orchestrators (PO) operate with operator-level access within their project
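The scoping rules above can be read as a small resolution function. The sketch below is one possible reading, not a defined API: `effective_agent_role`, the `agent_kind` strings, and the `spawner_role` parameter are hypothetical names introduced for illustration.

```python
from typing import Optional


def effective_agent_role(
    agent_kind: str,
    agent_project: Optional[str],
    target_project: str,
    spawner_role: Optional[str] = None,
) -> Optional[str]:
    """Return the role an agent acts with on target_project, or None if out of scope."""
    if agent_kind == "GO":
        return "founder"  # the orchestrator: founder-level across all projects
    if agent_project != target_project:
        return None  # agents cannot reach outside their assigned project
    if agent_kind == "PO":
        return "operator"  # Project Orchestrator: operator-level within its project
    # Any other agent inherits the role of the user or orchestrator that spawned it.
    return spawner_role
```

Returning `None` for an out-of-scope project lets the caller treat "no role" as a denial without a separate scope check.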

13.4.3 Permission Matrix

Action	Founder	Operator	Reviewer	Observer
Spawn agents	Yes	Yes (own projects)	No	No
Terminate agents	Yes	Yes (own projects)	No	No
Approve interventions	Yes	Yes (own projects)	No	No
Reject interventions	Yes	Yes (own projects)	No	No
Modify governance rules	Yes	No	No	No
Modify policies	Yes	No	No	No
Access cost data (system)	Yes	No	No	No
Access cost data (project)	Yes	Yes (own projects)	Yes (own projects)	No
Create/modify tickets	Yes	Yes (own projects)	No	No
Write comments/evaluations	Yes	Yes (own projects)	Yes (own projects)	No
View execution data	Yes	Yes (own projects)	Yes (own projects)	Yes (own projects)
Issue tokens	Yes	No	No	No
Revoke tokens	Yes	No	No	No
Delete projects	Yes	No	No	No
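One way to encode the matrix for the RBAC middleware is a pair of lookup tables plus a single check. This is a sketch under assumptions: the action keys (for example `spawn_agents`) and the function `is_allowed` are hypothetical names, and unknown actions are denied for every role except founder.

```python
# Actions the matrix grants to the founder role only.
FOUNDER_ONLY = {
    "modify_governance_rules",
    "modify_policies",
    "access_cost_data_system",
    "issue_tokens",
    "revoke_tokens",
    "delete_projects",
}

# Project-scoped actions and the non-founder roles allowed to perform them.
PROJECT_SCOPED = {
    "spawn_agents": {"operator"},
    "terminate_agents": {"operator"},
    "approve_interventions": {"operator"},
    "reject_interventions": {"operator"},
    "access_cost_data_project": {"operator", "reviewer"},
    "create_modify_tickets": {"operator"},
    "write_comments_evaluations": {"operator", "reviewer"},
    "view_execution_data": {"operator", "reviewer", "observer"},
}


def is_allowed(role: str, action: str, in_assigned_project: bool) -> bool:
    """Check one (role, action) pair against the permission matrix.

    in_assigned_project says whether the target resource belongs to a
    project the user is assigned to ("own projects" in the matrix).
    """
    if role == "founder":
        return True  # full access across all projects
    if action in FOUNDER_ONLY:
        return False
    allowed_roles = PROJECT_SCOPED.get(action, set())  # unknown action: deny
    return role in allowed_roles and in_assigned_project
```

For example, `is_allowed("operator", "spawn_agents", False)` is false: the role is sufficient, but the project is not one of the operator's own.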

13.5 Inter-Agent Authentication

inter_agent_auth:
  method: "Agent registration token from Control Bus"
  pattern: "Agent registers with Bus, receives agent_token for subsequent calls"

13.6 Token Lifecycle & Rotation Policy

token_lifecycle:
  agent_tokens:
    lifetime: "24 hours from issuance"
    auto_refresh:
      trigger: "heartbeat when token age > 20 hours"
      method: "Bus issues new token in heartbeat response; old token remains valid for 5 minutes (grace period)"
    revocation:
      single_agent: "DELETE /agents/{id}/token — immediate invalidation"
      all_agents: "POST /tokens/revoke-all — revokes all active agent tokens"
    compromised_token_procedure:
      1: "POST /tokens/revoke-all — revoke every active agent token"
      2: "All agents must re-register via POST /agents/register with fresh credentials"
      3: "Governance audit event: token.compromised.mass_revocation"
      4: "Incident record created in state/incidents.yaml"

  user_tokens:
    lifetime: "24 hours"
    refresh: "via OAuth 2.1 refresh_token grant"
    revocation: "DELETE /sessions/{session_id} or POST /tokens/revoke-user"

  sse_session_ids:
    lifetime: "1 hour"
    refresh: "new handshake required after expiry"
    revocation: "automatic on token revocation (session bound to token)"

Token Rotation Rules

No token may live longer than 24 hours without refresh
Heartbeat-based auto-refresh ensures agents with healthy heartbeats never experience token expiry during normal operation
Token revocation takes effect within 1 request cycle — the Bus rejects the revoked token on the next API call
Mass revocation (revoke-all) is a governance-level action that generates a critical alert
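The agent-token timings (24-hour lifetime, refresh once the token is older than 20 hours, 5-minute grace period for the superseded token) can be sketched as two pure functions. The names `needs_refresh` and `is_token_valid` are hypothetical, and treating a token as expired at exactly 24 hours is an assumption, since the policy does not state the boundary.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

TOKEN_LIFETIME = timedelta(hours=24)
REFRESH_AFTER = timedelta(hours=20)
GRACE_PERIOD = timedelta(minutes=5)


def needs_refresh(issued_at: datetime, now: datetime) -> bool:
    """True when a heartbeat should trigger a new token (token age > 20 hours)."""
    return now - issued_at > REFRESH_AFTER


def is_token_valid(
    issued_at: datetime,
    now: datetime,
    superseded_at: Optional[datetime] = None,
    revoked: bool = False,
) -> bool:
    """Decide whether the Bus should accept a token on this request."""
    if revoked:
        return False  # revocation applies on the next API call
    if now - issued_at >= TOKEN_LIFETIME:
        return False  # assumption: expired at exactly 24 hours
    if superseded_at is not None and now - superseded_at > GRACE_PERIOD:
        return False  # old token outlived its 5-minute grace period
    return True
```

With these values, an agent that sends heartbeats regularly is offered a replacement token roughly four hours before expiry.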

13.7 T1P Security Baseline

T1P must implement at minimum:

API bearer token authentication on all Control Bus REST endpoints
Session-based authentication for the Control Center UI
SSE channel authentication via session-bound handshake (no tokens in query parameters)
RBAC middleware enforcing founder/operator/reviewer/observer roles
Agent registration token for inter-agent communication
Token lifecycle: 24-hour expiry with heartbeat-based auto-refresh
Single-agent and mass token revocation endpoints

14. Audit / Event Logging Specification

14.1 Purpose

All sensitive and governance-relevant actions must be logged to an append-only audit_log table for accountability, forensics, and compliance.

14.2 Logged Actions

The following action classes must produce audit log entries:

Intervention: create, approve, reject, expire
Agent: spawn, terminate, register, deregister
Approval gates: open, approve, reject, timeout
Cost threshold: warning triggered, hard threshold breached
Token: issue, revoke, refresh
Override: issued, approved, expired, resolved
Policy: created, modified, retired
Security: authentication failure, forbidden access attempt, secret access
Governance: breaker triggered, breaker resumed, breaker overridden

14.3 Audit Log Schema

audit_log:
  id: uuid
  timestamp: datetime         # UTC, millisecond precision
  actor: string               # user_id, agent_id, or system identifier
  actor_type: enum            # user | agent | system | governor
  action: string              # e.g., "intervention.approve", "agent.spawn"
  target_type: string|null    # e.g., "agent_runtime", "escalation_request"
  target_id: string|null      # ID of the affected entity
  project_id: string|null     # project scope (null for system-level actions)
  metadata: jsonb             # action-specific details (parameters, prior state, etc.)
  ip_address: string|null     # source IP where available
  correlation_id: string|null # links related audit entries
  created_at: datetime
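As an illustration of the schema in application code, the sketch below models one audit entry as an immutable record. It is an assumption-laden example rather than the project's data layer: the class name `AuditEntry` and the `to_json` helper are hypothetical, and `created_at` is omitted on the assumption that the database sets it on insert.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional

ACTOR_TYPES = {"user", "agent", "system", "governor"}


@dataclass(frozen=True)  # frozen mirrors the append-only rule: no in-place edits
class AuditEntry:
    actor: str
    actor_type: str
    action: str  # e.g. "intervention.approve", "agent.spawn"
    target_type: Optional[str] = None
    target_id: Optional[str] = None
    project_id: Optional[str] = None  # None for system-level actions
    metadata: dict = field(default_factory=dict)
    ip_address: Optional[str] = None
    correlation_id: Optional[str] = None
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self):
        if self.actor_type not in ACTOR_TYPES:
            raise ValueError(f"unknown actor_type: {self.actor_type}")

    def to_json(self) -> str:
        """Serialize with a UTC timestamp at millisecond precision."""
        row = asdict(self)
        row["timestamp"] = self.timestamp.isoformat(timespec="milliseconds")
        return json.dumps(row, sort_keys=True)
```

A typical call would be `AuditEntry(actor="user-1", actor_type="user", action="intervention.approve", target_type="escalation_request", target_id="esc-9")`, where the identifiers are made up for the example.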

14.4 Retention and Storage

Minimum retention: 90 days in the primary audit_log table (hot storage)
Archive: after 90 days, export to cold storage (Backblaze B2 or equivalent) in JSONL format
Immutability: audit log rows are append-only. No UPDATE or DELETE operations permitted.
Index strategy: indexes on (actor, timestamp), (target_type, target_id), (project_id, timestamp), and (action, timestamp)
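The 90-day hot-storage window and JSONL export can be sketched as follows. The function names `rows_due_for_archive` and `to_jsonl` are hypothetical, rows are assumed to be dictionaries with a timezone-aware `timestamp`, and the upload to cold storage is out of scope.

```python
import json
from datetime import datetime, timedelta, timezone

HOT_RETENTION = timedelta(days=90)


def rows_due_for_archive(rows, now):
    """Select rows older than the 90-day hot-storage window.

    Selection only: nothing is updated or deleted here, in keeping with
    the append-only rule for the primary table.
    """
    cutoff = now - HOT_RETENTION
    return [r for r in rows if r["timestamp"] < cutoff]


def to_jsonl(rows):
    """Render rows as JSONL: one JSON object per line, ISO-8601 timestamps."""
    return "\n".join(
        json.dumps({**r, "timestamp": r["timestamp"].isoformat()}, sort_keys=True)
        for r in rows
    )
```

The JSONL text would then be written to the cold-storage bucket by whatever export job the deployment uses.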

14.5 Access Rules

Only founder-role users may query the full audit log
Operator-role users may query audit entries scoped to their assigned projects
Audit log access itself generates an audit entry
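The three access rules can be combined in one query helper. This is a sketch under assumptions: `query_audit_log` and the action name `audit_log.query` are hypothetical, the log is modeled as a plain list of dictionaries, and roles other than founder and operator are assumed to see nothing.

```python
def query_audit_log(log, actor, role, assigned_projects=frozenset()):
    """Return the audit rows visible to the caller and record the access itself."""
    if role == "founder":
        visible = list(log)  # full audit log
    elif role == "operator":
        # Project-scoped only; system-level rows (project_id None) are excluded.
        visible = [r for r in log if r.get("project_id") in assigned_projects]
    else:
        visible = []  # assumption: reviewer and observer have no audit access
    # Audit log access itself generates an audit entry.
    log.append({
        "actor": actor,
        "actor_type": "user",
        "action": "audit_log.query",  # hypothetical action name
        "project_id": None,
        "metadata": {"role": role, "rows_returned": len(visible)},
    })
    return visible
```

Denied queries are logged as well, with `rows_returned` set to 0, so probing by lower-privileged roles stays visible to the founder.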

15. Current State

As of 2026-03-28, the governance layer exists primarily in blueprint form.

What exists today:

The BrainMaster operates with a 3-failure circuit breaker (hardcoded in CLAUDE.md activation)
Cost awareness exists as manual tracking, not automated telemetry
Approval gates exist informally via founder/Face interaction, not as durable state
No formal policy objects exist in the database yet
No formal audit event ledger exists yet
Override records are not yet tracked durably

What must be built:

Policy objects as database-backed records
Breaker system with real trigger/cooldown state
Audit event ledger in PostgreSQL
Override record table (DDL exists in SCHEMA_walking_skeleton_v4_2.sql)
Alert routing beyond ad-hoc bus messages
Formal approval workflow with state persistence

Changelog

Version	Date	Author	Changes
4.1.0	2026-03-27	BM	Initial governance blueprint
4.2.0	2026-03-28	BM	C7.1: Aligned governed asset taxonomy with ODM Part 3 entity list. C7.2: Added override record schema matching Part 3/DDL. CX.1: Fixed "Rufio" to "Ruflo" references (none present in Part 7). CX.2: Updated version header to 4.2.0. CX.3: Added changelog. CX.4: Added current state section (Section 13). Renumbered 12.14 Audit to 12.15, 12.15 Final Rule to 12.16. Added override audit events to 12.15.
4.2.2	2026-03-28	000	Agent naming migration: O01 replaced with "governor role (000)" or "000 (governor role)". R01/P01/M01 replaced with role-based naming (rule steward, prompt steward, module steward). O00 replaced with "000 (orchestrator role)". BM replaced with 000 (BrainMaster). C0 replaced with 020 (Face). All within governance context. Changelog author entries preserved as historical.
4.2.3	2026-03-28	000	Roles over numbers: Removed agent IDs from governance section headers, actor lists, and current state descriptions. Role names used throughout instead of agent numbers.
4.2.8	2026-03-28	000	Added Section 12.16: Review Gates for Non-Code Outputs -- review gates for blueprints, knowledge vault entries, rules/skills, research outputs, and configuration changes. Implements Part 1 constraint 4.13. Renumbered 12.16 Final Rule to 12.17.
4.2.9	2026-03-28	000	Updated Part 1 cross-reference: Section 12.16 now references Part 1 Section 4.14 (renumbered from 4.13 due to Control Bus constraint insertion).
4.2.10	2026-03-28	000	Content deduplication: Section 12.3 Governance Roles — replaced duplicated role descriptions with cross-references to Part 4 Sections 4.2A/4.2B/4.2C. Section 12.4 Source of Truth — replaced with cross-reference to Part 5 Section 7.5. Section 12.7 ContextPrompting — replaced duplicated modes/defaults with cross-reference to Part 4 Section 4.2B. Section 12.8 Assumption & Inquiry — replaced with cross-reference to Part 4 Section 4.2B. Section 12.9 Module Portfolio — replaced duplicated optimization objective with cross-reference to Part 4 Section 4.2C. Section 12.11 Module Evidence — replaced with cross-references to Part 4 Section 4.2C and Part 8 Section 8.14. Section 12.12 Search-Before-Create — replaced duplicated rule/skill search with cross-reference to Part 4 Section 4.2A and Part 5 Section 7.8. Section 12.13 Validation Pipeline — replaced duplicated pipeline with cross-reference to Part 5 Section 7.6. Module Portfolio Asset Classes — replaced with cross-reference to Part 5 Section 7.14.
4.2.11	2026-03-29	000	External review fix: Added Section 13 (Application Security Specification) -- API auth, Control Center auth, SSE channel auth, RBAC enforcement, inter-agent auth. Renumbered old Section 13 (Current State) to Section 14. Closes security gap flagged by all 3 external reviewers.
4.2.13	2026-03-29	000	Document Update Governance: Added Section 12.16 (Document Update Governance) -- BP_IMPACT note protocol, GO batch-update flow, conflict prevention rules. Renumbered old 12.16 Review Gates to 12.17, old 12.17 Final Rule to 12.18.
5.0.1	2026-03-30	GO	Added Section 12.18 (Agent Review and Test Regime) -- 3-layer review+test regime: Layer 1 Self-Verification (every agent), Layer 2 Peer Review (separate agent, different model; builder and reviewer always different), Layer 3 Integration Test (every 5 completions). Cross-reference to RULE_review_test_regime.md. Renumbered old 12.18 Final Rule to 12.19.
5.0.2	2026-03-30	GO	N9: Added Section 10.4 (Multi-Project Alert Routing) -- alerts carry project_id, IO filters by user project access, system-level alerts use null project_id, dashboard alert feed respects RBAC. Renumbered old 10.4 Intervention Ownership to 10.5, old 10.5 Acknowledgement Rule to 10.6.
5.0.3	2026-03-30	GO	C6: Replaced SSE query parameter token transport (Section 13.3) with session-bound handshake — POST handshake with Bearer header returns opaque session_id, no tokens in URLs. C7: Added Section 13.6 (Token Lifecycle & Rotation Policy) — 24h agent token expiry, heartbeat auto-refresh at 20h, single/mass revocation, compromised token procedure. Updated Section 13.7 (T1P Security Baseline) to include SSE handshake auth and token lifecycle requirements.
5.0.4	2026-03-30	GO	I1: Expanded RBAC spec (Sections 13.4.1-13.4.3) -- per-project access control, agent role scoping, full permission matrix for founder/operator/reviewer/observer. I2: Added Section 14 (Audit/Event Logging Specification) -- audit_log schema, logged action classes, 90-day retention, cold storage archive, access rules. Renumbered old Section 14 (Current State) to Section 15. I12: Added Section 3.3A (Immediate Start vs Stage Gates) -- clarifies that "never ask permission to START" and "require approval at gates" are complementary, not contradictory.