XIOPro Production Blueprint v5.0

Part 5 — Knowledge System


1. Purpose of This Part

This document defines the Knowledge System of XIOPro:

  • how knowledge is stored
  • how it is structured
  • how it is retrieved
  • how it evolves
  • how agents learn
  • how the system avoids entropy

This layer transforms XIOPro from:

an execution system

into:

a compounding intelligence system


2. Knowledge System Philosophy

XIOPro knowledge is:

  • structured (not raw memory)
  • indexed (not buried in chats)
  • evolving (not static)
  • agent-accessible (not human-only)
  • cost-efficient (token-aware)

3. Core Components

flowchart TD
    subgraph Sources["Sources"]
        Users["Users"]
        AgentOutput["Agents"]
        External["External"]
        DreamOut["Dream Engine"]
    end

    subgraph RC["Research Center"]
        SourceReg["Source Registry"]
        ResourceReg["Resource Registry"]
        Scheduled["Scheduled Research"]
        NLM["NotebookLM"]
        Obsidian["Obsidian"]
    end

    subgraph Core["Core Engine"]
        Librarian["Librarian"]
        SkillReg["Skill Registry"]
        SkillPerf["Skill Performance DB"]
        KLedger["Knowledge Ledger"]
    end

    subgraph Store["Storage"]
        Git["Git and Markdown"]
        PG["PostgreSQL and pgvector"]
    end

    subgraph Learn["Learning"]
        Hindsight["Hindsight"]
        Reflect["Reflection"]
        Improve["Improvement"]
    end

    Sources --> RC
    RC --> Librarian
    Librarian --> Store
    Store --> Core
    Core --> AgentOutput
    AgentOutput --> Learn
    Learn --> Librarian
    DreamOut --> RC
    DreamOut --> Learn

4. The Librarian (Core System)

Role

The Librarian is the central intelligence curator.

Responsibilities

  • ingesting all documents
  • structuring and classifying knowledge
  • maintaining consistency and naming discipline
  • indexing and retrieval
  • rendering documents for human and agent use
  • controlling knowledge lifecycle

The Librarian is not storage.

It is:

the operating system of knowledge


4.1 Librarian Decision Logic

4.1.1 Incoming Document (Ingestion Pipeline)

flowchart TD
    NewDoc --> CleanDocument
    CleanDocument --> CheckExisting
    CheckExisting -->|match| UpdateDoc
    CheckExisting -->|no match| NewDocCreate
    UpdateDoc --> Reindex
    NewDocCreate --> Reindex
    Reindex --> HandleMetadata
    HandleMetadata --> FindCorrectLocation
    FindCorrectLocation --> SaveDocument
    SaveDocument --> Available

Explanation

CleanDocument
  • normalize formatting
  • remove noise
  • ensure YAML compliance
CheckExisting
  • detect:
    • duplicates
    • version updates
    • partial overlaps
UpdateDoc vs NewDocCreate
  • Update → version increment
  • New → new identity assigned
Reindex
  • update search index
  • update topic linkage
HandleMetadata
  • enforce YAML structure
  • enrich metadata:
    • topics
    • tags
    • ownership
    • version
FindCorrectLocation
  • determine:
    • folder path
    • naming prefix (RULE_, BLUEPRINT_, etc.)
    • domain placement
SaveDocument
  • commit to:
    • Git (source)
    • DB (index + metadata)
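A minimal sketch of this ingestion pipeline, assuming hypothetical function and field names (`ingest`, a hash-keyed `index` dict) rather than the real XIOPro API:

```python
import hashlib

def ingest(doc: dict, index: dict) -> dict:
    """Minimal sketch of the Librarian ingestion pipeline.
    `doc` carries 'content' and YAML-style 'metadata'; `index` maps
    content hashes to existing document ids (a stand-in for the real index)."""
    # CleanDocument: normalize formatting and strip noise
    content = doc["content"].strip()
    # CheckExisting: hash-based duplicate / version detection
    digest = hashlib.sha256(content.encode()).hexdigest()
    existing_id = index.get(digest)
    if existing_id:
        doc_id, action = existing_id, "updated"           # UpdateDoc: version increment
    else:
        doc_id, action = f"doc_{len(index) + 1}", "created"  # NewDocCreate: new identity
    # Reindex + HandleMetadata: enforce required YAML fields
    metadata = {"topics": [], "tags": [], "version": "1.0", **doc.get("metadata", {})}
    index[digest] = doc_id
    # FindCorrectLocation: the naming prefix drives the folder path
    prefix = metadata.get("prefix", "DOC")
    path = f"vault/{prefix.lower()}s/{prefix}_{doc_id}.md"
    # SaveDocument: in the real system this commits to Git and the DB index
    return {"id": doc_id, "action": action, "path": path, "metadata": metadata}
```

The hash comparison stands in for the real duplicate/overlap detection, which would also need semantic matching to catch partial overlaps.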

4.1.2 Search Document

flowchart TD
    SearchOptions --> ByMetadataPrompt
    SearchOptions --> ByContentPrompt
    SearchOptions --> ByContextPrompt
    ByMetadataPrompt --> FindDocument
    ByContentPrompt --> FindDocument
    ByContextPrompt --> FindDocument

Search Modes

By Metadata
  • structured queries
  • fast
  • low token cost
By Content
  • semantic / text-based
  • deeper but more expensive
By Context
  • uses:
    • current task
    • current topic
    • agent context

4.1.3 Display or Outgoing Document

flowchart TD
    RequestDocument --> AddHeaderFooter
    AddHeaderFooter --> AddTOC
    AddTOC --> AddIndex
    AddIndex --> AddRelevantContext
    AddRelevantContext --> ChooseFormat
    ChooseFormat --> Markdown
    ChooseFormat --> PDF
    ChooseFormat --> HTML
    ChooseFormat --> GoogleDocs
    ChooseFormat --> OfficeDocs
    Markdown --> ChooseAction
    PDF --> ChooseAction
    HTML --> ChooseAction
    GoogleDocs --> ChooseAction
    OfficeDocs --> ChooseAction
    ChooseAction --> DisplayDocument
    DisplayDocument --> |MarkdownOnly|EditDocument-Editor
    DisplayDocument --> |MarkdownOnly|EditDocument-Prompt
    EditDocument-Editor --> AddToLibrarian
    EditDocument-Prompt --> AddToLibrarian
    ChooseAction --> DownloadDocument
    ChooseAction --> EmailDocument

Rendering Logic

AddHeaderFooter
  • branding
  • copyright
  • metadata summary
AddTOC
  • dynamic table of contents
AddIndex
  • references
  • links to related knowledge
AddRelevantContext
  • inject:
    • related documents
    • linked topics
    • dependencies

Formats

  • Markdown → source of truth
  • HTML → UI rendering
  • PDF → distribution
  • Google Docs → collaboration
  • Office Docs → enterprise usage

Actions

  • Display
  • Edit (Markdown only)
  • Download
  • Email
  • Re-ingest into Librarian

4.2 Librarian Output Types

  • Markdown (source of truth)
  • HTML (human view)
  • Indexed entries (for retrieval)
  • Metadata (YAML)

4.3 Search Document

  • By Metadata - Using Prompt
  • By Text - Using Prompt
  • By Context - Using Prompt

4.4 Librarian as System Boundary

The Librarian sits between:

  • Knowledge creation
  • Knowledge storage
  • Knowledge retrieval
  • Knowledge presentation

It ensures:

  • no duplication
  • no orphan data
  • consistent structure
  • continuous evolution

4.5 Librarian Document Decomposition Protocol

The Librarian's core job is breaking large documents into linked atomic knowledge notes.

decomposition_protocol:
  input: large document (blueprint, research report, design doc)
  output: linked atomic notes in knowledge vault

  extraction_targets:
    concepts: "Named system capabilities (Control Bus, ODM, Optimizer)"
    entities: "ODM objects (User, Idea, Ticket, Host, Agent)"
    roles: "System roles (orchestrator, governor, specialist)"
    technologies: "Tools and platforms (PostgreSQL, Hindsight, Ruflo)"
    decisions: "Architecture decisions with rationale"
    processes: "Workflows and procedures (backup, spawning, skill selection)"

  note_format:
    - YAML frontmatter (type, source_document, related, tags)
    - Wikilinks [[concept_name]] to related notes
    - One concept per note (atomic)
    - Source reference back to original document section

  structure:
    concepts/: system capability notes
    entities/: ODM entity notes
    roles/: role definition notes
    technologies/: technology notes (already populated)
    decisions/: architecture decision records
    processes/: workflow descriptions

Decomposition Flow

When the Librarian receives a large document:

  1. Scan -- identify extraction targets (concepts, entities, roles, technologies, decisions, processes)
  2. Extract -- create one atomic note per target, with YAML frontmatter and wikilinks
  3. Link -- ensure all notes cross-reference related notes using [[concept_name]] wikilinks
  4. Place -- file each note in the correct vault subdirectory based on its type
  5. Index -- update the knowledge ledger with creation events for each new note
  6. Verify -- check for duplicate notes, missing links, and orphaned references

Rules

  • One concept per note. If a note covers two distinct concepts, split it.
  • Every note must link back to its source document section.
  • Every note must have YAML frontmatter with at least: type, source_document, related, tags.
  • The Librarian must check for existing notes before creating new ones (search-before-create).
  • Decomposition of protected documents (blueprints, rules) requires the same governance as any knowledge mutation.
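Under these rules, a decomposed atomic note might look like the following; the filename, field values, and wikilink targets are illustrative:

```markdown
---
type: concept
source_document: BLUEPRINT_XIOPro_v5_Part5_Knowledge_System.md#4-the-librarian
related: ["[[Knowledge Ledger]]", "[[Skill Registry]]"]
tags: [librarian, knowledge-system]
---

# Librarian

The central intelligence curator: ingests, classifies, indexes, and
renders all knowledge. See [[Knowledge Ledger]] for how its changes
are logged.
```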

4.5A Librarian Memory Engineering Requirements

The Librarian must comply with the Memory Engineering Principles defined in Part 4, Section 4.8A. Specifically:

  • Principle 1 (Async Updates): Librarian ingestion, indexing, and storage must not block agent execution. When an agent produces a knowledge artifact, the Librarian processes it asynchronously.
  • Principle 2 (Debounce Writes): When multiple documents arrive in rapid succession (e.g., during a research burst), the Librarian should batch ingestion rather than processing each document individually.
  • Principle 3 (Confidence Threshold): Knowledge Objects ingested by the Librarian should carry a confidence score. The Librarian enforces the 0.7 threshold and per-type caps to prevent unbounded knowledge growth.
  • Principle 5 (Atomic Writes): All Librarian writes to state files, ledger entries, and index updates must use the atomic write pattern (write to temp file, then rename).
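Principle 5's temp-file-then-rename pattern is the standard POSIX atomic-replace idiom; a sketch in Python (paths and names are illustrative):

```python
import os
import tempfile

def atomic_write(path: str, data: str) -> None:
    """Write `data` to `path` atomically: readers see either the old
    file or the complete new file, never a partial write."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the SAME directory so the rename cannot
    # cross filesystems (os.replace is only atomic within one filesystem)
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # flush to disk before the rename
        os.replace(tmp, path)      # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```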

4.6 Future Extensions

  • auto-refactoring documents
  • cross-document synthesis
  • contradiction detection
  • knowledge graph visualization
  • automated documentation improvement suggestions

4.7 Knowledge Ledger (Change & Evolution Log)

Purpose

The Knowledge Ledger (KL) is a system-wide immutable log of all knowledge transformations.

It tracks:

  • document creation
  • document updates
  • metadata changes
  • reclassification
  • movement across locations
  • document revival
  • document export / distribution
  • deletion (logical, never physical)

This ensures:

  • traceability
  • auditability
  • explainability
  • reproducibility

4.7.1 Why This Is Required

Without a ledger:

  • metadata becomes unreliable
  • knowledge evolution is invisible
  • agents cannot learn properly
  • debugging becomes impossible
  • compliance (future) is broken

With a ledger:

XIOPro becomes self-explainable over time


4.7.2 Ledger Structure

ledger_entry:
  id: string
  timestamp: datetime
  document_id: string

  action: enum
    - created
    - updated
    - reclassified
    - moved
    - indexed
    - revived
    - exported
    - deleted_logical

  actor:
    type: enum (agent, human, system)
    id: string

  change_summary: string

  metadata_before: object|null
  metadata_after: object|null

  content_hash_before: string|null
  content_hash_after: string|null

  related_entities:
    - topic_id
    - ticket_id
    - task_id

  notes: string|null

4.7.3 Ledger Flow

flowchart TD
    DocumentChange --> CaptureEvent
    CaptureEvent --> CreateLedgerEntry
    CreateLedgerEntry --> StoreLedger
    StoreLedger --> IndexLedger
    IndexLedger --> AvailableForQuery

4.7.4 Event Sources

Ledger entries are generated from:

  • Librarian ingestion pipeline
  • document edits (UI / RC / agents)
  • metadata updates
  • topic reassignment
  • exports (PDF, HTML, Docs)
  • re-ingestion after edits
  • Dream Engine updates

4.7.5 Document Revival Tracking

When a document is:

  • re-used after long inactivity
  • referenced by a new ticket
  • pulled into a new context

→ it is marked as:

action: revived

This enables:

  • tracking knowledge reuse
  • identifying high-value documents
  • prioritizing maintenance

4.7.6 Export Tracking

When a document is:

  • downloaded
  • emailed
  • exported to external format

→ ledger records:

action: exported
target: enum (pdf, html, google_docs, office_docs)

4.7.7 Ledger Usage

By Agents
  • detect frequently updated docs
  • identify unstable knowledge
  • suggest improvements
By the Governor
  • detect:
    • excessive changes
    • instability
    • redundant updates
By UI
  • show:
    • document history
    • evolution timeline
    • change summaries

4.7.8 Metadata vs Data

This is key:

Metadata is not enough — it must become data-backed history

The ledger ensures:

  • metadata is traceable
  • changes are reconstructable
  • state is explainable

4.7.9 Storage Strategy

DB (Primary)
  • ledger entries
  • indexed for queries
Optional
  • append-only log system (future)
  • event streaming (future)

4.7.10 Anti-Patterns Prevented

  • silent document overwrites
  • metadata drift
  • lost evolution history
  • untraceable changes
  • "why did this change?" ambiguity

4.7.11 Success Criteria

The Knowledge Ledger is successful if:

  • every document change is traceable
  • history can be reconstructed
  • agents can learn from change patterns
  • system behavior is explainable

4.7.12 Final Statement

The Knowledge Ledger transforms XIOPro from:

a system that stores knowledge

into:

a system that understands how its knowledge evolves


5. Topics System (Core Spine)

Purpose

Topics are the universal classification system.

Properties

  • hierarchical (tree)
  • relational (graph)
  • indexed
  • extensible

5.1 Topic Structure

flowchart TD
    Root --> ProductDomain
    Root --> ComplianceStandards
    ProductDomain --> Modules
    ComplianceStandards --> Roles
    ComplianceStandards --> Validation

5.2 Topic Functions

Topics drive:

  • agent responsibility
  • knowledge classification
  • search
  • UI navigation
  • project alignment

6. Knowledge Objects

Types

  • RULE
  • SKILL
  • BLUEPRINT
  • STATE
  • REFLECTION
  • PROFILE
  • LOG

6.1 Knowledge Schema

id: string
type: enum
topics: [topic_id]
content_ref: string
version: string
confidence: float          # 0.0-1.0 — see Part 4, Section 4.8A (Memory Engineering Principle 3)
created_at: datetime
updated_at: datetime
metadata: object

content_ref Resolution

content_ref is a file path relative to the knowledge vault root (~/STRUXIO_Workspace/struxio-knowledge/vault/).

Format: vault/<category>/<filename>.md

Example: vault/blueprints/BLUEPRINT_XIOPro_v5_Part5_Knowledge_System.md

Agents and the Librarian resolve content_ref by joining the vault root with the stored path. The vault root is never stored in the content_ref value itself — only the relative path is stored.
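A sketch of the resolution rule, treating the configured root as the directory that contains `vault/` so that joining it with the stored `vault/<category>/<filename>.md` path works; the function name and the traversal guard are assumptions:

```python
from pathlib import Path

# Assumed default root from the text; in practice this comes from configuration
VAULT_ROOT = Path.home() / "STRUXIO_Workspace" / "struxio-knowledge"

def resolve_content_ref(content_ref: str, root: Path = VAULT_ROOT) -> Path:
    """Resolve a relative content_ref (e.g. 'vault/blueprints/X.md')
    against the vault root. Rejects absolute paths and directory
    traversal so a stored ref can never escape the vault."""
    ref = Path(content_ref)
    if ref.is_absolute():
        raise ValueError("content_ref must be relative, not absolute")
    resolved = (root / ref).resolve()
    if not resolved.is_relative_to(root.resolve()):
        raise ValueError("content_ref escapes the vault root")
    return resolved
```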


Confidence Score Rules (Memory Engineering Principle 3)

Every Knowledge Object carries a confidence score (0.0-1.0) per Part 4, Section 4.8A:

  • Facts extracted automatically by Hindsight or agents: confidence assigned by the extraction model
  • Facts entered by the user: confidence defaults to 1.0
  • Facts below 0.7 confidence are discarded at ingestion time
  • Per-type caps prevent unbounded growth (e.g., max 100 active facts per agent context)
  • When cap is reached, lowest-confidence items are trimmed first
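These rules can be sketched as one admission function; the cap value and field names are illustrative, with only the 0.7 threshold taken from the text:

```python
CONFIDENCE_THRESHOLD = 0.7   # from Part 4, Section 4.8A, Principle 3
MAX_FACTS_PER_CONTEXT = 100  # illustrative per-type cap

def admit_fact(fact: dict, active: list[dict],
               threshold: float = CONFIDENCE_THRESHOLD,
               cap: int = MAX_FACTS_PER_CONTEXT) -> list[dict]:
    """Apply the confidence rules: user-entered facts default to 1.0,
    sub-threshold facts are discarded at ingestion, and when the cap
    is reached the lowest-confidence items are trimmed first."""
    if fact.get("source") == "user":
        fact = {**fact, "confidence": 1.0}
    if fact.get("confidence", 0.0) < threshold:
        return active                      # discarded at ingestion time
    active = sorted(active + [fact], key=lambda f: f["confidence"], reverse=True)
    return active[:cap]                    # trim lowest-confidence overflow
```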

7. Rules, Skills & Activation Stewardship

7.1 Asset Families

XIOPro must treat the following as first-class knowledge assets:

  • RULE_*
  • SKILL_*
  • ACTIVATION_* or agent activation files such as claude.md
  • PATTERN_*
  • PROTOCOL_*

These assets are not casual notes. They are behavior-shaping system assets.


7.2 Rules vs Skills vs Activations

RULE_

Defines:

  • constraints
  • obligations
  • boundaries
  • approval requirements
  • forbidden or required behavior

SKILL_

Defines:

  • reusable execution capability
  • procedure
  • method
  • template for recurring work

ACTIVATION_

Defines:

  • agent-specific working mode
  • operating preferences
  • persistent instructions
  • behavior shaping for a named runtime or profession

Example:

  • claude.md for a specific agent runtime

PATTERN_

Defines:

  • reusable operating structures
  • standard workflows
  • repeatable multi-step methods

7.3 Why Stewardship Is Required

As XIOPro evolves, the system will continuously need:

  • new skills
  • revised rules
  • activation tuning
  • consolidation of duplicates
  • retirement of obsolete assets

Without stewardship, the rule/skill layer becomes:

  • conflicting
  • repetitive
  • hard to search
  • unsafe to modify
  • costly to maintain

7.4 Rule Steward Role

See Part 4, Section 4.2A for the full Rule Steward Role specification (responsibilities, non-responsibilities, managed asset classes, operating modes, stewardship flow, and relation to other components).

Within the Knowledge System, the rule steward is the role responsible for the lifecycle quality of the behavior-shaping assets defined in Section 7.1-7.3 above.


7.5 Technology Model

For T1P, the rule/skill system should use two synchronized forms.

Human-Readable Source of Truth

Stored in Git as Markdown assets with structured metadata.

Recommended minimum front matter:

id: string
asset_type: enum
# RULE | SKILL | ACTIVATION | PATTERN | PROTOCOL

name: string
owner: string|null
status: enum
# draft | review | approved | active | deprecated | archived

scope: [string]
applies_to: [string]
precedence: int|null
approval_required: boolean
version: string
supersedes: [string]
conflicts_with: [string]
created_at: datetime
updated_at: datetime

Structured Runtime Mirror

Normalized YAML/DB representation used for:

  • validation
  • querying
  • conflict detection
  • policy evaluation
  • lineage tracking
  • approval workflow

The Markdown asset remains the human-readable canonical source. The structured mirror makes it machine-usable.


7.6 Validation Pipeline

Every new or changed rule/skill/activation should pass through:

  1. existence search
  2. schema validation
  3. metadata validation
  4. conflict and overlap detection
  5. effectiveness review where signals exist
  6. approval determination
  7. publication and indexing

flowchart TD
    Proposal --> SearchExisting
    SearchExisting --> ValidateSchema
    ValidateSchema --> ValidateMetadata
    ValidateMetadata --> DetectConflicts
    DetectConflicts --> EvaluateEffectiveness
    EvaluateEffectiveness --> ApprovalDecision
    ApprovalDecision --> Publish
    Publish --> Reindex
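A sketch of the seven steps as a single function (step 5, effectiveness review, is omitted since it needs live signals; names are assumptions and each check is a stand-in):

```python
def validate_asset(asset: dict, registry: dict) -> dict:
    """Sketch of the validation pipeline for a proposed rule/skill/
    activation asset. Field names follow the recommended front matter
    in Section 7.5; `registry` maps asset names to published assets."""
    # 1. existence search (search-before-create)
    if asset["name"] in registry:
        return {"decision": "reuse_existing",
                "existing_id": registry[asset["name"]]["id"]}
    # 2-3. schema + metadata validation
    required = {"id", "asset_type", "name", "status", "version"}
    missing = required - asset.keys()
    if missing:
        return {"decision": "rejected", "reason": f"missing fields: {sorted(missing)}"}
    # 4. conflict and overlap detection against declared relations
    conflicts = [c for c in asset.get("conflicts_with", []) if c in registry]
    if conflicts:
        return {"decision": "needs_review", "conflicts": conflicts}
    # 6. approval determination
    if asset.get("approval_required"):
        return {"decision": "awaiting_approval"}
    # 7. publication and indexing
    registry[asset["name"]] = asset
    return {"decision": "published"}
```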

7.7 Usage Flow

flowchart TD
    Task --> Agent
    Agent --> SearchRelevantAssets
    SearchRelevantAssets --> RuleSelection
    SearchRelevantAssets --> SkillSelection
    SearchRelevantAssets --> ActivationBinding
    RuleSelection --> Execution
    SkillSelection --> Execution
    ActivationBinding --> Execution
    Execution --> Result
    Result --> Reflection
    Reflection --> RuleStewardReview

7.8 Skill Discovery & Creation Loop

XIOPro must support the creation of new skills over time.

The rule is:

  • search existing assets first
  • reuse if sufficient
  • extend if close fit exists
  • create new only when gap is real

New skills may be drafted by Claude or other approved agent surfaces, but they must still pass through the rule steward's review and approval policy.

Typical triggers

  • repeated manual workaround
  • recurring task pattern
  • repeated failure due to missing procedure
  • founder request
  • Dream Engine recommendation
  • postmortem / hindsight finding

7.9 Activation Governance

Activation assets such as claude.md are governed artifacts.

They must support:

  • owner
  • scope
  • version
  • approval requirement
  • compatibility notes
  • performance review history
  • deprecation path

An activation file must never be treated as an unmanaged side document.


7.10 Conflict & Supersession Rules

Each governed asset should be able to declare:

  • supersedes
  • conflicts_with
  • replaced_by
  • derived_from

If a conflict exists and cannot be resolved automatically, the rule steward must open a governed review or approval flow.


7.11 Lifecycle

Recommended lifecycle:

asset_lifecycle:
  - draft
  - review
  - approved
  - active
  - deprecated
  - archived

Notes:

  • draft = proposed, not trusted
  • review = under evaluation
  • approved = accepted for use
  • active = currently in force / use
  • deprecated = retained but should no longer be selected by default
  • archived = retained history only
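The blueprint lists the states but not the allowed moves between them; one illustrative interpretation as a guarded state machine:

```python
# Assumed transitions; this graph is an interpretation, not a specified policy
LIFECYCLE_TRANSITIONS = {
    "draft":      {"review", "archived"},
    "review":     {"approved", "draft", "archived"},
    "approved":   {"active", "archived"},
    "active":     {"deprecated"},
    "deprecated": {"archived", "active"},   # revival is possible
    "archived":   set(),                    # terminal: retained history only
}

def transition(asset: dict, new_status: str) -> dict:
    """Move a governed asset to a new lifecycle status, rejecting
    moves the lifecycle graph does not allow."""
    current = asset["status"]
    if new_status not in LIFECYCLE_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new_status}")
    return {**asset, "status": new_status}
```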

7.12 Search & Retrieval Requirement

The knowledge system must support finding rules/skills/activations by:

  • exact ID
  • name
  • topic
  • owner
  • asset type
  • scope
  • related task type
  • conflict/supersession relation
  • status

This is required so the system can reuse before it rewrites.


7.13 Module Portfolio Knowledge Layer

The knowledge system must also govern the module portfolio layer.

This includes knowledge about:

  • commercial modules
  • subscription-backed access paths
  • API-backed access paths
  • local/self-hosted modules
  • hosting environments
  • evaluation history
  • recommendation history
  • deprecation and replacement lineage

These are not runtime-only facts.

They are durable intelligence assets that must be queryable and reviewable.


7.14 Module Asset Classes

Recommended governed asset classes:

module_asset_classes:
  - MODULE
  - MODULE_POLICY
  - SUBSCRIPTION
  - HOSTING_PROFILE
  - MODULE_EVALUATION
  - MODULE_RECOMMENDATION

Meaning

MODULE

Defines a specific module or model option.

Example properties:

id: string
provider: string
module_name: string
access_modes: [string]
# subscription | api_key | local | hosted_self_managed

status: enum
# candidate | approved | active | constrained | deprecated | archived

quality_notes: [string]
latency_tier: string|null
cost_tier: string|null
privacy_posture: string|null
fallback_modules: [string]

MODULE_POLICY

Defines where and how a module may be used.

Example properties:

id: string
module_id: string
allowed_task_classes: [string]
forbidden_task_classes: [string]
allowed_surfaces: [string]
approval_required_for_use: boolean
notes: [string]

SUBSCRIPTION

Defines a commercial access plan or account-bound capability.

Example properties:

id: string
provider: string
plan_name: string
scope: string|null
capabilities: [string]
limitations: [string]
quota_notes: [string]
status: enum

HOSTING_PROFILE

Defines an environment profile for local/server/self-hosted viability.

Example properties:

id: string
environment_type: enum
# mac_local | linux_server | cloud_gpu | cloud_cpu | hybrid

compute_notes: [string]
memory_notes: [string]
storage_notes: [string]
network_notes: [string]
security_notes: [string]
compatibility_notes: [string]

MODULE_EVALUATION

Stores structured evaluation history for a module candidate or active option.

Typical fields:

  • task class tested
  • quality observations
  • latency observations
  • cost observations
  • stability observations
  • trust / reliability notes
  • hosting observations
  • recommendation outcome

MODULE_RECOMMENDATION

Stores proposed portfolio actions such as:

  • adopt
  • prefer
  • constrain
  • retire
  • self-host
  • compare further
  • reject for now

7.15 Module Portfolio Search Requirement

The knowledge system must support finding module assets by:

  • provider
  • module name
  • access mode
  • subscription availability
  • hosting profile
  • task fit
  • latency tier
  • cost tier
  • status
  • fallback relation
  • recommendation status
  • replacement / supersession relation

This is required so the module steward can optimize through searchable evidence, not memory fragments.


7.16 Module Evaluation & Recommendation Loop

Recommended loop:

flowchart TD
    NeedDetected --> SearchPortfolio
    SearchPortfolio --> ExistingFit
    ExistingFit -->|sufficient| RecommendUse
    ExistingFit -->|insufficient| CompareCandidates
    CompareCandidates --> EvaluateQuality
    CompareCandidates --> EvaluateCost
    CompareCandidates --> EvaluateStability
    CompareCandidates --> EvaluateHostingFit
    EvaluateQuality --> Recommendation
    EvaluateCost --> Recommendation
    EvaluateStability --> Recommendation
    EvaluateHostingFit --> Recommendation
    Recommendation --> ApprovalDecision
    ApprovalDecision --> PortfolioUpdate

Rule

The system must prefer:

  • reuse
  • comparison
  • constrained recommendation

before creating unnecessary new module dependence.


7.17 Optimization Record Requirement

Every meaningful module recommendation should preserve the optimization rationale.

It should be possible to answer later:

  • why this module was preferred
  • what tradeoffs were accepted
  • what resource constraints mattered
  • what fallback was defined
  • what hosting assumptions were required
  • why a subscription or self-hosting proposal was or was not approved

This is necessary for:

  • trust
  • auditability
  • future re-evaluation
  • Dream / hindsight learning
  • portfolio optimization over time

7.18 T1P RAG Pipeline (v5.0.8 Addition)

XIOPro uses retrieval-augmented generation to supply agents with relevant knowledge context. This section specifies the T1P RAG pipeline design.

rag_pipeline:
  embedding_model: "text-embedding-3-small (OpenAI) or BGE-M3 (self-hosted)"
  vector_store: "pgvector (PostgreSQL extension, v0.8.2, already installed)"
  chunking_strategy:
    method: "recursive character splitting"
    chunk_size: 1000
    chunk_overlap: 200
    metadata_preserved: [source_file, section, topic_id, document_type]
  retrieval:
    method: "hybrid (vector similarity + full-text BM25)"
    top_k: 10
    reranking: "optional: FlashRank or Cohere rerank if quality insufficient"
    quality_metric: "relevance score threshold > 0.7"
  generation:
    context_injection: "retrieved chunks injected as system context"
    citation_required: true
    hallucination_guard: "verify claims against retrieved chunks"

Design Rationale

  • pgvector handles vector search within PostgreSQL. No separate vector DB needed for T1P. This aligns with the single-database posture (Part 2, Section 5.5) and avoids operational overhead.
  • Hybrid retrieval (vector similarity + BM25 full-text) covers both semantic and keyword-exact matches. PostgreSQL's built-in tsvector provides BM25-equivalent ranking alongside pgvector cosine similarity.
  • Chunking uses recursive character splitting to respect section boundaries in Markdown knowledge assets. Metadata preservation ensures retrieved chunks can be traced back to source documents and topics.
  • Embedding model choice is deferred until Research Center actively needs it. For T1P, Hindsight handles agent memory retrieval using its own embedding pipeline. When the Librarian requires semantic search across governed knowledge, the embedding model will be selected based on cost/quality evaluation by the Module Steward.
  • Reranking is optional at T1P. If retrieval quality (measured by relevance score threshold) is insufficient with hybrid search alone, FlashRank (self-hosted, zero cost) or Cohere rerank (API, low cost) can be added as a post-retrieval filter.
  • Context window budget: retrieved chunks are injected as system context. The Prompt Steward (Part 4, Section 4.2B) manages total context budget, ensuring RAG chunks do not crowd out task instructions or conversation history.
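The chunking step can be sketched as follows; this is a simplified stand-in for a real recursive character splitter, using the chunk_size and chunk_overlap values from the spec:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[dict]:
    """Simplified chunking sketch: split on Markdown section boundaries
    first, then fall back to fixed windows with overlap. (A real
    recursive character splitter also descends through paragraphs and
    sentences before cutting mid-text.)"""
    chunks = []
    for section in text.split("\n## "):            # respect section boundaries
        section = section.strip()
        start = 0
        while start < len(section):
            piece = section[start:start + chunk_size]
            chunks.append({"text": piece, "meta": {"offset": start}})
            if start + chunk_size >= len(section):
                break
            start += chunk_size - overlap          # windows overlap by `overlap` chars
    return chunks
```

In the real pipeline each chunk would also carry the preserved metadata fields (source_file, section, topic_id, document_type) so retrieval results stay traceable.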

T1P Scope

For T1P, the RAG pipeline operates on:

  • governed knowledge assets (rules, skills, activation files)
  • Librarian-indexed documents
  • project-scoped knowledge (per project_id)

Full semantic search across Research Center outputs, Obsidian vault, and external sources is deferred to post-T1P.


8. Research Center

8.1 Purpose

XIOPro needs a unified Research Center, not a collection of disconnected research-related tools.

The Research Center is the governed layer that coordinates:

  • the Librarian
  • NotebookLM
  • Obsidian
  • scheduled research tasks
  • external research sources
  • curated research outputs
  • founder-facing research workflows

Its purpose is to transform research from ad hoc querying into a repeatable system capability.


8.2 Role in XIOPro

The Research Center is responsible for:

  • collecting and curating research inputs
  • running scheduled or triggered research workflows
  • producing usable research outputs
  • preserving research lineage
  • reducing repeated research effort
  • turning external findings into indexed internal knowledge
  • supporting founder exploration without losing system structure

The Research Center is not the same as the Librarian.

The Librarian is the knowledge operating system.

The Research Center is the research workflow and synthesis layer built on top of that knowledge foundation.


8.2A Research Domains (v5.0.5 Clarification)

The Research Center serves ALL knowledge domains, not just XIOPro technology. Any domain that informs founder decisions, system design, or market positioning is in scope.

research_domains:
  devxio_technology:
    description: "Tools, skills, frameworks, MCP servers for XIOPro itself"
    sources: [awesome-lists, GitHub, npm, PyPI, HuggingFace]
    scan_frequency: weekly-monthly

  product_domain:
    description: "Product-specific domain knowledge (see MVP1_PRODUCT_SPEC.md for first product)"
    sub_domains:
      - industry_standards: "Relevant compliance standards and updates"
      - market_players: "Competitors, consultancies, tech vendors"
      - domain_tech: "Competing platforms, new tools, market trends"
      - regulatory: "Regulatory changes, compliance updates"
    sources: [industry_publications, standards_bodies, competitor_sites]
    scan_frequency: monthly-quarterly

  ai_and_llm_landscape:
    description: "LLM providers, model releases, pricing, capabilities"
    sub_domains:
      - model_releases: "New Claude, GPT, Gemini, open-weight models"
      - agent_frameworks: "CrewAI, pydantic-ai, LangGraph, etc."
      - pricing_changes: "API costs, subscription changes"
    sources: [provider_blogs, HuggingFace, benchmarks]
    scan_frequency: weekly

  market_and_business:
    description: "Customer research, competitors, pricing, go-to-market"
    sub_domains:
      - competitors: "Procore, PlanRadar, Dalux, BIM 360 etc."
      - customers: "Target personas, industry trends"
      - pricing: "Market rates, willingness to pay"
    sources: [industry_reports, competitor_sites, LinkedIn, conferences]
    scan_frequency: quarterly

NotebookLM as Research Acceleration Surface

NotebookLM is a critical Research Center integration -- not just for document synthesis, but for:

  • Smart prompting of research questions
  • Deep research across curated source bundles
  • Voice overview generation for founder consumption
  • Multi-source synthesis and comparison
  • Presentation-ready research outputs

When deployed, NotebookLM becomes the primary research acceleration surface for complex, multi-source research tasks across ALL domains -- not just technology.


8.3 Core Principle

Research should move through a governed flow:

source discovery → collection → curation → synthesis → storage → retrieval → reuse

The system must distinguish between:

  • raw source material
  • curated research bundles
  • generated summaries or overviews
  • approved internal knowledge

8.4 Core Components

The Research Center coordinates at least these components:

  • Librarian
  • NotebookLM
  • Obsidian
  • external research connectors
  • scheduled research jobs
  • research task definitions
  • research output store
  • research review / approval flow

flowchart TD
    Sources[Research Sources] --> Intake[Research Intake]
    Intake --> Curate[Curate / Normalize]
    Curate --> Librarian[Librarian]
    Librarian --> ResearchTasks[Research Tasks / Schedules]
    ResearchTasks --> NotebookLM[NotebookLM]
    ResearchTasks --> Obsidian[Obsidian]
    ResearchTasks --> Synthesis[Synthesis / Comparison / Reports]
    Synthesis --> Outputs[Research Outputs]
    Outputs --> Librarian
    Outputs --> Founder[Founder / Control Center]
    Outputs --> KnowledgeUse[Future Retrieval / Reuse]

8.5 Research Input Classes

The Research Center should handle multiple input classes:

Internal Knowledge Inputs

  • blueprint parts
  • rules
  • skills
  • activations
  • historical decisions
  • evaluations
  • prior research outputs

Connected / Curated Source Inputs

  • uploaded files
  • curated documents
  • internal notes
  • founder research packets
  • approved web captures or exports

External Research Inputs

  • web research outputs
  • module/provider references
  • Hugging Face model and repo research
  • benchmark/evaluation notes
  • self-hosted model comparison material
  • research exports from approved tools

External inputs must be curated before they become trusted internal knowledge.


8.6 Librarian Integration

The Librarian remains the authority for:

  • ingestion discipline
  • naming discipline
  • indexing
  • topic assignment
  • storage routing
  • retrieval support
  • lifecycle control

The Research Center depends on the Librarian for structured persistence.

Rule:

Research outputs are not complete until they are either:

  • ingested by the Librarian, or
  • explicitly marked as transient / draft
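
This completion rule can be expressed as a small guard. A minimal sketch, assuming hypothetical field names (`ingested_by_librarian`, `status`) rather than a defined schema:

```python
def research_output_complete(output: dict) -> bool:
    """Per the rule above: a research output is complete only if it was
    ingested by the Librarian, or explicitly marked transient/draft."""
    # Field names are illustrative, not a defined XIOPro schema.
    return bool(output.get("ingested_by_librarian")) or \
        output.get("status") in ("transient", "draft")
```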

8.7 NotebookLM Integration

Role

NotebookLM is used as a research presentation and synthesis surface for tasks such as:

  • voice overview
  • video overview
  • summaries
  • presentation generation
  • thematic synthesis across curated source packets

Boundary

NotebookLM is not the source of truth.

It is a research acceleration and output layer.

Allowed Pattern

  • XIOPro selects or prepares curated research bundles
  • NotebookLM generates overviews, synthesis, or presentation outputs
  • Librarian stores approved outputs and references

Rule

NotebookLM outputs must preserve lineage to the source bundle used.

Current Status (v5.0 Addition)

NotebookLM is not deployed. It remains in the architecture as a planned integration surface. T1P does not depend on NotebookLM availability. When deployed, it should follow the integration pattern described above.


8.8 Obsidian Integration

Role

Obsidian is the living knowledge companion — a linked, navigable, human-friendly surface that makes XIOPro's knowledge visible, explorable, and enrichable.

It is not just a vault. It is an active part of the knowledge workflow.

What Lives in Obsidian

Obsidian should contain a linked, navigable mirror of:

Architecture and Design
  • XIOPro blueprint (all 9 parts, linked by cross-references)
  • Architecture decisions and their rationale
  • System capability map
  • ODM entity relationships
Technology Evaluations
  • Every open-source tool evaluated (positive AND negative outcomes)
  • Example: Phylum (acquired by Veracode, free tier discontinued — decision: use Socket.dev instead)
  • Example: Ruflo vs Bus analysis leading to Control Bus architecture
  • Evaluation template: what it is, why we considered it, outcome, date
Work History
  • Sprint retrospectives
  • Key decisions and why they were made
  • Incidents and root causes
  • Pattern: what worked, what didn't
Research
  • Research task outputs (linked to tickets)
  • Curated external references
  • Competitor analysis
  • Domain knowledge (product-specific -- see MVP1_PRODUCT_SPEC.md for first product)
Agent Knowledge
  • Agent activation files (linked)
  • Rules and skills registry (linked)
  • Lessons learned per agent

Obsidian Vault Structure

STRUXIO_Obsidian_Vault/
  INDEX.md                          # Master index with links to all sections
  architecture/
    XIOPro_Blueprint_Overview.md    # Links to all 9 parts
    Architecture_Decisions.md       # ADR-style linked notes
    Control_Bus.md                  # Design rationale, evolution
    ODM_Entity_Map.md              # Entity relationships
  technology/
    _Technology_Index.md            # All evaluations
    Phylum.md                       # Evaluated → rejected (Veracode acquisition)
    Socket_dev.md                   # Evaluated → adopted (supply chain security)
    Ruflo.md                        # In use — agent runtime
    LiteLLM.md                      # In use — model router
    Hindsight.md                    # In use — memory system
    Neo4j.md                        # Deprecated — removed (see Section 12.1)
    Jujutsu_jj.md                   # Evaluated → deferred
    [every tool we evaluate gets a note]
  work/
    Sprint_S001.md                  # Retrospective
    Incidents/                      # Root cause notes, linked to fixes
    Decisions/                      # Key decisions with rationale
  research/
    product_domain/                 # Domain knowledge (see MVP1_PRODUCT_SPEC.md)
    competitors/
  agents/
    000_BrainMaster.md              # Agent profile, lessons, patterns
    001_Compliance.md
    002_Engineering.md
    ...

Sync Model

Obsidian vault lives on Mac Studio. Sync with XIOPro via:

  1. Git-based sync — vault is a Git repo or symlinked to design repo sections
  2. Agent-to-Obsidian — when the BrainMaster or a domain brain produces a decision, evaluation, or lesson, the Librarian (or a scheduled job) creates/updates the corresponding Obsidian note
  3. Obsidian-to-XIOPro — founder creates notes in Obsidian during thinking/research. Notes marked #promote are ingested by the Librarian into governed knowledge.
  4. Blueprint sync — when BP parts are updated, corresponding Obsidian architecture notes are updated automatically
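
Step 3 (Obsidian-to-XIOPro) could be sketched as a simple vault scan that collects notes tagged #promote for Librarian ingestion. The function name and tag handling are illustrative, not a defined interface:

```python
from pathlib import Path

PROMOTE_TAG = "#promote"  # tag from the sync model above

def find_promotable_notes(vault_root: str) -> list[Path]:
    """Scan the vault for notes the founder has marked for Librarian ingestion."""
    promotable = []
    for note in Path(vault_root).rglob("*.md"):
        text = note.read_text(encoding="utf-8", errors="ignore")
        if PROMOTE_TAG in text:
            promotable.append(note)
    return sorted(promotable)
```

The Librarian (or a scheduled job) would take this list and apply normal ingestion discipline to each note.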

Technology Evaluation Template

Every tool/library/service we evaluate gets an Obsidian note:

---
name: [Tool Name]
type: technology_evaluation
status: adopted | rejected | deferred | under_evaluation
date_evaluated: YYYY-MM-DD
evaluated_by: [agent or founder]
---

## What It Is
[One paragraph description]

## Why We Considered It
[The problem it would solve for XIOPro]

## Evaluation
[Findings — capabilities, limitations, pricing, maturity]

## Decision
[Adopted / Rejected / Deferred — with clear reason]

## Links
- [Official site]
- [GitHub]
- [Related XIOPro ticket if any]

Boundary

Obsidian may mirror and enrich knowledge, but it must not silently become a second uncontrolled source of truth.

Rules:

  • Git repos remain the source of truth for code, blueprints, and state
  • PostgreSQL remains the source of truth for operational data (ODM)
  • Obsidian is a navigation and enrichment layer, not an authoritative store
  • notes promoted from Obsidian to governed knowledge go through Librarian discipline

Current Status (v5.0 Addition)

Obsidian is now being set up on Mac Studio (Mac Worker task, 2026-03-28).

  • Vault location: ~/STRUXIO_Workspace/STRUXIO_Obsidian_Vault/
  • Initial seed: domain wiki + blueprint parts
  • Next: populate technology evaluation notes, link architecture decisions

STRUXIO_Knowledge Repository (v5.0 Addition)

A dedicated Git repository for governed knowledge assets that syncs with the Obsidian vault.

repo: STRUXIO-ai/struxio-knowledge
purpose: Central knowledge store — syncs with Obsidian vault, feeds Librarian
contains:
  - architecture decisions (ADRs)
  - technology evaluations (adopted, rejected, deferred)
  - domain knowledge (product-specific -- see MVP1_PRODUCT_SPEC.md)
  - research outputs
  - agent lessons and patterns
  - sprint retrospectives
  - incident postmortems

sync_model:
  obsidian_to_git: "Founder edits in Obsidian → committed to struxio-knowledge"
  git_to_obsidian: "Agent-produced knowledge → appears in Obsidian vault"
  librarian_ingest: "Promoted notes → governed knowledge via Librarian"

Relationship to Other Repos

| Repo | Contains | Knowledge Role |
|---|---|---|
| struxio-knowledge | Governed knowledge, evaluations, decisions, research | Knowledge source of truth |
| struxio-design | Architecture blueprints, product design, UX specs | Design documents (may promote to knowledge) |
| struxio-logic | Agent activations, rules, skills | Behavioral assets (rule steward governs) |
| STRUXIO_OS | State, tickets, engineering, infra | Operational state |
| struxio-app | Product code | Codebase |
| struxio-business | Business documents | Business context |

Rule

struxio-knowledge is the canonical home for knowledge that outlives a single sprint, ticket, or conversation. If something is worth keeping, it belongs here — not buried in a design doc or chat transcript.


8.9 Skill Registry and Governance (v5.0 Addition)

Cross-reference: Skill selection logic (which skills to load for a given task assignment) is defined in Part 4, Section 4.11 — Skill Selection Architecture. This section defines the registry; Section 4.11 defines the selection filter.

Problem

Skills are currently:

  • defined as individual SKILL.md files in ~/.claude/skills/
  • referenced by name in every agent activation file (ACTIVATE_BM.md, ACTIVATE_B1.md, etc.)
  • not centrally indexed or versioned
  • difficult to maintain — changing a skill name means editing every activation file

This violates the "single source of truth" principle. The rule steward should govern skills centrally.

Solution: Central Skill Registry

A single SKILL_REGISTRY.yaml file that:

  • lists all active skills with metadata
  • maps skills to agents (which agents use which skills)
  • tracks versions and status
  • lives in struxio-logic/skills/ (source of truth)
  • is referenced by activation files instead of individual skill names

# struxio-logic/skills/SKILL_REGISTRY.yaml
skills:
  - id: paperclip-sync
    name: "Paperclip Sync"
    path: ~/.claude/skills/paperclip-sync/SKILL.md
    version: "1.0.0"
    status: active
    used_by: [000, 001, 002, 003, 004, 005, 010]
    triggers: [/paperclip, /sync, /ticket]
    description: "Sync ticket status with Paperclip issue tracker"

  - id: writing-plans
    name: "Writing Plans"
    path: ~/.claude/skills/writing-plans/SKILL.md
    version: "1.0.0"
    status: active
    used_by: [000, 001, 002, 003, 004, 005]
    triggers: [/write-plan]
    description: "Write structured implementation plans"

  - id: systematic-debugging
    name: "Systematic Debugging"
    path: ~/.claude/skills/systematic-debugging/SKILL.md
    version: "1.0.0"
    status: active
    category: engineering        # See Section 4.11 for categories
    min_model_tier: sonnet       # Minimum model tier (haiku/sonnet/opus)
    used_by: [000, 002, 005]
    triggers: [/debug]
    description: "Debug bugs systematically before proposing fixes"

  # ... all other active skills

# Skill categories (aligned with Section 4.11 Skill Selection Architecture):
#   execution:      any model — paperclip-sync, verification-before-completion, etc.
#   engineering:    sonnet+   — TDD, systematic-debugging, code-review, etc.
#   architecture:   sonnet/opus — brainstorming, writing-plans, executing-plans, etc.
#   infrastructure: any model — hooks-automation, swarm-orchestration, etc.
#   knowledge:      sonnet+   — writing-skills, skill-builder, reasoningbank-agentdb, etc.
#   domain:         varies    — sparc-methodology, github-*, flow-nexus-*, claude-api

Activation File Pattern

Instead of listing skills inline, activation files reference the registry:

## Skills
Load skills from SKILL_REGISTRY.yaml for this agent's role.
Registry: struxio-logic/skills/SKILL_REGISTRY.yaml
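
The resolution step an activation file implies could look like the sketch below, assuming the registry has already been parsed into a dict (agent ids shown as strings for illustration; in YAML, unquoted `000` would need quoting to preserve leading zeros):

```python
def skills_for_agent(registry: dict, agent_id: str) -> list[str]:
    """Resolve the active skill ids a given agent should load from the registry."""
    return [
        s["id"] for s in registry.get("skills", [])
        if s.get("status") == "active" and agent_id in s.get("used_by", [])
    ]

# Registry shape mirrors SKILL_REGISTRY.yaml above, already parsed to a dict:
registry = {"skills": [
    {"id": "paperclip-sync", "status": "active", "used_by": ["000", "001", "002"]},
    {"id": "systematic-debugging", "status": "active", "used_by": ["000", "002", "005"]},
]}
```

Because agents resolve skills through this single lookup, renaming or retiring a skill touches only the registry, not every activation file.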

Rule Steward Responsibilities for Skills

The rule steward must:

  • maintain the registry as source of truth
  • search for existing skills before creating new ones
  • validate skill metadata and structure
  • detect duplicate or conflicting skills
  • propose skill consolidation or deprecation
  • update the registry when skills are added/changed/removed
  • ensure all agents reference the registry, not hardcoded skill names

Skill in Obsidian

Each skill should have a corresponding Obsidian note with:

  • what it does
  • which agents use it
  • when it was last updated
  • a link to the SKILL.md source

T1P Priority

This is "Active but Narrow" for T1P:

  • create the registry file
  • migrate existing skills into it
  • update activation files to reference the registry
  • full rule steward automation is deferred


8.9A Skill Performance Database (v5.0.5 Addition)

Skills are not equal. They differ in:

  • token consumption (some skills use 2x the tokens for the same result)
  • result quality (measured by task completion rate, rework rate)
  • model compatibility (some skills work poorly on Haiku)
  • execution time

The system must track skill performance to:

  • detect when a new external skill outperforms an existing one
  • optimize token spend by preferring efficient skills
  • retire underperforming skills
  • compare internal skills against community alternatives

skill_performance_record:
  skill_id: string

  # Usage metrics (rolling)
  total_invocations: int
  avg_tokens_per_invocation: float
  avg_execution_time_ms: float

  # Quality metrics
  task_completion_rate: float       # % of tasks completed when this skill was active
  rework_rate: float                # % of tasks that needed rework
  user_satisfaction: float|null     # if founder provides feedback

  # Model performance
  performance_by_model:
    haiku:
      quality_score: float
      avg_tokens: float
    sonnet:
      quality_score: float
      avg_tokens: float
    opus:
      quality_score: float
      avg_tokens: float

  # Comparison
  known_alternatives: [string]       # external skills that do the same thing
  best_alternative: string|null      # if an alternative outperforms
  replacement_candidate: boolean

  # Metadata
  last_measured_at: datetime
  measurement_period: string         # e.g., "last_30_days"
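
One hedged way to drive the `replacement_candidate` flag from these fields: mark a skill when a known alternative beats it on both completion rate and token cost. The margins are illustrative defaults, not blueprint-mandated thresholds:

```python
def is_replacement_candidate(skill: dict, alternative: dict,
                             quality_margin: float = 0.05,
                             token_margin: float = 0.10) -> bool:
    """True if the alternative is clearly better on quality AND token cost.

    Margins guard against replacing skills over noise-level differences.
    """
    better_quality = (alternative["task_completion_rate"]
                      >= skill["task_completion_rate"] + quality_margin)
    cheaper = (alternative["avg_tokens_per_invocation"]
               <= skill["avg_tokens_per_invocation"] * (1 - token_margin))
    return better_quality and cheaper
```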

Connection to Dream Engine / Idle Maintenance

This feeds into the Idle Maintenance / Dream Engine: periodically compare internal skill performance against newly discovered community skills from the Research Center.

idle_maintenance_tasks:
  # ... existing items ...
  - skill_performance_review     # compare internal skill metrics against alternatives
  - skill_token_optimization     # identify skills with high token usage, suggest alternatives

Cross-reference: Idle maintenance task list is defined in Part 4, Section 4.9.9. The two tasks above should be added to that list. Dream Engine integration is defined in Part 5, Section 10.


8.10 Research Center Operational Registries (v5.0 Addition)

The Research Center requires two persistent registries to operate methodically rather than ad hoc.

These registries formalize what happened organically during the Day 0 session (2026-03-28), where the founder and the BrainMaster collaboratively scanned 15+ awesome-lists, evaluated 36+ tools, and populated the knowledge vault. That process should be repeatable and automated.

8.10.1 Source Registry

Tracks WHERE we look for tools, skills, libraries, and intelligence.

research_source:
  id: string
  name: string                    # e.g., "awesome-claude-code (hesreallyhim)"
  url: string                     # GitHub URL or web URL
  type: enum
    # github_repo | github_search | npm_registry | pypi_registry
    # hugging_face | web_article | social_media | newsletter
    # provider_docs | conference | community_forum

  # Quality signals
  ranking: int                    # 1-5 (5 = highest value source)
  stars: int|null                 # GitHub stars if applicable
  reliability: enum               # high | medium | low | unknown

  # Scan schedule
  scan_frequency: enum            # daily | weekly | biweekly | monthly | quarterly | on_demand
  last_scanned_at: datetime|null
  next_scan_at: datetime|null
  scan_agent_id: string|null      # which agent runs the scan

  # Results
  total_resources_found: int
  resources_adopted: int
  resources_evaluated: int

  # Metadata
  added_by: string                # user or agent who added the source
  notes: string|null
  topics: [string]                # what domains this source covers
  status: enum                    # active | paused | retired
  created_at: datetime
  updated_at: datetime
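
A minimal sketch of deriving `next_scan_at` from the `scan_frequency` enum; the interval mapping is an assumption (e.g., 30 days for monthly), not specified by the registry:

```python
from datetime import datetime, timedelta

# Illustrative mapping from the scan_frequency enum to a scan interval.
SCAN_INTERVALS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "biweekly": timedelta(weeks=2),
    "monthly": timedelta(days=30),
    "quarterly": timedelta(days=91),
}

def next_scan_at(last_scanned_at: datetime, scan_frequency: str):
    """Compute next_scan_at; on_demand sources have no scheduled next scan."""
    interval = SCAN_INTERVALS.get(scan_frequency)
    return last_scanned_at + interval if interval else None
```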
Known Sources (as of 2026-03-28)

| Source | Type | Ranking | Scan Freq | Stars |
|---|---|---|---|---|
| awesome-claude-code (hesreallyhim) | github_repo | 5 | monthly | 33.6k |
| awesome-claude-skills (ComposioHQ) | github_repo | 5 | monthly | 48.8k |
| awesome-claude-code-subagents (VoltAgent) | github_repo | 4 | monthly | 15.5k |
| awesome-agentic-patterns (nibzard) | github_repo | 5 | quarterly | 4.1k |
| awesome-mcp-servers (appcypher) | github_repo | 4 | monthly | 5.3k |
| awesome-remote-mcp-servers (jaw9c) | github_repo | 4 | monthly | 1k |
| awesome-mcp-security (Puliczek) | github_repo | 3 | quarterly | 672 |
| PyPI new packages (AI/agent category) | pypi_registry | 3 | weekly | |
| npm new packages (MCP/claude category) | npm_registry | 3 | weekly | |
| Hugging Face trending models | hugging_face | 3 | monthly | |
| Anthropic changelog / blog | provider_docs | 5 | daily | |
| Claude Code GitHub releases | github_repo | 5 | daily | |

Scan Workflow
flowchart LR
    Schedule["Scan Schedule"] --> Agent["Research Agent"]
    Agent --> Source["Scan Source"]
    Source --> NewItems["Identify New Items"]
    NewItems --> Evaluate["Create Evaluation Notes"]
    Evaluate --> Vault["Knowledge Vault"]
    Vault --> Report["Report to Orchestrator"]
    Report --> Decision["User Decision"]

8.10.2 Resource Registry

Tracks WHAT we've found, evaluated, and decided about.

This is the structured equivalent of the knowledge vault's technology/ folder, but queryable as a database.

research_resource:
  id: string
  name: string                    # e.g., "pydantic-ai"
  type: enum
    # skill | plugin | cli_tool | mcp_server | framework | library
    # service | api | model | dataset | pattern | article

  # Discovery
  source_id: string               # which source we found it in
  discovered_at: datetime
  discovered_by: string           # agent or user
  url: string|null
  github_url: string|null
  stars: int|null

  # Evaluation
  status: enum
    # new | under_evaluation | adopted | deferred | rejected | deprecated
  ranking: int|null               # 1-5 (5 = critical value)
  evaluation_summary: string|null
  evaluation_date: datetime|null
  evaluated_by: string|null

  # Decision
  decision: enum|null
    # adopt | evaluate_further | defer | reject | deprecate
  decision_reason: string|null
  decision_by: string|null        # user or agent
  decision_date: datetime|null

  # Usage (if adopted)
  installed: boolean
  install_location: string|null   # where it's installed
  version_installed: string|null
  relevant_roles: [string]        # which roles use it (binds via role-topic-skill chain, not agent IDs)

  # Classification
  topics: [string]
  relevance_to: [string]          # which XIOPro components benefit
  model_tier: string|null         # haiku | sonnet | opus | any

  # Lifecycle
  last_reviewed_at: datetime|null
  review_cycle: enum|null         # monthly | quarterly | on_demand
  next_review_at: datetime|null

  # Links
  knowledge_vault_note: string|null  # path to Obsidian/vault note
  ticket_id: string|null            # if adoption created a ticket

  # Metadata
  comments: string|null
  tags: [string]
  created_at: datetime
  updated_at: datetime
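
The lifecycle fields make review queries straightforward. A hedged sketch of selecting resources due for their review cycle (field names follow the schema above):

```python
from datetime import datetime

def due_for_review(resources: list, now: datetime) -> list:
    """Adopted or deferred resources whose next_review_at has passed."""
    return [
        r for r in resources
        if r.get("status") in ("adopted", "deferred")
        and r.get("next_review_at") is not None
        and r["next_review_at"] <= now
    ]
```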
Resource Lifecycle
flowchart LR
    New["new"] --> Evaluate["under_evaluation"]
    Evaluate --> Adopt["adopted"]
    Evaluate --> Defer["deferred"]
    Evaluate --> Reject["rejected"]
    Defer --> Evaluate
    Adopt --> Deprecate["deprecated"]

Resource Statistics (as of 2026-03-28)

| Status | Count | Examples |
|---|---|---|
| Adopted | 7 | Socket.dev, Semgrep MCP, MCP Builder, GWS CLI, Stripe CLI, awesome-agentic-patterns, awesome-claude-skills lists |
| Under Evaluation | 17 | pydantic-ai, Firecrawl, OpenSpace, Supermemory, claude-deep-research, ccusage, claude-context... |
| Deferred | 6 | CrewAI, Jujutsu, CLI-Anything, NotebookLM Py, Competitive Ads, Lead Research |
| Rejected/Skipped | 6 | Phylum, Canopy, Rask Master AI, Agent Alchemy, Supabase CLI, Amplify |

8.10.3 Research Center Process (Codified from Day 0 Experience)

The following process was proven during the Day 0 session and should be the canonical Research Center workflow:

research_center_process:

  1_source_management:
    - maintain Source Registry with ranked, scheduled sources
    - add new sources when discovered (user tips, agent finds, community)
    - retire sources that become stale or irrelevant
    - scan sources on schedule (daily/weekly/monthly per source ranking)

  2_discovery:
    - agent scans source for new/updated items
    - creates Resource Registry entries with status "new"
    - creates knowledge vault notes using standard template
    - reports discoveries to the orchestrator

  3_evaluation:
    - agent or user reviews each resource
    - assesses: what it does, relevance to XIOPro, maturity, cost, risk
    - updates Resource Registry with evaluation summary
    - categorizes: adopt / evaluate_further / defer / reject

  4_decision:
    - user makes final decision on high-impact resources
    - agent can auto-decide on clear skip/defer cases
    - decision recorded with reason in Resource Registry

  5_adoption:
    - adopted resources get installation ticket
    - installed on correct host (Hetzner/Mac)
    - added to skill registry if applicable
    - added to BP Part 2 (technology stack) if architectural

  6_review:
    - adopted resources reviewed on cycle (monthly/quarterly)
    - check: still maintained? still relevant? better alternatives?
    - deferred resources re-evaluated on schedule
    - deprecated resources removed and noted

Rule

The Research Center is not a one-time scan. It is a continuous intelligence operation that keeps XIOPro's technology stack current, discovers useful tools before we need them, and prevents the "we didn't know about X until it was too late" pattern.


8.11 External Research Source Integration

The Research Center should support governed use of external research sources.

Examples:

  • web research outputs
  • approved URLs / fetched references
  • provider documentation exports
  • Hugging Face research and repo discovery
  • benchmark/evaluation documents
  • local or remote CLI research tools

Hugging Face Note

For module and model scouting, the system may use Hugging Face as a governed research source for:

  • candidate model discovery
  • repository discovery
  • hosting clues
  • capability comparison
  • self-hosting research leads

However:

  • Hugging Face findings are inputs to evaluation, not automatic approvals
  • The module steward must still evaluate suitability
  • Rule steward / prompt steward / governor constraints still apply where relevant

8.12 Scheduled Research Tasks

Research should not depend only on manual ad hoc requests.

XIOPro should support scheduled or recurring research jobs such as:

  • recurring topic watch
  • competitor / market tracking
  • module portfolio refresh
  • Hugging Face model scouting
  • standards / regulation watch
  • literature or documentation refresh
  • periodic summary bundle generation
  • notebook refresh / insight digest

Typical triggers:

  • cron/scheduler policy
  • founder request
  • governance trigger
  • Dream / hindsight recommendation
  • module portfolio review cycle
  • project milestone

8.13 Research Task Definition

A research task should support at least:

research_task:
  id: string
  name: string
  topic_refs: [string]
  source_classes: [string]

  recurrence: string|null
  owner: string|null

  objective: string
  output_type: enum
  # digest | comparison | watchlist | synthesis | presentation | candidate_scan

  review_required: boolean
  destination_refs: [string]
  status: enum

This allows research work to become schedulable and auditable.


8.14 Research Output Types

The Research Center should be able to produce at least:

  • research digest
  • comparison matrix
  • source bundle
  • watchlist update
  • candidate module scan
  • founder briefing
  • NotebookLM-ready curated packet
  • Obsidian-ready linked note export
  • recommendation draft
  • knowledge-ingestion bundle

Each output should preserve:

  • source lineage
  • generation date
  • task objective
  • review status
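
These four fields can travel with every output as a small envelope; the class name and status values are illustrative:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ResearchOutput:
    """Minimal envelope carrying the four preservation fields above."""
    content: str
    source_lineage: list          # ids/paths of the source bundles used
    generated_at: datetime        # generation date
    task_objective: str           # objective of the originating research task
    review_status: str = "draft"  # e.g., draft | reviewed | approved
```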

8.15 Research Lifecycle

flowchart TD
    Discover --> CurateSources
    CurateSources --> RunResearchTask
    RunResearchTask --> Synthesize
    Synthesize --> Review
    Review --> PublishResearchOutput
    PublishResearchOutput --> IngestToLibrarian
    IngestToLibrarian --> RetrieveAndReuse

Notes

  • not every research result is automatically trusted
  • publication and ingestion are separate from raw collection
  • founder review may be required depending on scope/risk

8.16 Research Governance Rule

Research outputs may influence the system, but they must not silently redefine it.

If a research output proposes changes to:

  • rules
  • skills
  • activations
  • module portfolio
  • architecture
  • governance policy

it must route through the relevant governed approval path.


8.17 Success Criteria

The Research Center is successful when:

  • research is repeatable, not one-off
  • useful outputs are preserved and reusable
  • NotebookLM and Obsidian serve clear bounded roles
  • external sources are curated before trust is assigned
  • scheduled research reduces repeated manual effort
  • founder exploration strengthens, rather than fragments, system knowledge

9. Hindsight / Learning Engine

9.1 Role

System learning layer that converts execution history and research history into reusable lessons.

Hindsight is not a first-order governing authority.

It participates in governance by producing:

  • evidence
  • reflections
  • improvement proposals
  • research/task recommendations

Its outputs must flow through the governed paths handled by the governor, rule steward, prompt steward, and module steward roles where applicable.


9.2 Inputs

Inputs may include:

  • activities
  • results
  • evaluations
  • repeated failures
  • successful patterns
  • research outputs
  • module evaluation history
  • governance interventions

9.3 Output

reflection:
  issue: string
  root_cause: string
  improvement: string
  confidence: float

9.4 Flow

flowchart TD
    Activity --> Evaluation
    Evaluation --> Reflection
    ResearchOutput --> Reflection
    Reflection --> Improvement
    Improvement --> RulesUpdate
    Improvement --> SkillUpdate
    Improvement --> ResearchTaskProposal

9.5 Memory Engineering Principles — Hindsight Implementation Requirements

Hindsight must comply with the Memory Engineering Principles defined in Part 4, Section 4.8A. Specifically:

  • Principle 1 (Async Updates): Hindsight memory extraction must never block the agent's main execution. Extraction runs as a post-activity background process, not inline.
  • Principle 2 (Debounce Writes): Hindsight must batch memory extraction. Instead of extracting from every individual activity, wait until session end or a configurable threshold (30 seconds or N activities), then make one extraction call. This reduces LLM cost and produces higher-quality extractions.
  • Principle 3 (Confidence Threshold): Hindsight memories must carry a confidence score (0.0-1.0). Memories below 0.7 confidence are discarded. Total stored memories per agent are capped at 100; when the cap is reached, the lowest-confidence memories are trimmed.
  • Principle 4 (Token Budget): When Hindsight auto-injects memories into agent context, the injection must respect the 2000-token memory context budget. Trim lowest-confidence memories first until the budget fits.
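
Principles 3 and 4 can be sketched together: drop sub-threshold memories, cap the store at 100, then fill the 2000-token injection budget highest-confidence first (equivalently, trimming lowest-confidence memories until it fits). The word-count token estimate is a naive stand-in for a real tokenizer:

```python
MIN_CONFIDENCE = 0.7   # Principle 3
MAX_MEMORIES = 100     # Principle 3 store cap
TOKEN_BUDGET = 2000    # Principle 4 injection budget

def estimate_tokens(text: str) -> int:
    # Naive stand-in; a real tokenizer would be used in practice.
    return len(text.split())

def select_memories(memories: list) -> list:
    """Apply confidence threshold, store cap, and injection token budget."""
    kept = [m for m in memories if m["confidence"] >= MIN_CONFIDENCE]
    kept.sort(key=lambda m: m["confidence"], reverse=True)
    kept = kept[:MAX_MEMORIES]            # trim lowest-confidence past the cap
    selected, used = [], 0
    for m in kept:                        # highest confidence first
        cost = estimate_tokens(m["text"])
        if used + cost > TOKEN_BUDGET:
            break                         # budget full; rest are trimmed
        selected.append(m)
        used += cost
    return selected
```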

9.6 Current Deployment Reference (v5.0 Addition)

Hindsight is currently deployed and running.

Current deployment details:

  • API endpoint: localhost:8888
  • Admin endpoint: localhost:9999
  • Backend: Vectorize.io Docker container
  • Memory footprint: 1.06 GB RAM
  • Status: Operational

The blueprint describes Hindsight's architectural role within XIOPro. The current deployment provides the foundation for the learning and reflection capabilities described in this part. T1P should build on the existing Hindsight deployment rather than replacing it.


10. Dream Engine Integration

10.1 Role

Idle-time learning and restructuring layer.

Within the knowledge domain, Dream may:

  • compress memory
  • extract patterns
  • identify recurring research gaps
  • suggest research task creation
  • suggest knowledge cleanup
  • suggest rule/skill/module review opportunities

10.2 Trigger

Typical triggers:

  • idle system
  • scheduled execution windows
  • low-load background periods
  • accumulated unresolved research backlog

10.3 Boundary

Dream may propose and prepare, but it does not bypass:

  • Librarian discipline
  • rule steward stewardship
  • module steward governance
  • approval requirements

Dream participates in governance as a proposal and optimization layer, not as a first-order governor.

It may generate signals, recommendations, and restructuring proposals, but enforcement and approval remain with the governing/steward roles.

Hindsight and Dream are second-order governance inputs, not first-order governing authorities.


11. Research Scheduling & Refresh Discipline

11.1 Purpose

Research must remain current without becoming noisy.

This section defines the discipline for recurring research refresh.


11.2 Refresh Classes

Useful recurring classes include:

  • daily digest
  • weekly watch
  • milestone refresh
  • quarterly portfolio scan
  • event-triggered refresh
  • founder-requested deep refresh

11.3 Anti-Patterns Prevented

This discipline prevents:

  • forgotten research threads
  • one-time research that is never revisited
  • fragmented external findings
  • stale module comparisons
  • untraceable NotebookLM/Obsidian artifacts
  • duplicated manual re-research

11.4 Final Rule

Research is a governed system capability.

It must be schedulable, reviewable, preservable, and reusable.


11A. T1P Operating Posture for Research, Hindsight & Dream

11A.1 Purpose

Research, Hindsight, and Dream remain core parts of XIOPro in T1P.

They are not removed.

But they must begin with an explicit operating posture so the blueprint preserves ambition without pretending all related subsystems are equally mature on day one.


11A.2 Research Center Posture

Posture: Active but Narrow

The Research Center must be real in T1P.

Minimum required T1P capabilities:

  • Research Task creation
  • scheduled or triggered research execution
  • source bundle association
  • Research Output preservation
  • lineage to sources
  • promotion into Librarian-managed knowledge or explicit draft retention

Not required for T1P:

  • a giant multi-surface research universe
  • many presentation/export paths all at once
  • rich fully-automated external research orchestration at scale

Rule

Research remains in scope for T1P, but the first research wave should prove the canonical research path rather than maximum breadth.


11A.3 Hindsight Posture

Posture: Proposal-Oriented

Hindsight must exist in T1P as a real learning and reflection engine.

Minimum required T1P capabilities:

  • generate reflections from execution history
  • generate improvement proposals
  • support research/task recommendations
  • preserve evidence and rationale for proposals

Hindsight must not in T1P:

  • silently publish protected changes
  • directly mutate live governance or behavior
  • bypass approval or stewardship paths

Rule

Hindsight is a second-order governance input, not a first-order governing authority.


11A.4 Dream Posture

Posture: Proposal-Oriented

Dream must remain in the architecture and in T1P, but with bounded authority.

Minimum required T1P capabilities:

  • idle-time pattern detection
  • cleanup/compression opportunities
  • proposal generation
  • recurring gap detection
  • research/rule/skill/module review suggestions

Dream must not in T1P:

  • directly rewrite protected live behavior
  • directly publish rule changes
  • directly alter module policy without governed approval
  • become a hidden autonomous control plane

Rule

Dream may prepare, compress, detect, and propose. It may not self-authorize protected operational change.


11A.5 Preservation Rule

XIOPro keeps:

  • Research Center
  • Hindsight
  • Dream

in T1P because they are part of the system's compounding-intelligence model.

The constraint is not removal.

The constraint is operating posture.


11A.6 First Proof Rule

Before these subsystems are expanded in breadth, T1P should prove at least:

  • one real Research Task path
  • one real Research Output preservation and promotion path
  • one real Hindsight reflection/proposal path
  • one real Dream proposal path

If these are not proven, further sophistication increases conceptual richness but not system credibility.


12. Indexing & Retrieval

Requirements

  • fast
  • topic-aware
  • semantic
  • low token usage

Strategy

  • DB indexing
  • vector layer (optional)
  • topic filtering
  • metadata filtering
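
A hedged sketch of how these strategy elements combine over PostgreSQL + pgvector: filter on topic metadata first, then rank by vector distance with pgvector's `<->` operator. Table and column names are illustrative:

```python
def build_retrieval_query(limit: int = 5) -> str:
    """Compose a metadata-filter-then-vector-rank query (illustrative schema)."""
    return (
        "SELECT id, title, summary FROM knowledge_entries "
        "WHERE topic = %(topic)s "                             # topic/metadata filter first
        "ORDER BY embedding <-> %(query_embedding)s::vector "  # semantic ranking (pgvector)
        f"LIMIT {int(limit)}"
    )
```

Returning `summary` rather than the full document keeps retrieval token-cheap, in line with Section 16.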

12.1 Neo4j — Deprecated (v5.0 Update)

Neo4j was originally included (v3.2.1 blueprint) for:

  • knowledge graph (BIM entity relationships, topic navigation)
  • Librarian routing service

Deprecated in v5.0 because:

  • neither instance was actively used by any agent workflow
  • 1.83 GB RAM consumed for zero operational value
  • PostgreSQL with pgvector (v0.8.2) provides vector search within the primary database
  • PostgreSQL JSONB + recursive CTEs handle hierarchical relationships (topics, entities)
  • full-text search covers document retrieval needs for T1P

Both Neo4j instances (graph_stack, librarian) have been stopped and removed from the server.

If graph traversal requirements emerge post-T1P (millions of interconnected BIM entities, complex multi-hop queries), Neo4j can be reintroduced. For T1P, PostgreSQL handles everything.
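The recursive-CTE pattern that replaces graph traversal for T1P can be shown in a few lines. This sketch uses SQLite purely as a stand-in for PostgreSQL (both support `WITH RECURSIVE`), and the `topics` table is hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE topics (id TEXT PRIMARY KEY, parent TEXT)")
con.executemany("INSERT INTO topics VALUES (?, ?)", [
    ("knowledge", None),
    ("bim", "knowledge"),
    ("bim/entities", "bim"),
    ("ai_llm", "knowledge"),
])

# All descendants of 'bim' via a recursive CTE -- no graph database needed.
rows = con.execute("""
    WITH RECURSIVE subtree(id) AS (
        SELECT id FROM topics WHERE id = 'bim'
        UNION ALL
        SELECT t.id FROM topics t JOIN subtree s ON t.parent = s.id
    )
    SELECT id FROM subtree
""").fetchall()
print([r[0] for r in rows])  # ['bim', 'bim/entities']
```

For the hierarchy depths expected in T1P, this stays well within relational-database comfort; the multi-hop cases that would justify Neo4j are exactly the post-T1P ones named above.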


13. Knowledge Storage Strategy

DB

  • indexed entries
  • metadata
  • relationships

Git

  • source documents
  • rules
  • blueprints

Markdown

  • canonical human-readable form
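One way the three layers connect is frontmatter: the Git-stored markdown note carries the metadata that the DB indexes. A minimal sketch, where the frontmatter keys are illustrative and the parser handles only flat `key: value` lines (a real ingest path would use a YAML library):

```python
NOTE = """\
---
id: kb-0042
topic: bim/entities
status: active
---
# Wall Entity Relationships

Canonical human-readable content lives here.
"""

def split_frontmatter(text):
    # Split on the '---' delimiters: [before, frontmatter, body]
    parts = text.split("---\n")
    meta = dict(line.split(": ", 1) for line in parts[1].strip().splitlines())
    return meta, parts[2]

meta, markdown = split_frontmatter(NOTE)
print(meta["topic"])  # bim/entities
```

The markdown file stays the canonical human-readable form; the extracted metadata is what lands in the indexed DB entry.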

14. Knowledge Lifecycle

```mermaid
flowchart TD
    Create --> Classify
    Classify --> Store
    Store --> Index
    Index --> Retrieve
    Retrieve --> Use
    Use --> Improve
    Improve --> Update
```
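The lifecycle above can be expressed as an explicit transition table, so an entry cannot skip a stage. A minimal sketch; the stage names mirror the flowchart, everything else is illustrative:

```python
LIFECYCLE = {
    "create": "classify",
    "classify": "store",
    "store": "index",
    "index": "retrieve",
    "retrieve": "use",
    "use": "improve",
    "improve": "update",
}

def advance(stage: str) -> str:
    """Move a knowledge entry one step along the lifecycle."""
    if stage not in LIFECYCLE:
        raise ValueError(f"unknown or terminal stage: {stage}")
    return LIFECYCLE[stage]

# Walk the full chain from creation:
stage = "create"
while stage in LIFECYCLE:
    stage = advance(stage)
print(stage)  # update
```

Encoding the chain as data rather than scattered conditionals makes it cheap to audit which stage every entry is in.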

15. Anti-Entropy Rules

System must avoid:

  • duplicate documents
  • orphan knowledge
  • inconsistent naming
  • unindexed files
  • chat-only knowledge
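The "duplicate documents" rule implies a guard at ingest time. A sketch of one common approach, content hashing after normalization, where the normalization rules (whitespace collapse, lowercasing) are illustrative:

```python
import hashlib

def content_key(text: str) -> str:
    """Hash normalized content so trivial reformatting doesn't defeat dedup."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode()).hexdigest()

seen: dict[str, str] = {}

def ingest(doc_id: str, text: str) -> str:
    key = content_key(text)
    if key in seen:
        return f"duplicate of {seen[key]}"
    seen[key] = doc_id
    return "stored"

print(ingest("kb-001", "Naming rules for BIM entities."))
print(ingest("kb-002", "naming   rules for BIM entities."))  # same content, reformatted
```

Exact-hash dedup only catches byte-level (or normalization-level) copies; near-duplicate paragraphs would need a semantic similarity check on top.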

16. Knowledge Cost Optimization

  • avoid full document loads
  • use summaries
  • index aggressively
  • retrieve selectively
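"Use summaries" and "retrieve selectively" combine into a budget-aware packing step: serve summaries until a token budget is hit, never the full documents. A minimal sketch, with rough word counts standing in for real tokenization and a hypothetical record shape:

```python
DOCS = [
    {"id": "a", "summary": "BIM wall joins overview", "full_tokens": 4000},
    {"id": "b", "summary": "pgvector tuning notes",   "full_tokens": 2500},
]

def pack_context(budget_tokens: int):
    """Greedily pack summaries (not full documents) under a token budget."""
    out, used = [], 0
    for d in DOCS:
        cost = len(d["summary"].split())  # summary cost, not d["full_tokens"]
        if used + cost > budget_tokens:
            break
        out.append(d["id"])
        used += cost
    return out, used

ids, used = pack_context(budget_tokens=6)
print(ids, used)
```

Even this toy budget of 6 "tokens" admits a summary where the corresponding full document (thousands of tokens) never would, which is the whole point of summary-first retrieval.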

17. Knowledge Success Criteria

The knowledge system is successful if:

  • information is findable instantly
  • agents reuse knowledge
  • system improves over time
  • duplication is minimized
  • documents remain structured

18. Current State (v5.0 Addition)

18.1 Hindsight

  • Status: Deployed and running
  • Endpoints: localhost:8888 (API), localhost:9999 (admin)
  • Backend: Vectorize.io Docker container
  • Memory: 1.06 GB RAM
  • Role: Learning and reflection engine for execution history

18.2 Neo4j (Deprecated)

  • Status: Removed -- both instances stopped and deleted
  • Previous footprint: 1.83 GB combined (graph_stack 1.2 GB, librarian 631 MB)
  • Replacement: PostgreSQL + pgvector for vector search, JSONB + recursive CTEs for hierarchical relationships, full-text search for document retrieval
  • Decision: Deprecated in v5.0. See Section 12.1 for full rationale.

18.3 Obsidian

  • Status: Not deployed (ticket 069 pending)
  • T1P dependency: None -- Librarian operates on Git-based markdown
  • Plan: Deploy when controlled-sync mirror pattern is ready

18.4 NotebookLM

  • Status: Not deployed
  • T1P dependency: None
  • Plan: Integrate as research presentation surface when research workflow is proven

19. Final Statement

Knowledge is the long-term memory of XIOPro.

If strong:

  • intelligence compounds
  • execution accelerates

If weak:

  • system forgets
  • cost explodes
  • chaos returns

Changelog

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 4.1.0 | 2026-03-26 | BM | Initial v4.1 release |
| 4.2.0 | 2026-03-28 | BM | Added: Hindsight current deployment reference (9.5). Added: Neo4j evaluation note for T1P (12.1). Added: Obsidian current status -- not deployed (8.8). Added: NotebookLM current status -- not deployed (8.7). Added: Current State section (18) with deployment status for Hindsight, Neo4j, Obsidian, NotebookLM. Fixed: "Rufio" renamed to "Ruflo" globally. Added: Changelog section. Updated version header to 4.2.0. |
| 4.2.2 | 2026-03-28 | 000 | Agent naming migration: R01 replaced with rule steward role. O01 replaced with 000 (governor role). M01 replaced with module steward role. BM/B1-B5/M0 replaced with 3-digit IDs in Obsidian vault structure and skill registry. Changelog author entries preserved as historical. |
| 4.2.4 | 2026-03-28 | 000 | Section 8.9: Added cross-reference to Part 4 Section 4.11 (Skill Selection Architecture). Added category and min_model_tier fields to skill registry YAML. Added category comment block aligned with Section 4.11 skill library. |
| 4.2.5 | 2026-03-28 | 000 | Founder clarifications: (1) Research Domains section (8.2A) -- Research Center serves ALL knowledge domains (BIM, AI/LLM, market/business), not just XIOPro tech. NotebookLM clarified as primary research acceleration surface. (2) Skill Performance Database (8.9A) -- track token consumption, quality, model compatibility per skill; feeds Dream Engine idle maintenance. (3) Resource Registry used_by_agents replaced with relevant_roles per role-topic-skill binding chain (Part 4 Section 4.11). |
| 4.2.6 | 2026-03-28 | 000 | Roles over numbers: Removed agent IDs from architectural descriptions, section headers, and diagrams. Role names used throughout instead of agent numbers. |
| 4.2.7 | 2026-03-28 | BM | Neo4j deprecated: Section 12.1 rewritten from evaluation note to deprecation notice with full rationale. Section 18.2 updated to "Removed" status. Vault file tree updated. Both instances stopped and deleted from server. PostgreSQL + pgvector replaces all Neo4j use cases for T1P. |
| 4.2.8 | 2026-03-28 | BM | AGI pattern gap fix: Added T1P RAG Pipeline section (7.18) -- embedding model, chunking strategy, hybrid retrieval, reranking, generation contract. Addresses audit gap "RAG Pipeline Specifics" (Principle 9). |
| 4.2.9 | 2026-03-28 | 000 | Wave 1-2 BP fixes: Added Section 4.5 Librarian Document Decomposition Protocol -- extraction targets, note format, vault structure, decomposition flow, and rules for breaking large documents into linked atomic knowledge notes. Renumbered 4.5 Future Extensions to 4.6, 4.6 Knowledge Ledger to 4.7 (all sub-sections renumbered accordingly). |
| 4.2.10 | 2026-03-28 | 000 | Memory engineering principles: Added Section 4.5A (Librarian memory engineering requirements) and Section 9.5 (Hindsight memory engineering requirements) referencing Part 4 Section 4.8A. Added confidence field to Knowledge Schema (Section 6.1) with threshold and cap rules. Renumbered Hindsight Section 9.5 Current Deployment to 9.6. Slimmed Section 7.4 Rule Steward Role to cross-reference Part 4 Section 4.2A (removed duplicated responsibilities). Content deduplication: slimmed duplicate content throughout to cross-reference primary locations. |
| 5.0.1 | 2026-03-30 | GO | N6: Added content_ref resolution note to Section 6.1 Knowledge Schema -- format, vault root, resolution rule. |