XIOPro Production Blueprint v5.0¶
Part 5 — Knowledge System¶
1. Purpose of This Part¶
This document defines the Knowledge System of XIOPro:
- how knowledge is stored
- how it is structured
- how it is retrieved
- how it evolves
- how agents learn
- how the system avoids entropy
This layer transforms XIOPro from:
an execution system
into:
a compounding intelligence system
2. Knowledge System Philosophy¶
XIOPro knowledge is:
- structured (not raw memory)
- indexed (not buried in chats)
- evolving (not static)
- agent-accessible (not human-only)
- cost-efficient (token-aware)
3. Core Components¶
flowchart TD
subgraph Sources["Sources"]
Users["Users"]
AgentOutput["Agents"]
External["External"]
DreamOut["Dream Engine"]
end
subgraph RC["Research Center"]
SourceReg["Source Registry"]
ResourceReg["Resource Registry"]
Scheduled["Scheduled Research"]
NLM["NotebookLM"]
Obsidian["Obsidian"]
end
subgraph Core["Core Engine"]
Librarian["Librarian"]
SkillReg["Skill Registry"]
SkillPerf["Skill Performance DB"]
KLedger["Knowledge Ledger"]
end
subgraph Store["Storage"]
Git["Git and Markdown"]
PG["PostgreSQL and pgvector"]
end
subgraph Learn["Learning"]
Hindsight["Hindsight"]
Reflect["Reflection"]
Improve["Improvement"]
end
Sources --> RC
RC --> Librarian
Librarian --> Store
Store --> Core
Core --> AgentOutput
AgentOutput --> Learn
Learn --> Librarian
DreamOut --> RC
DreamOut --> Learn
4. The Librarian (Core System)¶
Role¶
The Librarian is the central intelligence curator.
Responsibilities¶
- ingesting all documents
- structuring and classifying knowledge
- maintaining consistency and naming discipline
- indexing and retrieval
- rendering documents for human and agent use
- controlling knowledge lifecycle
The Librarian is not storage.
It is:
the operating system of knowledge
4.1 Librarian Decision Logic¶
4.1.1 Incoming Document (Ingestion Pipeline)¶
flowchart TD
NewDoc --> CleanDocument
CleanDocument --> CheckExisting
CheckExisting -->|match| UpdateDoc
CheckExisting -->|no match| NewDocCreate
UpdateDoc --> Reindex
NewDocCreate --> Reindex
Reindex --> HandleMetadata
HandleMetadata --> FindCorrectLocation
FindCorrectLocation --> SaveDocument
SaveDocument --> Available
Explanation¶
CleanDocument¶
- normalize formatting
- remove noise
- ensure YAML compliance
CheckExisting¶
- detect:
- duplicates
- version updates
- partial overlaps
UpdateDoc vs NewDocCreate¶
- Update → version increment
- New → new identity assigned
Reindex¶
- update search index
- update topic linkage
HandleMetadata¶
- enforce YAML structure
- enrich metadata:
- topics
- tags
- ownership
- version
FindCorrectLocation¶
- determine:
- folder path
- naming prefix (RULE_, BLUEPRINT_, etc.)
- domain placement
SaveDocument¶
- commit to:
- Git (source)
- DB (index + metadata)
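The ingestion stages above can be sketched in code. This is a minimal illustration, not the production Librarian: the `index` structure, document shape, and helper logic are hypothetical stand-ins for the real registry and hashing-based duplicate detection is one possible CheckExisting strategy.

```python
import hashlib

def ingest(raw_text: str, index: dict) -> dict:
    """Minimal sketch of the Librarian ingestion pipeline (4.1.1).

    `index` maps content hashes to existing document records; all
    structures here are illustrative stand-ins for the real registry.
    """
    # CleanDocument: normalize formatting and strip trailing noise
    text = "\n".join(line.rstrip() for line in raw_text.strip().splitlines())

    # CheckExisting: duplicate / version detection via content hash
    digest = hashlib.sha256(text.encode()).hexdigest()
    if digest in index:
        doc = index[digest]                       # UpdateDoc: version increment
        doc["version"] += 1
    else:
        doc = {"id": f"doc-{len(index) + 1}", "version": 1, "text": text}
        index[digest] = doc                       # NewDocCreate: new identity

    # HandleMetadata / FindCorrectLocation: enforce metadata and placement
    doc["metadata"] = {"topics": [], "tags": [], "prefix": "RULE_"}
    doc["path"] = f"vault/rules/{doc['id']}.md"   # SaveDocument target (Git + DB)
    return doc
```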
4.1.2 Search Document¶
flowchart TD
SearchOptions --> ByMetadataPrompt
SearchOptions --> ByContentPrompt
SearchOptions --> ByContextPrompt
ByMetadataPrompt --> FindDocument
ByContentPrompt --> FindDocument
ByContextPrompt --> FindDocument
Search Modes¶
By Metadata¶
- structured queries
- fast
- low token cost
By Content¶
- semantic / text-based
- deeper but more expensive
By Context¶
- uses:
- current task
- current topic
- agent context
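The three search modes can be expressed as a single dispatch, sketched below under assumed document and context shapes; real metadata search would be a structured DB query and content search would be semantic or full-text rather than substring matching.

```python
def search(documents, mode, query=None, context=None):
    """Sketch of the three Librarian search modes (4.1.2).

    Document shape and matching logic are illustrative only.
    """
    if mode == "metadata":
        # structured, fast, low token cost: exact match on metadata fields
        return [d for d in documents
                if query in d["metadata"].get("tags", [])]
    if mode == "content":
        # deeper but more expensive: naive substring stands in for
        # semantic / full-text search
        return [d for d in documents if query in d["text"]]
    if mode == "context":
        # derive the query from current task / topic / agent context
        derived = context["current_topic"]
        return [d for d in documents
                if derived in d["metadata"].get("topics", [])]
    raise ValueError(f"unknown search mode: {mode}")
```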
4.1.3 Display / Outgoing Document¶
flowchart TD
RequestDocument --> AddHeaderFooter
AddHeaderFooter --> AddTOC
AddTOC --> AddIndex
AddIndex --> AddRelevantContext
AddRelevantContext --> ChooseFormat
ChooseFormat --> Markdown
ChooseFormat --> PDF
ChooseFormat --> HTML
ChooseFormat --> GoogleDocs
ChooseFormat --> OfficeDocs
Markdown --> ChooseAction
PDF --> ChooseAction
HTML --> ChooseAction
GoogleDocs --> ChooseAction
OfficeDocs --> ChooseAction
ChooseAction --> DisplayDocument
DisplayDocument --> |MarkdownOnly|EditDocument-Editor
DisplayDocument --> |MarkdownOnly|EditDocument-Prompt
EditDocument-Editor --> AddToLibrarian
EditDocument-Prompt --> AddToLibrarian
ChooseAction --> DownloadDocument
ChooseAction --> EmailDocument
Rendering Logic¶
AddHeaderFooter¶
- branding
- copyright
- metadata summary
AddTOC¶
- dynamic table of contents
AddIndex¶
- references
- links to related knowledge
AddRelevantContext¶
- inject:
- related documents
- linked topics
- dependencies
Formats¶
- Markdown → source of truth
- HTML → UI rendering
- PDF → distribution
- Google Docs → collaboration
- Office Docs → enterprise usage
Actions¶
- Display
- Edit (Markdown only)
- Download
- Re-ingest into Librarian
4.2 Librarian Output Types¶
- Markdown (source of truth)
- HTML (human view)
- Indexed entries (for retrieval)
- Metadata (YAML)
4.3 Search Document¶
- By Metadata - Using Prompt
- By Text - Using Prompt
- By Context - Using Prompt
4.4 Librarian as System Boundary¶
The Librarian sits between:
- Knowledge creation
- Knowledge storage
- Knowledge retrieval
- Knowledge presentation
It ensures:
- no duplication
- no orphan data
- consistent structure
- continuous evolution
4.5 Librarian Document Decomposition Protocol¶
The Librarian's core job is breaking large documents into linked atomic knowledge notes.
decomposition_protocol:
input: large document (blueprint, research report, design doc)
output: linked atomic notes in knowledge vault
extraction_targets:
concepts: "Named system capabilities (Control Bus, ODM, Optimizer)"
entities: "ODM objects (User, Idea, Ticket, Host, Agent)"
roles: "System roles (orchestrator, governor, specialist)"
technologies: "Tools and platforms (PostgreSQL, Hindsight, Ruflo)"
decisions: "Architecture decisions with rationale"
processes: "Workflows and procedures (backup, spawning, skill selection)"
note_format:
- YAML frontmatter (type, source_document, related, tags)
- Wikilinks [[concept_name]] to related notes
- One concept per note (atomic)
- Source reference back to original document section
structure:
concepts/: system capability notes
entities/: ODM entity notes
roles/: role definition notes
technologies/: technology notes (already populated)
decisions/: architecture decision records
processes/: workflow descriptions
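A rendered atomic note, per the note_format above, might be produced like this. The function and the non-required frontmatter fields are illustrative; only `type`, `source_document`, `related`, and `tags` come from the protocol itself.

```python
def make_atomic_note(concept, source_doc, section, related, tags):
    """Render one atomic knowledge note per the decomposition protocol (4.5).

    YAML frontmatter keys and wikilink style follow the note_format
    spec; the body layout is an illustrative choice.
    """
    related_links = ", ".join(f"[[{r}]]" for r in related)
    return (
        "---\n"
        "type: concept\n"
        f"source_document: {source_doc}#{section}\n"
        f"related: [{', '.join(related)}]\n"
        f"tags: [{', '.join(tags)}]\n"
        "---\n"
        f"# {concept}\n\n"
        f"One concept per note. Related: {related_links}\n"
    )
```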
Decomposition Flow¶
When the Librarian receives a large document:
- Scan -- identify extraction targets (concepts, entities, roles, technologies, decisions, processes)
- Extract -- create one atomic note per target, with YAML frontmatter and wikilinks
- Link -- ensure all notes cross-reference related notes using [[concept_name]] wikilinks
- Place -- file each note in the correct vault subdirectory based on its type
- Index -- update the knowledge ledger with creation events for each new note
- Verify -- check for duplicate notes, missing links, and orphaned references
Rules¶
- One concept per note. If a note covers two distinct concepts, split it.
- Every note must link back to its source document section.
- Every note must have YAML frontmatter with at least: type, source_document, related, tags.
- The Librarian must check for existing notes before creating new ones (search-before-create).
- Decomposition of protected documents (blueprints, rules) requires the same governance as any knowledge mutation.
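The search-before-create rule can be sketched as a guard in front of note creation. Matching here is a naive case-insensitive title comparison; a real Librarian would also use the index and semantic similarity.

```python
def search_before_create(existing_notes, candidate_title):
    """Enforce the search-before-create rule (Section 4.5).

    `existing_notes` is an illustrative in-memory list of note dicts.
    """
    wanted = candidate_title.strip().lower()
    for note in existing_notes:
        if note["title"].strip().lower() == wanted:
            return note          # reuse: never create a duplicate note
    new_note = {"title": candidate_title, "status": "draft"}
    existing_notes.append(new_note)
    return new_note
```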
4.5A Librarian Memory Engineering Requirements¶
The Librarian must comply with the Memory Engineering Principles defined in Part 4, Section 4.8A. Specifically:
- Principle 1 (Async Updates): Librarian ingestion, indexing, and storage must not block agent execution. When an agent produces a knowledge artifact, the Librarian processes it asynchronously.
- Principle 2 (Debounce Writes): When multiple documents arrive in rapid succession (e.g., during a research burst), the Librarian should batch ingestion rather than processing each document individually.
- Principle 3 (Confidence Threshold): Knowledge Objects ingested by the Librarian should carry a confidence score. The Librarian enforces the 0.7 threshold and per-type caps to prevent unbounded knowledge growth.
- Principle 5 (Atomic Writes): All Librarian writes to state files, ledger entries, and index updates must use the atomic write pattern (write to temp file, then rename).
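The atomic write pattern named in Principle 5 is a small, well-known idiom; a sketch:

```python
import os
import tempfile

def atomic_write(path: str, data: str) -> None:
    """Atomic write (Principle 5): write to a temp file in the same
    directory, then rename over the target.

    os.replace is atomic on POSIX when source and destination are on
    the same filesystem, so readers never observe a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # force bytes to disk before rename
        os.replace(tmp_path, path)   # atomic swap into place
    except BaseException:
        os.unlink(tmp_path)          # never leave a stray temp file
        raise
```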
4.6 Future Extensions¶
- auto-refactoring documents
- cross-document synthesis
- contradiction detection
- knowledge graph visualization
- automated documentation improvement suggestions
4.7 Knowledge Ledger (Change & Evolution Log)¶
Purpose¶
The Knowledge Ledger (KL) is a system-wide immutable log of all knowledge transformations.
It tracks:
- document creation
- document updates
- metadata changes
- reclassification
- movement across locations
- document revival
- document export / distribution
- deletion (logical, never physical)
This ensures:
- traceability
- auditability
- explainability
- reproducibility
4.7.1 Why This Is Required¶
Without a ledger:
- metadata becomes unreliable
- knowledge evolution is invisible
- agents cannot learn properly
- debugging becomes impossible
- compliance (future) is broken
With a ledger:
XIOPro becomes self-explainable over time
4.7.2 Ledger Structure¶
ledger_entry:
id: string
timestamp: datetime
document_id: string
action: enum
- created
- updated
- reclassified
- moved
- indexed
- revived
- exported
- deleted_logical
actor:
type: enum (agent, human, system)
id: string
change_summary: string
metadata_before: object|null
metadata_after: object|null
content_hash_before: string|null
content_hash_after: string|null
related_entities:
- topic_id
- ticket_id
- task_id
notes: string|null
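A ledger entry conforming to this schema could be constructed as follows. Field names follow the ledger_entry schema above; the id generation and hashing choices are illustrative, and related_entities/notes are omitted for brevity.

```python
import hashlib
import uuid
from datetime import datetime, timezone

ACTIONS = {"created", "updated", "reclassified", "moved",
           "indexed", "revived", "exported", "deleted_logical"}

def make_ledger_entry(document_id, action, actor_type, actor_id,
                      change_summary, content_before=None, content_after=None):
    """Build one immutable Knowledge Ledger entry (schema in 4.7.2)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown ledger action: {action}")
    def sha(text):
        return hashlib.sha256(text.encode()).hexdigest() if text is not None else None
    return {
        "id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "document_id": document_id,
        "action": action,
        "actor": {"type": actor_type, "id": actor_id},
        "change_summary": change_summary,
        "content_hash_before": sha(content_before),
        "content_hash_after": sha(content_after),
    }
```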
4.7.3 Ledger Flow¶
flowchart TD
DocumentChange --> CaptureEvent
CaptureEvent --> CreateLedgerEntry
CreateLedgerEntry --> StoreLedger
StoreLedger --> IndexLedger
IndexLedger --> AvailableForQuery
4.7.4 Event Sources¶
Ledger entries are generated from:
- Librarian ingestion pipeline
- document edits (UI / RC / agents)
- metadata updates
- topic reassignment
- exports (PDF, HTML, Docs)
- re-ingestion after edits
- Dream Engine updates
4.7.5 Document Revival Tracking¶
When a document is:
- re-used after long inactivity
- referenced by a new ticket
- pulled into a new context
→ it is marked as revived in the Knowledge Ledger.
This enables:
- tracking knowledge reuse
- identifying high-value documents
- prioritizing maintenance
4.7.6 Export Tracking¶
When a document is:
- downloaded
- emailed
- exported to external format
→ the ledger records an exported entry with the actor and timestamp.
4.7.7 Ledger Usage¶
By Agents¶
- detect frequently updated docs
- identify unstable knowledge
- suggest improvements
By the Governor¶
- detect:
- excessive changes
- instability
- redundant updates
By UI¶
- show:
- document history
- evolution timeline
- change summaries
4.7.8 Metadata vs Data¶
This is key:
Metadata is not enough — it must become data-backed history
The ledger ensures:
- metadata is traceable
- changes are reconstructable
- state is explainable
4.7.9 Storage Strategy¶
DB (Primary)¶
- ledger entries
- indexed for queries
Optional¶
- append-only log system (future)
- event streaming (future)
4.7.10 Anti-Patterns Prevented¶
- silent document overwrites
- metadata drift
- lost evolution history
- untraceable changes
- "why did this change?" ambiguity
4.7.11 Success Criteria¶
The Knowledge Ledger is successful if:
- every document change is traceable
- history can be reconstructed
- agents can learn from change patterns
- system behavior is explainable
4.7.12 Final Statement¶
The Knowledge Ledger transforms XIOPro from:
a system that stores knowledge
into:
a system that understands how its knowledge evolves
5. Topics System (Core Spine)¶
Purpose¶
Topics are the universal classification system.
Properties¶
- hierarchical (tree)
- relational (graph)
- indexed
- extensible
5.1 Topic Structure¶
flowchart TD
Root --> ProductDomain
Root --> ComplianceStandards
ProductDomain --> Modules
ComplianceStandards --> Roles
ComplianceStandards --> Validation
5.2 Topic Functions¶
Topics drive:
- agent responsibility
- knowledge classification
- search
- UI navigation
- project alignment
6. Knowledge Objects¶
Types¶
- RULE
- SKILL
- BLUEPRINT
- STATE
- REFLECTION
- PROFILE
- LOG
6.1 Knowledge Schema¶
id: string
type: enum
topics: [topic_id]
content_ref: string
version: string
confidence: float # 0.0-1.0 — see Part 4, Section 4.8A (Memory Engineering Principle 3)
created_at: datetime
updated_at: datetime
metadata: object
content_ref Resolution¶
content_ref is a file path relative to the knowledge vault root (~/STRUXIO_Workspace/struxio-knowledge/vault/).
Format: vault/<category>/<filename>.md
Example: vault/blueprints/BLUEPRINT_XIOPro_v5_Part5_Knowledge_System.md
Agents and the Librarian resolve content_ref by joining the vault root with the stored path. The vault root is never stored in the content_ref value itself — only the relative path is stored.
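Resolution can be sketched with pathlib. Note an assumption in this sketch: since the vault root already ends in `vault/` and stored refs begin with `vault/`, the leading `vault/` segment of the ref is treated as the root's last component and stripped before joining; if the real convention differs, the strip step would be dropped.

```python
from pathlib import Path

VAULT_ROOT = Path.home() / "STRUXIO_Workspace" / "struxio-knowledge" / "vault"

def resolve_content_ref(content_ref: str) -> Path:
    """Resolve a stored content_ref (relative path) against the vault
    root, per Section 6.1. The stored value never contains the root.
    """
    relative = Path(content_ref)
    # assumption: the leading "vault/" in stored refs duplicates the
    # root's final path component, so it is stripped before joining
    if relative.parts and relative.parts[0] == "vault":
        relative = Path(*relative.parts[1:])
    return VAULT_ROOT / relative
```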
Confidence Score Rules (Memory Engineering Principle 3)¶
Every Knowledge Object carries a confidence score (0.0-1.0) per Part 4, Section 4.8A:
- Facts extracted automatically by Hindsight or agents: confidence assigned by the extraction model
- Facts entered by the user: confidence defaults to 1.0
- Facts below 0.7 confidence are discarded at ingestion time
- Per-type caps prevent unbounded growth (e.g., max 100 active facts per agent context)
- When cap is reached, lowest-confidence items are trimmed first
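The threshold and trimming rules above can be sketched as an admission check. The cap value of 100 comes from the example in the rules; the fact shape is illustrative.

```python
CONFIDENCE_THRESHOLD = 0.7
MAX_FACTS_PER_CONTEXT = 100  # example per-type cap from the rules above

def admit_fact(active_facts, fact):
    """Apply Memory Engineering Principle 3 at ingestion time.

    `fact` is a dict with a `confidence` float (user-entered facts
    arrive with confidence 1.0). Returns the new active-fact list.
    """
    if fact["confidence"] < CONFIDENCE_THRESHOLD:
        return active_facts                      # discarded at ingestion
    facts = active_facts + [fact]
    if len(facts) > MAX_FACTS_PER_CONTEXT:
        # cap reached: trim lowest-confidence items first
        facts.sort(key=lambda f: f["confidence"], reverse=True)
        facts = facts[:MAX_FACTS_PER_CONTEXT]
    return facts
```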
7. Rules, Skills & Activation Stewardship¶
7.1 Asset Families¶
XIOPro must treat the following as first-class knowledge assets:
- RULE_*
- SKILL_*
- ACTIVATION_* (or agent activation files such as claude.md)
- PATTERN_*
- PROTOCOL_*
These assets are not casual notes. They are behavior-shaping system assets.
7.2 Rules vs Skills vs Activations¶
RULE_¶
Defines:
- constraints
- obligations
- boundaries
- approval requirements
- forbidden or required behavior
SKILL_¶
Defines:
- reusable execution capability
- procedure
- method
- template for recurring work
ACTIVATION_¶
Defines:
- agent-specific working mode
- operating preferences
- persistent instructions
- behavior shaping for a named runtime or profession
Example:
claude.md for a specific agent runtime
PATTERN_¶
Defines:
- reusable operating structures
- standard workflows
- repeatable multi-step methods
7.3 Why Stewardship Is Required¶
As XIOPro evolves, the system will continuously need:
- new skills
- revised rules
- activation tuning
- consolidation of duplicates
- retirement of obsolete assets
Without stewardship, the rule/skill layer becomes:
- conflicting
- repetitive
- hard to search
- unsafe to modify
- costly to maintain
7.4 Rule Steward Role¶
See Part 4, Section 4.2A for the full Rule Steward Role specification (responsibilities, non-responsibilities, managed asset classes, operating modes, stewardship flow, and relation to other components).
Within the Knowledge System, the rule steward is the role responsible for the lifecycle quality of the behavior-shaping assets defined in Section 7.1-7.3 above.
7.5 Technology Model¶
For T1P, the rule/skill system should use two synchronized forms.
Human-Readable Source of Truth¶
Stored in Git as Markdown assets with structured metadata.
Recommended minimum front matter:
id: string
asset_type: enum
# RULE | SKILL | ACTIVATION | PATTERN | PROTOCOL
name: string
owner: string|null
status: enum
# draft | review | approved | active | deprecated | archived
scope: [string]
applies_to: [string]
precedence: int|null
approval_required: boolean
version: string
supersedes: [string]
conflicts_with: [string]
created_at: datetime
updated_at: datetime
Structured Runtime Mirror¶
Normalized YAML/DB representation used for:
- validation
- querying
- conflict detection
- policy evaluation
- lineage tracking
- approval workflow
The Markdown asset remains the human-readable canonical source. The structured mirror makes it machine-usable.
7.6 Validation Pipeline¶
Every new or changed rule/skill/activation should pass through:
- existence search
- schema validation
- metadata validation
- conflict and overlap detection
- effectiveness review where signals exist
- approval determination
- publication and indexing
flowchart TD
Proposal --> SearchExisting
SearchExisting --> ValidateSchema
ValidateSchema --> ValidateMetadata
ValidateMetadata --> DetectConflicts
DetectConflicts --> EvaluateEffectiveness
EvaluateEffectiveness --> ApprovalDecision
ApprovalDecision --> Publish
Publish --> Reindex
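The pipeline above can be sketched as a sequence of gates. Each stage mirrors one flowchart node; the checks are deliberately minimal stand-ins for the real validators, and the asset/registry shapes are illustrative.

```python
def validate_asset(asset, registry):
    """Sketch of the rule/skill validation pipeline (Section 7.6).

    Returns (status, reason); on success the asset is published into
    the in-memory registry.
    """
    # SearchExisting: reuse before rewrite
    if any(a["id"] == asset["id"] for a in registry):
        return ("rejected", "asset id already exists")

    # ValidateSchema / ValidateMetadata: required front matter fields
    required = {"id", "asset_type", "name", "status", "version"}
    missing = required - asset.keys()
    if missing:
        return ("rejected", f"missing fields: {sorted(missing)}")

    # DetectConflicts: declared conflicts must not target active assets
    active_ids = {a["id"] for a in registry if a.get("status") == "active"}
    if active_ids & set(asset.get("conflicts_with", [])):
        return ("review", "conflicts with an active asset")

    # ApprovalDecision → Publish → Reindex
    if asset.get("approval_required"):
        return ("pending_approval", None)
    registry.append(asset)
    return ("published", None)
```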
7.7 Usage Flow¶
flowchart TD
Task --> Agent
Agent --> SearchRelevantAssets
SearchRelevantAssets --> RuleSelection
SearchRelevantAssets --> SkillSelection
SearchRelevantAssets --> ActivationBinding
RuleSelection --> Execution
SkillSelection --> Execution
ActivationBinding --> Execution
Execution --> Result
Result --> Reflection
Reflection --> RuleStewardReview
7.8 Skill Discovery & Creation Loop¶
XIOPro must support the creation of new skills over time.
The rule is:
- search existing assets first
- reuse if sufficient
- extend if close fit exists
- create new only when gap is real
New skills may be drafted by Claude or other approved agent surfaces, but they must still pass through rule steward stewardship and approval policy.
Typical triggers¶
- repeated manual workaround
- recurring task pattern
- repeated failure due to missing procedure
- founder request
- Dream Engine recommendation
- postmortem / hindsight finding
7.9 Activation Governance¶
Activation assets such as claude.md are governed artifacts.
They must support:
- owner
- scope
- version
- approval requirement
- compatibility notes
- performance review history
- deprecation path
An activation file must never be treated as an unmanaged side document.
7.10 Conflict & Supersession Rules¶
Each governed asset should be able to declare:
- supersedes
- conflicts_with
- replaced_by
- derived_from
If a conflict exists and cannot be resolved automatically, the rule steward must open a governed review or approval flow.
7.11 Lifecycle¶
Recommended lifecycle:
draft → review → approved → active → deprecated → archived
Notes:
- draft = proposed, not trusted
- review = under evaluation
- approved = accepted for use
- active = currently in force / used
- deprecated = retained but should no longer be selected by default
- archived = retained history only
7.12 Search & Retrieval Requirement¶
The knowledge system must support finding rules/skills/activations by:
- exact ID
- name
- topic
- owner
- asset type
- scope
- related task type
- conflict/supersession relation
- status
This is required so the system can reuse before it rewrites.
7.13 Module Portfolio Knowledge Layer¶
The knowledge system must also govern the module portfolio layer.
This includes knowledge about:
- commercial modules
- subscription-backed access paths
- API-backed access paths
- local/self-hosted modules
- hosting environments
- evaluation history
- recommendation history
- deprecation and replacement lineage
These are not runtime-only facts.
They are durable intelligence assets that must be queryable and reviewable.
7.14 Module Asset Classes¶
Recommended governed asset classes:
module_asset_classes:
- MODULE
- MODULE_POLICY
- SUBSCRIPTION
- HOSTING_PROFILE
- MODULE_EVALUATION
- MODULE_RECOMMENDATION
Meaning¶
MODULE¶
Defines a specific module or model option.
Example properties:
id: string
provider: string
module_name: string
access_modes: [string]
# subscription | api_key | local | hosted_self_managed
status: enum
# candidate | approved | active | constrained | deprecated | archived
quality_notes: [string]
latency_tier: string|null
cost_tier: string|null
privacy_posture: string|null
fallback_modules: [string]
MODULE_POLICY¶
Defines where and how a module may be used.
Example properties:
id: string
module_id: string
allowed_task_classes: [string]
forbidden_task_classes: [string]
allowed_surfaces: [string]
approval_required_for_use: boolean
notes: [string]
SUBSCRIPTION¶
Defines a commercial access plan or account-bound capability.
Example properties:
id: string
provider: string
plan_name: string
scope: string|null
capabilities: [string]
limitations: [string]
quota_notes: [string]
status: enum
HOSTING_PROFILE¶
Defines an environment profile for local/server/self-hosted viability.
Example properties:
id: string
environment_type: enum
# mac_local | linux_server | cloud_gpu | cloud_cpu | hybrid
compute_notes: [string]
memory_notes: [string]
storage_notes: [string]
network_notes: [string]
security_notes: [string]
compatibility_notes: [string]
MODULE_EVALUATION¶
Stores structured evaluation history for a module candidate or active option.
Typical fields:
- task class tested
- quality observations
- latency observations
- cost observations
- stability observations
- trust / reliability notes
- hosting observations
- recommendation outcome
MODULE_RECOMMENDATION¶
Stores proposed portfolio actions such as:
- adopt
- prefer
- constrain
- retire
- self-host
- compare further
- reject for now
7.15 Module Portfolio Search Requirement¶
The knowledge system must support finding module assets by:
- provider
- module name
- access mode
- subscription availability
- hosting profile
- task fit
- latency tier
- cost tier
- status
- fallback relation
- recommendation status
- replacement / supersession relation
This is required so the module steward can optimize through searchable evidence, not memory fragments.
7.16 Module Evaluation & Recommendation Loop¶
Recommended loop:
flowchart TD
NeedDetected --> SearchPortfolio
SearchPortfolio --> ExistingFit
ExistingFit -->|sufficient| RecommendUse
ExistingFit -->|insufficient| CompareCandidates
CompareCandidates --> EvaluateQuality
CompareCandidates --> EvaluateCost
CompareCandidates --> EvaluateStability
CompareCandidates --> EvaluateHostingFit
EvaluateQuality --> Recommendation
EvaluateCost --> Recommendation
EvaluateStability --> Recommendation
EvaluateHostingFit --> Recommendation
Recommendation --> ApprovalDecision
ApprovalDecision --> PortfolioUpdate
Rule¶
The system must prefer:
- reuse
- comparison
- constrained recommendation
before creating unnecessary new module dependence.
7.17 Optimization Record Requirement¶
Every meaningful module recommendation should preserve the optimization rationale.
It should be possible to answer later:
- why this module was preferred
- what tradeoffs were accepted
- what resource constraints mattered
- what fallback was defined
- what hosting assumptions were required
- why a subscription or self-hosting proposal was or was not approved
This is necessary for:
- trust
- auditability
- future re-evaluation
- Dream / hindsight learning
- portfolio optimization over time
7.18 T1P RAG Pipeline (v5.0.8 Addition)¶
XIOPro uses retrieval-augmented generation to supply agents with relevant knowledge context. This section specifies the T1P RAG pipeline design.
rag_pipeline:
embedding_model: "text-embedding-3-small (OpenAI) or BGE-M3 (self-hosted)"
vector_store: "pgvector (PostgreSQL extension, v0.8.2, already installed)"
chunking_strategy:
method: "recursive character splitting"
chunk_size: 1000
chunk_overlap: 200
metadata_preserved: [source_file, section, topic_id, document_type]
retrieval:
method: "hybrid (vector similarity + full-text BM25)"
top_k: 10
reranking: "optional — FlashRank or Cohere rerank if quality insufficient"
quality_metric: "relevance score threshold > 0.7"
generation:
context_injection: "retrieved chunks injected as system context"
citation_required: true
hallucination_guard: "verify claims against retrieved chunks"
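The chunking parameters above (size 1000, overlap 200) can be illustrated with a plain sliding window. Note the simplification: the blueprint specifies recursive character splitting, which additionally prefers paragraph and section boundaries before falling back to raw characters; this sketch shows only the size/overlap mechanics.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Character-level chunking with overlap, matching the T1P RAG
    parameters (chunk_size 1000, chunk_overlap 200).
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, max(len(text), 1), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break    # last window already covers the tail
    return chunks
```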
Design Rationale¶
- pgvector handles vector search within PostgreSQL. No separate vector DB needed for T1P. This aligns with the single-database posture (Part 2, Section 5.5) and avoids operational overhead.
- Hybrid retrieval (vector similarity + BM25-style full-text) covers both semantic and keyword-exact matches. PostgreSQL's built-in tsvector ranking (ts_rank) approximates BM25-style relevance alongside pgvector cosine similarity.
- Chunking uses recursive character splitting to respect section boundaries in Markdown knowledge assets. Metadata preservation ensures retrieved chunks can be traced back to source documents and topics.
- Embedding model choice is deferred until Research Center actively needs it. For T1P, Hindsight handles agent memory retrieval using its own embedding pipeline. When the Librarian requires semantic search across governed knowledge, the embedding model will be selected based on cost/quality evaluation by the Module Steward.
- Reranking is optional at T1P. If retrieval quality (measured by relevance score threshold) is insufficient with hybrid search alone, FlashRank (self-hosted, zero cost) or Cohere rerank (API, low cost) can be added as a post-retrieval filter.
- Context window budget: retrieved chunks are injected as system context. The Prompt Steward (Part 4, Section 4.2B) manages total context budget, ensuring RAG chunks do not crowd out task instructions or conversation history.
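Hybrid retrieval produces two ranked lists (vector similarity and full-text) that must be merged before the top_k cut. The blueprint does not mandate a fusion method; reciprocal rank fusion is one common choice, sketched here as an illustration.

```python
def reciprocal_rank_fusion(vector_ranked, text_ranked, k=60, top_k=10):
    """Merge two ranked id lists from hybrid retrieval into one list.

    Each document scores 1 / (k + rank + 1) per list it appears in;
    k=60 is the conventional RRF constant, top_k=10 matches the
    pipeline spec above.
    """
    scores = {}
    for ranked in (vector_ranked, text_ranked):
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    merged = sorted(scores, key=scores.get, reverse=True)
    return merged[:top_k]
```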
T1P Scope¶
For T1P, the RAG pipeline operates on:
- governed knowledge assets (rules, skills, activation files)
- Librarian-indexed documents
- project-scoped knowledge (per project_id)
Full semantic search across Research Center outputs, Obsidian vault, and external sources is deferred to post-T1P.
8. Research Center¶
8.1 Purpose¶
XIOPro needs a unified Research Center, not a collection of disconnected research-related tools.
The Research Center is the governed layer that coordinates:
- the Librarian
- NotebookLM
- Obsidian
- scheduled research tasks
- external research sources
- curated research outputs
- founder-facing research workflows
Its purpose is to transform research from ad hoc querying into a repeatable system capability.
8.2 Role in XIOPro¶
The Research Center is responsible for:
- collecting and curating research inputs
- running scheduled or triggered research workflows
- producing usable research outputs
- preserving research lineage
- reducing repeated research effort
- turning external findings into indexed internal knowledge
- supporting founder exploration without losing system structure
The Research Center is not the same as the Librarian.
The Librarian is the knowledge operating system.
The Research Center is the research workflow and synthesis layer built on top of that knowledge foundation.
8.2A Research Domains (v5.0.5 Clarification)¶
The Research Center serves ALL knowledge domains, not just XIOPro technology. Any domain that informs founder decisions, system design, or market positioning is in scope.
research_domains:
devxio_technology:
description: "Tools, skills, frameworks, MCP servers for XIOPro itself"
sources: [awesome-lists, GitHub, npm, PyPI, HuggingFace]
scan_frequency: weekly-monthly
product_domain:
description: "Product-specific domain knowledge (see MVP1_PRODUCT_SPEC.md for first product)"
sub_domains:
- industry_standards: "Relevant compliance standards and updates"
- market_players: "Competitors, consultancies, tech vendors"
- domain_tech: "Competing platforms, new tools, market trends"
- regulatory: "Regulatory changes, compliance updates"
sources: [industry_publications, standards_bodies, competitor_sites]
scan_frequency: monthly-quarterly
ai_and_llm_landscape:
description: "LLM providers, model releases, pricing, capabilities"
sub_domains:
- model_releases: "New Claude, GPT, Gemini, open-weight models"
- agent_frameworks: "CrewAI, pydantic-ai, LangGraph, etc."
- pricing_changes: "API costs, subscription changes"
sources: [provider_blogs, HuggingFace, benchmarks]
scan_frequency: weekly
market_and_business:
description: "Customer research, competitors, pricing, go-to-market"
sub_domains:
- competitors: "Procore, PlanRadar, Dalux, BIM 360 etc."
- customers: "Target personas, industry trends"
- pricing: "Market rates, willingness to pay"
sources: [industry_reports, competitor_sites, LinkedIn, conferences]
scan_frequency: quarterly
NotebookLM as Research Acceleration Surface¶
NotebookLM is a critical Research Center integration -- not just for document synthesis, but for:
- Smart prompting of research questions
- Deep research across curated source bundles
- Voice overview generation for founder consumption
- Multi-source synthesis and comparison
- Presentation-ready research outputs
When deployed, NotebookLM becomes the primary research acceleration surface for complex, multi-source research tasks across ALL domains -- not just technology.
8.3 Core Principle¶
Research should move through a governed flow:
source discovery → collection → curation → synthesis → storage → retrieval → reuse
The system must distinguish between:
- raw source material
- curated research bundles
- generated summaries or overviews
- approved internal knowledge
8.4 Core Components¶
The Research Center coordinates at least these components:
- Librarian
- NotebookLM
- Obsidian
- external research connectors
- scheduled research jobs
- research task definitions
- research output store
- research review / approval flow
flowchart TD
Sources[Research Sources] --> Intake[Research Intake]
Intake --> Curate[Curate / Normalize]
Curate --> Librarian[Librarian]
Librarian --> ResearchTasks[Research Tasks / Schedules]
ResearchTasks --> NotebookLM[NotebookLM]
ResearchTasks --> Obsidian[Obsidian]
ResearchTasks --> Synthesis[Synthesis / Comparison / Reports]
Synthesis --> Outputs[Research Outputs]
Outputs --> Librarian
Outputs --> Founder[Founder / Control Center]
Outputs --> KnowledgeUse[Future Retrieval / Reuse]
8.5 Research Input Classes¶
The Research Center should handle multiple input classes:
Internal Knowledge Inputs¶
- blueprint parts
- rules
- skills
- activations
- historical decisions
- evaluations
- prior research outputs
Connected / Curated Source Inputs¶
- uploaded files
- curated documents
- internal notes
- founder research packets
- approved web captures or exports
External Research Inputs¶
- web research outputs
- module/provider references
- Hugging Face model and repo research
- benchmark/evaluation notes
- self-hosted model comparison material
- research exports from approved tools
External inputs must be curated before they become trusted internal knowledge.
8.6 Librarian Integration¶
The Librarian remains the authority for:
- ingestion discipline
- naming discipline
- indexing
- topic assignment
- storage routing
- retrieval support
- lifecycle control
The Research Center depends on the Librarian for structured persistence.
Rule:
Research outputs are not complete until they are either:
- ingested by the Librarian, or
- explicitly marked as transient / draft
8.7 NotebookLM Integration¶
Role¶
NotebookLM is used as a research presentation and synthesis surface for tasks such as:
- voice overview
- video overview
- summaries
- presentation generation
- thematic synthesis across curated source packets
Boundary¶
NotebookLM is not the source of truth.
It is a research acceleration and output layer.
Allowed Pattern¶
- XIOPro selects or prepares curated research bundles
- NotebookLM generates overviews, synthesis, or presentation outputs
- Librarian stores approved outputs and references
Rule¶
NotebookLM outputs must preserve lineage to the source bundle used.
Current Status (v5.0 Addition)¶
NotebookLM is not deployed. It remains in the architecture as a planned integration surface. T1P does not depend on NotebookLM availability. When deployed, it should follow the integration pattern described above.
8.8 Obsidian Integration¶
Role¶
Obsidian is the living knowledge companion — a linked, navigable, human-friendly surface that makes XIOPro's knowledge visible, explorable, and enrichable.
It is not just a vault. It is an active part of the knowledge workflow.
What Lives in Obsidian¶
Obsidian should contain a linked, navigable mirror of:
Architecture and Design¶
- XIOPro blueprint (all 9 parts, linked by cross-references)
- Architecture decisions and their rationale
- System capability map
- ODM entity relationships
Technology Evaluations¶
- Every open-source tool evaluated (positive AND negative outcomes)
- Example: Phylum (acquired by Veracode, free tier discontinued — decision: use Socket.dev instead)
- Example: Ruflo vs Bus analysis leading to Control Bus architecture
- Evaluation template: what it is, why we considered it, outcome, date
Work History¶
- Sprint retrospectives
- Key decisions and why they were made
- Incidents and root causes
- Pattern: what worked, what didn't
Research¶
- Research task outputs (linked to tickets)
- Curated external references
- Competitor analysis
- Domain knowledge (product-specific -- see MVP1_PRODUCT_SPEC.md for first product)
Agent Knowledge¶
- Agent activation files (linked)
- Rules and skills registry (linked)
- Lessons learned per agent
Obsidian Vault Structure¶
STRUXIO_Obsidian_Vault/
INDEX.md # Master index with links to all sections
architecture/
XIOPro_Blueprint_Overview.md # Links to all 9 parts
Architecture_Decisions.md # ADR-style linked notes
Control_Bus.md # Design rationale, evolution
ODM_Entity_Map.md # Entity relationships
technology/
_Technology_Index.md # All evaluations
Phylum.md # Evaluated → rejected (Veracode acquisition)
Socket_dev.md # Evaluated → adopted (supply chain security)
Ruflo.md # In use — agent runtime
LiteLLM.md # In use — model router
Hindsight.md # In use — memory system
Neo4j.md # Deprecated — removed (see Section 12.1)
Jujutsu_jj.md # Evaluated → deferred
[every tool we evaluate gets a note]
work/
Sprint_S001.md # Retrospective
Incidents/ # Root cause notes, linked to fixes
Decisions/ # Key decisions with rationale
research/
product_domain/ # Domain knowledge (see MVP1_PRODUCT_SPEC.md)
competitors/
agents/
000_BrainMaster.md # Agent profile, lessons, patterns
001_Compliance.md
002_Engineering.md
...
Sync Model¶
Obsidian vault lives on Mac Studio. Sync with XIOPro via:
- Git-based sync — vault is a Git repo or symlinked to design repo sections
- Agent-to-Obsidian — when the BrainMaster or a domain brain produces a decision, evaluation, or lesson, the Librarian (or a scheduled job) creates/updates the corresponding Obsidian note
- Obsidian-to-XIOPro — founder creates notes in Obsidian during thinking/research. Notes marked #promote are ingested by the Librarian into governed knowledge
- Blueprint sync — when BP parts are updated, corresponding Obsidian architecture notes are updated automatically
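The #promote ingestion path can be sketched as a periodic Librarian job. This is a minimal illustration under stated assumptions: the function name and tag-scanning approach are hypothetical, not the implemented pipeline.

```python
from pathlib import Path

PROMOTE_TAG = "#promote"  # tag the founder adds to notes destined for governed knowledge

def find_promotion_candidates(vault_root: str) -> list[Path]:
    """Scan an Obsidian vault for markdown notes tagged #promote.

    Returns the note paths so a Librarian job can ingest them into
    governed knowledge and (by convention) clear or replace the tag.
    """
    candidates = []
    for note in sorted(Path(vault_root).rglob("*.md")):
        text = note.read_text(encoding="utf-8", errors="ignore")
        if PROMOTE_TAG in text:
            candidates.append(note)
    return candidates
```

A real job would also record lineage (vault path, commit hash) so the governed copy stays traceable to its Obsidian origin.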
Technology Evaluation Template¶
Every tool/library/service we evaluate gets an Obsidian note:
---
name: [Tool Name]
type: technology_evaluation
status: adopted | rejected | deferred | under_evaluation
date_evaluated: YYYY-MM-DD
evaluated_by: [agent or founder]
---
## What It Is
[One paragraph description]
## Why We Considered It
[The problem it would solve for XIOPro]
## Evaluation
[Findings — capabilities, limitations, pricing, maturity]
## Decision
[Adopted / Rejected / Deferred — with clear reason]
## Links
- [Official site]
- [GitHub]
- [Related XIOPro ticket if any]
Boundary¶
Obsidian may mirror and enrich knowledge, but it must not silently become a second uncontrolled source of truth.
Rules:
- Git repos remain source of truth for code, blueprints, and state
- PostgreSQL remains source of truth for operational data (ODM)
- Obsidian is a navigation and enrichment layer, not an authoritative store
- Notes promoted from Obsidian to governed knowledge go through Librarian discipline
Current Status (v5.0 Addition)¶
Obsidian is now being set up on Mac Studio (Mac Worker task, 2026-03-28).
Vault location: ~/STRUXIO_Workspace/STRUXIO_Obsidian_Vault/
Initial seed: domain wiki + blueprint parts.
Next: populate technology evaluation notes, link architecture decisions.
STRUXIO_Knowledge Repository (v5.0 Addition)¶
A dedicated Git repository for governed knowledge assets that syncs with the Obsidian vault.
repo: STRUXIO-ai/struxio-knowledge
purpose: Central knowledge store — syncs with Obsidian vault, feeds Librarian
contains:
- architecture decisions (ADRs)
- technology evaluations (adopted, rejected, deferred)
- domain knowledge (product-specific -- see MVP1_PRODUCT_SPEC.md)
- research outputs
- agent lessons and patterns
- sprint retrospectives
- incident postmortems
sync_model:
obsidian_to_git: "Founder edits in Obsidian → committed to struxio-knowledge"
git_to_obsidian: "Agent-produced knowledge → appears in Obsidian vault"
librarian_ingest: "Promoted notes → governed knowledge via Librarian"
Relationship to Other Repos¶
| Repo | Contains | Knowledge Role |
|---|---|---|
| struxio-knowledge | Governed knowledge, evaluations, decisions, research | Knowledge source of truth |
| struxio-design | Architecture blueprints, product design, UX specs | Design documents (may promote to knowledge) |
| struxio-logic | Agent activations, rules, skills | Behavioral assets (rule steward governs) |
| STRUXIO_OS | State, tickets, engineering, infra | Operational state |
| struxio-app | Product code | Codebase |
| struxio-business | Business documents | Business context |
Rule¶
struxio-knowledge is the canonical home for knowledge that outlives a single sprint, ticket, or conversation. If something is worth keeping, it belongs here — not buried in a design doc or chat transcript.
8.9 Skill Registry and Governance (v5.0 Addition)¶
Cross-reference: Skill selection logic (which skills to load for a given task assignment) is defined in Part 4, Section 4.11 — Skill Selection Architecture. This section defines the registry; Section 4.11 defines the selection filter.
Problem¶
Skills are currently:
- Defined as individual SKILL.md files in ~/.claude/skills/
- Referenced by name in every agent activation file (ACTIVATE_BM.md, ACTIVATE_B1.md, etc.)
- Not centrally indexed or versioned
- Difficult to maintain — changing a skill name means editing every activation file
This violates the "single source of truth" principle. The rule steward should govern skills centrally.
Solution: Central Skill Registry¶
A single SKILL_REGISTRY.yaml file that:
- Lists all active skills with metadata
- Maps skills to agents (which agents use which skills)
- Tracks versions and status
- Lives in struxio-logic/skills/ (source of truth)
- Activation files reference the registry, not individual skills
# struxio-logic/skills/SKILL_REGISTRY.yaml
skills:
- id: paperclip-sync
name: "Paperclip Sync"
path: ~/.claude/skills/paperclip-sync/SKILL.md
version: "1.0.0"
status: active
used_by: [000, 001, 002, 003, 004, 005, 010]
triggers: [/paperclip, /sync, /ticket]
description: "Sync ticket status with Paperclip issue tracker"
- id: writing-plans
name: "Writing Plans"
path: ~/.claude/skills/writing-plans/SKILL.md
version: "1.0.0"
status: active
used_by: [000, 001, 002, 003, 004, 005]
triggers: [/write-plan]
description: "Write structured implementation plans"
- id: systematic-debugging
name: "Systematic Debugging"
path: ~/.claude/skills/systematic-debugging/SKILL.md
version: "1.0.0"
status: active
category: engineering # See Section 4.11 for categories
min_model_tier: sonnet # Minimum model tier (haiku/sonnet/opus)
used_by: [000, 002, 005]
triggers: [/debug]
description: "Debug bugs systematically before proposing fixes"
# ... all other active skills
# Skill categories (aligned with Section 4.11 Skill Selection Architecture):
# execution: any model — paperclip-sync, verification-before-completion, etc.
# engineering: sonnet+ — TDD, systematic-debugging, code-review, etc.
# architecture: sonnet/opus — brainstorming, writing-plans, executing-plans, etc.
# infrastructure: any model — hooks-automation, swarm-orchestration, etc.
# knowledge: sonnet+ — writing-skills, skill-builder, reasoningbank-agentdb, etc.
# domain: varies — sparc-methodology, github-*, flow-nexus-*, claude-api
Activation File Pattern¶
Instead of listing skills inline, activation files reference the registry:
## Skills
Load skills from SKILL_REGISTRY.yaml for this agent's role.
Registry: struxio-logic/skills/SKILL_REGISTRY.yaml
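The registry-reference pattern can be sketched as a small resolver. This is illustrative only: it assumes SKILL_REGISTRY.yaml has already been parsed into a dict (e.g. with yaml.safe_load) and that agent IDs appear as strings in used_by.

```python
def skills_for_agent(registry: dict, agent_id: str) -> list[dict]:
    """Resolve the active skills a given agent should load.

    `registry` is the parsed SKILL_REGISTRY.yaml. Only `status: active`
    skills listing the agent in `used_by` are returned, so deprecating
    a skill in the registry removes it from every activation at once.
    """
    return [
        skill for skill in registry.get("skills", [])
        if skill.get("status") == "active" and agent_id in skill.get("used_by", [])
    ]
```

This is the payoff of the single-source-of-truth rule: renaming or retiring a skill touches one file, not every activation file.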
Rule Steward Responsibilities for Skills¶
The rule steward must:
- Maintain the registry as source of truth
- Search for existing skills before creating new ones
- Validate skill metadata and structure
- Detect duplicate or conflicting skills
- Propose skill consolidation or deprecation
- Update the registry when skills are added, changed, or removed
- Ensure all agents reference the registry, not hardcoded skill names
Skill in Obsidian¶
Each skill should have a corresponding Obsidian note with:
- What it does
- Which agents use it
- When it was last updated
- Link to the SKILL.md source
T1P Priority¶
This is "Active but Narrow" for T1P:
- Create the registry file
- Migrate existing skills into it
- Update activation files to reference the registry
- Full rule steward automation deferred
8.9A Skill Performance Database (v5.0.5 Addition)¶
Skills are not equal. They differ in:
- Token consumption (some skills use 2x the tokens for the same result)
- Result quality (measured by task completion rate and rework rate)
- Model compatibility (some skills work poorly on Haiku)
- Execution time
The system must track skill performance to:
- Detect when a new external skill outperforms an existing one
- Optimize token spend by preferring efficient skills
- Retire underperforming skills
- Compare internal skills against community alternatives
skill_performance_record:
skill_id: string
# Usage metrics (rolling)
total_invocations: int
avg_tokens_per_invocation: float
avg_execution_time_ms: float
# Quality metrics
task_completion_rate: float # % of tasks completed when this skill was active
rework_rate: float # % of tasks that needed rework
user_satisfaction: float|null # if founder provides feedback
# Model performance
performance_by_model:
haiku:
quality_score: float
avg_tokens: float
sonnet:
quality_score: float
avg_tokens: float
opus:
quality_score: float
avg_tokens: float
# Comparison
known_alternatives: [string] # external skills that do the same thing
best_alternative: string|null # if an alternative outperforms
replacement_candidate: boolean
# Metadata
last_measured_at: datetime
measurement_period: string # e.g., "last_30_days"
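One way the comparison fields (known_alternatives, best_alternative, replacement_candidate) might be maintained is sketched below. The "equal-or-better quality at equal-or-lower token cost" rule is an illustrative policy, not a defined one, and the helper name is hypothetical.

```python
def flag_replacement_candidate(record: dict, alternatives: dict) -> dict:
    """Mark a skill as a replacement candidate when a known alternative
    dominates it: quality no worse AND token cost no higher, with a
    strict improvement on at least one axis.

    `record` follows a subset of skill_performance_record;
    `alternatives` maps alternative skill ids to their own records.
    """
    best = None
    for alt_id in record.get("known_alternatives", []):
        alt = alternatives.get(alt_id)
        if alt is None:
            continue
        no_worse_quality = alt["task_completion_rate"] >= record["task_completion_rate"]
        no_worse_cost = alt["avg_tokens_per_invocation"] <= record["avg_tokens_per_invocation"]
        strictly_better = (
            alt["task_completion_rate"] > record["task_completion_rate"]
            or alt["avg_tokens_per_invocation"] < record["avg_tokens_per_invocation"]
        )
        if no_worse_quality and no_worse_cost and strictly_better:
            best = alt_id
    record["best_alternative"] = best
    record["replacement_candidate"] = best is not None
    return record
```

A dominance rule like this avoids flagging alternatives that trade quality for token savings; such trade-offs should surface as proposals for review rather than automatic flags.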
Connection to Dream Engine / Idle Maintenance¶
This feeds into the Idle Maintenance / Dream Engine: periodically compare internal skill performance against newly discovered community skills from the Research Center.
idle_maintenance_tasks:
# ... existing items ...
- skill_performance_review # compare internal skill metrics against alternatives
- skill_token_optimization # identify skills with high token usage, suggest alternatives
Cross-reference: Idle maintenance task list is defined in Part 4, Section 4.9.9. The two tasks above should be added to that list. Dream Engine integration is defined in Part 5, Section 10.
8.10 Research Center Operational Registries (v5.0 Addition)¶
The Research Center requires two persistent registries to operate methodically rather than ad hoc.
These registries formalize what happened organically during the Day 0 session (2026-03-28), where the founder and the BrainMaster collaboratively scanned 15+ awesome-lists, evaluated 36+ tools, and populated the knowledge vault. That process should be repeatable and automated.
8.10.1 Source Registry¶
Tracks WHERE we look for tools, skills, libraries, and intelligence.
research_source:
id: string
name: string # e.g., "awesome-claude-code (hesreallyhim)"
url: string # GitHub URL or web URL
type: enum
# github_repo | github_search | npm_registry | pypi_registry
# hugging_face | web_article | social_media | newsletter
# provider_docs | conference | community_forum
# Quality signals
ranking: int # 1-5 (5 = highest value source)
stars: int|null # GitHub stars if applicable
reliability: enum # high | medium | low | unknown
# Scan schedule
scan_frequency: enum # daily | weekly | biweekly | monthly | quarterly | on_demand
last_scanned_at: datetime|null
next_scan_at: datetime|null
scan_agent_id: string|null # which agent runs the scan
# Results
total_resources_found: int
resources_adopted: int
resources_evaluated: int
# Metadata
added_by: string # user or agent who added the source
notes: string|null
topics: [string] # what domains this source covers
status: enum # active | paused | retired
created_at: datetime
updated_at: datetime
Known Sources (as of 2026-03-28)¶
| Source | Type | Ranking | Scan Freq | Stars |
|---|---|---|---|---|
| awesome-claude-code (hesreallyhim) | github_repo | 5 | monthly | 33.6k |
| awesome-claude-skills (ComposioHQ) | github_repo | 5 | monthly | 48.8k |
| awesome-claude-code-subagents (VoltAgent) | github_repo | 4 | monthly | 15.5k |
| awesome-agentic-patterns (nibzard) | github_repo | 5 | quarterly | 4.1k |
| awesome-mcp-servers (appcypher) | github_repo | 4 | monthly | 5.3k |
| awesome-remote-mcp-servers (jaw9c) | github_repo | 4 | monthly | 1k |
| awesome-mcp-security (Puliczek) | github_repo | 3 | quarterly | 672 |
| PyPI new packages (AI/agent category) | pypi_registry | 3 | weekly | — |
| npm new packages (MCP/claude category) | npm_registry | 3 | weekly | — |
| Hugging Face trending models | hugging_face | 3 | monthly | — |
| Anthropic changelog / blog | provider_docs | 5 | daily | — |
| Claude Code GitHub releases | github_repo | 5 | daily | — |
Scan Workflow¶
flowchart LR
Schedule["Scan Schedule"] --> Agent["Research Agent"]
Agent --> Source["Scan Source"]
Source --> NewItems["Identify New Items"]
NewItems --> Evaluate["Create Evaluation Notes"]
Evaluate --> Vault["Knowledge Vault"]
Vault --> Report["Report to Orchestrator"]
Report --> Decision["User Decision"]
8.10.2 Resource Registry¶
Tracks WHAT we've found, evaluated, and decided about.
This is the structured equivalent of the knowledge vault's technology/ folder, but queryable as a database.
research_resource:
id: string
name: string # e.g., "pydantic-ai"
type: enum
# skill | plugin | cli_tool | mcp_server | framework | library
# service | api | model | dataset | pattern | article
# Discovery
source_id: string # which source we found it in
discovered_at: datetime
discovered_by: string # agent or user
url: string|null
github_url: string|null
stars: int|null
# Evaluation
status: enum
# new | under_evaluation | adopted | deferred | rejected | deprecated
ranking: int|null # 1-5 (5 = critical value)
evaluation_summary: string|null
evaluation_date: datetime|null
evaluated_by: string|null
# Decision
decision: enum|null
# adopt | evaluate_further | defer | reject | deprecate
decision_reason: string|null
decision_by: string|null # user or agent
decision_date: datetime|null
# Usage (if adopted)
installed: boolean
install_location: string|null # where it's installed
version_installed: string|null
relevant_roles: [string] # which roles use it (binds via role-topic-skill chain, not agent IDs)
# Classification
topics: [string]
relevance_to: [string] # which XIOPro components benefit
model_tier: string|null # haiku | sonnet | opus | any
# Lifecycle
last_reviewed_at: datetime|null
review_cycle: enum|null # monthly | quarterly | on_demand
next_review_at: datetime|null
# Links
knowledge_vault_note: string|null # path to Obsidian/vault note
ticket_id: string|null # if adoption created a ticket
# Metadata
comments: string|null
tags: [string]
created_at: datetime
updated_at: datetime
Resource Lifecycle¶
flowchart LR
New["new"] --> Evaluate["under_evaluation"]
Evaluate --> Adopt["adopted"]
Evaluate --> Defer["deferred"]
Evaluate --> Reject["rejected"]
Defer --> Evaluate
Adopt --> Deprecate["deprecated"]
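The lifecycle diagram implies a transition guard that a registry update path could enforce. This is a sketch; the transition helper is hypothetical, but the allowed moves are taken directly from the diagram above.

```python
# Allowed status transitions, per the resource lifecycle diagram.
ALLOWED_TRANSITIONS = {
    "new": {"under_evaluation"},
    "under_evaluation": {"adopted", "deferred", "rejected"},
    "deferred": {"under_evaluation"},
    "adopted": {"deprecated"},
    "rejected": set(),
    "deprecated": set(),
}

def transition(resource: dict, new_status: str) -> dict:
    """Apply a lifecycle transition to a research_resource, rejecting
    any move the diagram does not permit (e.g. new -> adopted)."""
    current = resource["status"]
    if new_status not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new_status}")
    resource["status"] = new_status
    return resource
```

Enforcing the guard in the registry layer keeps agents from skipping evaluation: a resource cannot be adopted without passing through under_evaluation.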
Resource Statistics (as of 2026-03-28)¶
| Status | Count | Examples |
|---|---|---|
| Adopted | 7 | Socket.dev, Semgrep MCP, MCP Builder, GWS CLI, Stripe CLI, awesome-agentic-patterns, awesome-claude-skills lists |
| Under Evaluation | 17 | pydantic-ai, Firecrawl, OpenSpace, Supermemory, claude-deep-research, ccusage, claude-context... |
| Deferred | 6 | CrewAI, Jujutsu, CLI-Anything, NotebookLM Py, Competitive Ads, Lead Research |
| Rejected/Skipped | 6 | Phylum, Canopy, Rask Master AI, Agent Alchemy, Supabase CLI, Amplify |
8.10.3 Research Center Process (Codified from Day 0 Experience)¶
The following process was proven during the Day 0 session and should be the canonical Research Center workflow:
research_center_process:
1_source_management:
- maintain Source Registry with ranked, scheduled sources
- add new sources when discovered (user tips, agent finds, community)
- retire sources that become stale or irrelevant
- scan sources on schedule (daily/weekly/monthly per source ranking)
2_discovery:
- agent scans source for new/updated items
- creates Resource Registry entries with status "new"
- creates knowledge vault notes using standard template
- reports discoveries to the orchestrator
3_evaluation:
- agent or user reviews each resource
- assesses: what it does, relevance to XIOPro, maturity, cost, risk
- updates Resource Registry with evaluation summary
- categorizes: adopt / evaluate_further / defer / reject
4_decision:
- user makes final decision on high-impact resources
- agent can auto-decide on clear skip/defer cases
- decision recorded with reason in Resource Registry
5_adoption:
- adopted resources get installation ticket
- installed on correct host (Hetzner/Mac)
- added to skill registry if applicable
- added to BP Part 2 (technology stack) if architectural
6_review:
- adopted resources reviewed on cycle (monthly/quarterly)
- check: still maintained? still relevant? better alternatives?
- deferred resources re-evaluated on schedule
- deprecated resources removed and noted
Rule¶
The Research Center is not a one-time scan. It is a continuous intelligence operation that keeps XIOPro's technology stack current, discovers useful tools before we need them, and prevents the "we didn't know about X until it was too late" pattern.
8.11 External Research Source Integration¶
The Research Center should support governed use of external research sources.
Examples:
- web research outputs
- approved URLs / fetched references
- provider documentation exports
- Hugging Face research and repo discovery
- benchmark/evaluation documents
- local or remote CLI research tools
Hugging Face Note¶
For module and model scouting, the system may use Hugging Face as a governed research source for:
- candidate model discovery
- repository discovery
- hosting clues
- capability comparison
- self-hosting research leads
However:
- Hugging Face findings are inputs to evaluation, not automatic approvals
- The module steward must still evaluate suitability
- Rule steward / prompt steward / governor constraints still apply where relevant
8.12 Scheduled Research Tasks¶
Research should not depend only on manual ad hoc requests.
XIOPro should support scheduled or recurring research jobs such as:
- recurring topic watch
- competitor / market tracking
- module portfolio refresh
- Hugging Face model scouting
- standards / regulation watch
- literature or documentation refresh
- periodic summary bundle generation
- notebook refresh / insight digest
Typical triggers:
- cron/scheduler policy
- founder request
- governance trigger
- Dream / hindsight recommendation
- module portfolio review cycle
- project milestone
8.13 Research Task Definition¶
A research task should support at least:
research_task:
id: string
name: string
topic_refs: [string]
source_classes: [string]
recurrence: string|null
owner: string|null
objective: string
output_type: enum
# digest | comparison | watchlist | synthesis | presentation | candidate_scan
review_required: boolean
destination_refs: [string]
status: enum
This allows research work to become schedulable and auditable.
8.14 Research Output Types¶
The Research Center should be able to produce at least:
- research digest
- comparison matrix
- source bundle
- watchlist update
- candidate module scan
- founder briefing
- NotebookLM-ready curated packet
- Obsidian-ready linked note export
- recommendation draft
- knowledge-ingestion bundle
Each output should preserve:
- source lineage
- generation date
- task objective
- review status
8.15 Research Lifecycle¶
flowchart TD
Discover --> CurateSources
CurateSources --> RunResearchTask
RunResearchTask --> Synthesize
Synthesize --> Review
Review --> PublishResearchOutput
PublishResearchOutput --> IngestToLibrarian
IngestToLibrarian --> RetrieveAndReuse
Notes¶
- not every research result is automatically trusted
- publication and ingestion are separate from raw collection
- founder review may be required depending on scope/risk
8.16 Research Governance Rule¶
Research outputs may influence the system, but they must not silently redefine it.
If a research output proposes changes to:
- rules
- skills
- activations
- module portfolio
- architecture
- governance policy
it must route through the relevant governed approval path.
8.17 Success Criteria¶
The Research Center is successful when:
- research is repeatable, not one-off
- useful outputs are preserved and reusable
- NotebookLM and Obsidian serve clear bounded roles
- external sources are curated before trust is assigned
- scheduled research reduces repeated manual effort
- founder exploration strengthens, rather than fragments, system knowledge
9. Hindsight / Learning Engine¶
9.1 Role¶
System learning layer that converts execution history and research history into reusable lessons.
Hindsight is not a first-order governing authority.
It participates in governance by producing:
- evidence
- reflections
- improvement proposals
- research/task recommendations
Its outputs must flow through the governed paths handled by the governor, rule steward, prompt steward, and module steward roles where applicable.
9.2 Inputs¶
Inputs may include:
- activities
- results
- evaluations
- repeated failures
- successful patterns
- research outputs
- module evaluation history
- governance interventions
9.3 Output¶
Outputs include:
- reflections
- improvement proposals
- research/task recommendations
- supporting evidence and rationale for each proposal
9.4 Flow¶
flowchart TD
Activity --> Evaluation
Evaluation --> Reflection
ResearchOutput --> Reflection
Reflection --> Improvement
Improvement --> RulesUpdate
Improvement --> SkillUpdate
Improvement --> ResearchTaskProposal
9.5 Memory Engineering Principles — Hindsight Implementation Requirements¶
Hindsight must comply with the Memory Engineering Principles defined in Part 4, Section 4.8A. Specifically:
- Principle 1 (Async Updates): Hindsight memory extraction must never block the agent's main execution. Extraction runs as a post-activity background process, not inline.
- Principle 2 (Debounce Writes): Hindsight must batch memory extraction. Instead of extracting from every individual activity, wait until session end or a configurable threshold (30 seconds or N activities), then make one extraction call. This reduces LLM cost and produces higher-quality extractions.
- Principle 3 (Confidence Threshold): Hindsight memories must carry a confidence score (0.0-1.0). Memories below 0.7 confidence are discarded. Total stored memories per agent are capped at 100; when the cap is reached, the lowest-confidence memories are trimmed.
- Principle 4 (Token Budget): When Hindsight auto-injects memories into agent context, the injection must respect the 2000-token memory context budget. Trim lowest-confidence memories first until the budget fits.
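Principles 3 and 4 can be sketched together. This is illustrative: field names like confidence and tokens are assumptions about the memory record shape, not the Hindsight API.

```python
MAX_MEMORIES_PER_AGENT = 100   # storage cap (Principle 3)
MIN_CONFIDENCE = 0.7           # discard threshold (Principle 3)
CONTEXT_TOKEN_BUDGET = 2000    # injection budget (Principle 4)

def store_memories(existing: list[dict], extracted: list[dict]) -> list[dict]:
    """Apply the confidence threshold and per-agent cap when storing
    newly extracted memories. Each memory dict is assumed to carry at
    least {'text': str, 'confidence': float, 'tokens': int}."""
    kept = existing + [m for m in extracted if m["confidence"] >= MIN_CONFIDENCE]
    kept.sort(key=lambda m: m["confidence"], reverse=True)
    return kept[:MAX_MEMORIES_PER_AGENT]  # lowest-confidence trimmed first

def select_for_injection(memories: list[dict]) -> list[dict]:
    """Pick memories for context injection, highest confidence first,
    until the 2000-token budget is exhausted."""
    chosen, used = [], 0
    for m in sorted(memories, key=lambda m: m["confidence"], reverse=True):
        if used + m["tokens"] <= CONTEXT_TOKEN_BUDGET:
            chosen.append(m)
            used += m["tokens"]
    return chosen
```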
9.6 Current Deployment Reference (v5.0 Addition)¶
Hindsight is currently deployed and running.
Current deployment details:
- API endpoint: localhost:8888
- Admin endpoint: localhost:9999
- Backend: Vectorize.io Docker container
- Memory footprint: 1.06 GB RAM
- Status: Operational
The blueprint describes Hindsight's architectural role within XIOPro. The current deployment provides the foundation for the learning and reflection capabilities described in this part. T1P should build on the existing Hindsight deployment rather than replacing it.
10. Dream Engine Integration¶
10.1 Role¶
Idle-time learning and restructuring layer.
Within the knowledge domain, Dream may:
- compress memory
- extract patterns
- identify recurring research gaps
- suggest research task creation
- suggest knowledge cleanup
- suggest rule/skill/module review opportunities
10.2 Trigger¶
Typical triggers:
- idle system
- scheduled execution windows
- low-load background periods
- accumulated unresolved research backlog
10.3 Boundary¶
Dream may propose and prepare, but it does not bypass:
- Librarian discipline
- rule steward stewardship
- module steward governance
- approval requirements
Dream participates in governance as a proposal and optimization layer, not as a first-order governor.
It may generate signals, recommendations, and restructuring proposals, but enforcement and approval remain with the governing/steward roles.
Hindsight and Dream are second-order governance inputs, not first-order governing authorities.
11. Research Scheduling & Refresh Discipline¶
11.1 Purpose¶
Research must remain current without becoming noisy.
This section defines the discipline for recurring research refresh.
11.2 Refresh Classes¶
Useful recurring classes include:
- daily digest
- weekly watch
- milestone refresh
- quarterly portfolio scan
- event-triggered refresh
- founder-requested deep refresh
11.3 Anti-Patterns Prevented¶
This discipline prevents:
- forgotten research threads
- one-time research that is never revisited
- fragmented external findings
- stale module comparisons
- untraceable NotebookLM/Obsidian artifacts
- duplicated manual re-research
11.4 Final Rule¶
Research is a governed system capability.
It must be schedulable, reviewable, preservable, and reusable.
11A. T1P Operating Posture for Research, Hindsight & Dream¶
11A.1 Purpose¶
Research, Hindsight, and Dream remain core parts of XIOPro in T1P.
They are not removed.
But they must begin with an explicit operating posture so the blueprint preserves ambition without pretending all related subsystems are equally mature on day one.
11A.2 Research Center Posture¶
Posture: Active but Narrow
The Research Center must be real in T1P.
Minimum required T1P capabilities:
- Research Task creation
- scheduled or triggered research execution
- source bundle association
- Research Output preservation
- lineage to sources
- promotion into Librarian-managed knowledge or explicit draft retention
Not required for T1P:
- a giant multi-surface research universe
- many presentation/export paths all at once
- rich fully-automated external research orchestration at scale
Rule¶
Research remains in scope for T1P, but the first research wave should prove the canonical research path rather than maximum breadth.
11A.3 Hindsight Posture¶
Posture: Proposal-Oriented
Hindsight must exist in T1P as a real learning and reflection engine.
Minimum required T1P capabilities:
- generate reflections from execution history
- generate improvement proposals
- support research/task recommendations
- preserve evidence and rationale for proposals
Hindsight must not in T1P:
- silently publish protected changes
- directly mutate live governance or behavior
- bypass approval or stewardship paths
Rule¶
Hindsight is a second-order governance input, not a first-order governing authority.
11A.4 Dream Posture¶
Posture: Proposal-Oriented
Dream must remain in the architecture and in T1P, but with bounded authority.
Minimum required T1P capabilities:
- idle-time pattern detection
- cleanup/compression opportunities
- proposal generation
- recurring gap detection
- research/rule/skill/module review suggestions
Dream must not in T1P:
- directly rewrite protected live behavior
- directly publish rule changes
- directly alter module policy without governed approval
- become a hidden autonomous control plane
Rule¶
Dream may prepare, compress, detect, and propose. It may not self-authorize protected operational change.
11A.5 Preservation Rule¶
XIOPro keeps:
- Research Center
- Hindsight
- Dream
in T1P because they are part of the system's compounding-intelligence model.
The constraint is not removal.
The constraint is operating posture.
11A.6 First Proof Rule¶
Before these subsystems are expanded in breadth, T1P should prove at least:
- one real Research Task path
- one real Research Output preservation and promotion path
- one real Hindsight reflection/proposal path
- one real Dream proposal path
If these are not proven, further sophistication increases conceptual richness but not system credibility.
12. Indexing & Retrieval¶
Requirements¶
- fast
- topic-aware
- semantic
- low token usage
Strategy¶
- DB indexing
- vector layer (optional)
- topic filtering
- metadata filtering
12.1 Neo4j — Deprecated (v5.0 Update)¶
Neo4j was originally included (v3.2.1 blueprint) for:
- Knowledge graph (BIM entity relationships, topic navigation)
- Librarian routing service
Deprecated in v5.0 because:
- Neither instance was actively used by any agent workflow
- 1.83 GB RAM consumed for zero operational value
- PostgreSQL with pgvector (v0.8.2) provides vector search within the primary database
- PostgreSQL JSONB + recursive CTEs handle hierarchical relationships (topics, entities)
- Full-text search covers document retrieval needs for T1P
Both Neo4j instances (graph_stack, librarian) have been stopped and removed from the server.
If graph traversal requirements emerge post-T1P (millions of interconnected BIM entities, complex multi-hop queries), Neo4j can be reintroduced. For T1P, PostgreSQL handles everything.
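The recursive-CTE claim can be illustrated with standard SQL. The sketch below runs against SQLite for portability, but the same query shape works in PostgreSQL; the topics table and its columns are hypothetical.

```python
import sqlite3

def descendant_topics(conn: sqlite3.Connection, root: str) -> list[str]:
    """Return all topics under `root` in a parent-child topic table,
    using a recursive CTE instead of a graph database."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(name) AS (
            SELECT name FROM topics WHERE parent = ?
            UNION ALL
            SELECT t.name FROM topics t JOIN subtree s ON t.parent = s.name
        )
        SELECT name FROM subtree
        """,
        (root,),
    ).fetchall()
    return [r[0] for r in rows]
```

For the shallow, low-fan-out hierarchies T1P needs, this stays inside the primary database; multi-hop traversal over millions of BIM entities would be the signal to revisit a dedicated graph store.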
13. Knowledge Storage Strategy¶
DB¶
- indexed entries
- metadata
- relationships
Git¶
- source documents
- rules
- blueprints
Markdown¶
- canonical human-readable form
14. Knowledge Lifecycle¶
flowchart TD
Create --> Classify
Classify --> Store
Store --> Index
Index --> Retrieve
Retrieve --> Use
Use --> Improve
Improve --> Update
15. Anti-Entropy Rules¶
System must avoid:
- duplicate documents
- orphan knowledge
- inconsistent naming
- unindexed files
- chat-only knowledge
16. Knowledge Cost Optimization¶
- avoid full document loads
- use summaries
- index aggressively
- retrieve selectively
17. Knowledge Success Criteria¶
Knowledge system is successful if:
- information is findable instantly
- agents reuse knowledge
- system improves over time
- duplication is minimized
- documents remain structured
18. Current State (v5.0 Addition)¶
18.1 Hindsight¶
- Status: Deployed and running
- Endpoints: localhost:8888 (API), localhost:9999 (admin)
- Backend: Vectorize.io Docker container
- Memory: 1.06 GB RAM
- Role: Learning and reflection engine for execution history
18.2 Neo4j (Deprecated)¶
- Status: Removed -- both instances stopped and deleted
- Previous footprint: 1.83 GB combined (graph_stack 1.2 GB, librarian 631 MB)
- Replacement: PostgreSQL + pgvector for vector search, JSONB + recursive CTEs for hierarchical relationships, full-text search for document retrieval
- Decision: Deprecated in v5.0. See Section 12.1 for full rationale.
18.3 Obsidian¶
- Status: Not deployed (ticket 069 pending)
- T1P dependency: None -- Librarian operates on Git-based markdown
- Plan: Deploy when controlled-sync mirror pattern is ready
18.4 NotebookLM¶
- Status: Not deployed
- T1P dependency: None
- Plan: Integrate as research presentation surface when research workflow is proven
19. Final Statement¶
Knowledge is the long-term memory of XIOPro.
If strong:
- intelligence compounds
- execution accelerates
If weak:
- system forgets
- cost explodes
- chaos returns
Changelog¶
| Version | Date | Author | Changes |
|---|---|---|---|
| 4.1.0 | 2026-03-26 | BM | Initial v4.1 release |
| 4.2.0 | 2026-03-28 | BM | Added: Hindsight current deployment reference (9.5). Added: Neo4j evaluation note for T1P (12.1). Added: Obsidian current status -- not deployed (8.8). Added: NotebookLM current status -- not deployed (8.7). Added: Current State section (18) with deployment status for Hindsight, Neo4j, Obsidian, NotebookLM. Fixed: "Rufio" renamed to "Ruflo" globally. Added: Changelog section. Updated version header to 4.2.0. |
| 4.2.2 | 2026-03-28 | 000 | Agent naming migration: R01 replaced with rule steward role. O01 replaced with 000 (governor role). M01 replaced with module steward role. BM/B1-B5/M0 replaced with 3-digit IDs in Obsidian vault structure and skill registry. Changelog author entries preserved as historical. |
| 4.2.4 | 2026-03-28 | 000 | Section 8.9: Added cross-reference to Part 4 Section 4.11 (Skill Selection Architecture). Added category and min_model_tier fields to skill registry YAML. Added category comment block aligned with Section 4.11 skill library. |
| 4.2.5 | 2026-03-28 | 000 | Founder clarifications: (1) Research Domains section (8.2A) -- Research Center serves ALL knowledge domains (BIM, AI/LLM, market/business), not just XIOPro tech. NotebookLM clarified as primary research acceleration surface. (2) Skill Performance Database (8.9A) -- track token consumption, quality, model compatibility per skill; feeds Dream Engine idle maintenance. (3) Resource Registry used_by_agents replaced with relevant_roles per role-topic-skill binding chain (Part 4 Section 4.11). |
| 4.2.6 | 2026-03-28 | 000 | Roles over numbers: Removed agent IDs from architectural descriptions, section headers, and diagrams. Role names used throughout instead of agent numbers. |
| 4.2.7 | 2026-03-28 | BM | Neo4j deprecated: Section 12.1 rewritten from evaluation note to deprecation notice with full rationale. Section 18.2 updated to "Removed" status. Vault file tree updated. Both instances stopped and deleted from server. PostgreSQL + pgvector replaces all Neo4j use cases for T1P. |
| 4.2.8 | 2026-03-28 | BM | AGI pattern gap fix: Added T1P RAG Pipeline section (7.18) — embedding model, chunking strategy, hybrid retrieval, reranking, generation contract. Addresses audit gap "RAG Pipeline Specifics" (Principle 9). |
| 4.2.9 | 2026-03-28 | 000 | Wave 1-2 BP fixes: Added Section 4.5 Librarian Document Decomposition Protocol — extraction targets, note format, vault structure, decomposition flow, and rules for breaking large documents into linked atomic knowledge notes. Renumbered 4.5 Future Extensions to 4.6, 4.6 Knowledge Ledger to 4.7 (all sub-sections renumbered accordingly). |
| 4.2.10 | 2026-03-28 | 000 | Memory engineering principles: Added Section 4.5A (Librarian memory engineering requirements) and Section 9.5 (Hindsight memory engineering requirements) referencing Part 4 Section 4.8A. Added confidence field to Knowledge Schema (Section 6.1) with threshold and cap rules. Renumbered Hindsight Section 9.5 Current Deployment to 9.6. Slimmed Section 7.4 Rule Steward Role to cross-reference Part 4 Section 4.2A (removed duplicated responsibilities). Content deduplication: slimmed duplicate content throughout to cross-reference primary locations. |
| 5.0.1 | 2026-03-30 | GO | N6: Added content_ref resolution note to Section 6.1 Knowledge Schema — format, vault root, resolution rule. |