The single move that locks the moat: serialize Hypercore's output as a Persistent Domain Schema and make Modulum a native query engine over it.
The thesis
Asked independently — "Given the Hypernym product surface, what's the single highest-ROI compose that unlocks the largest set of follow-on products?" — Grok and Gemini converged. Different language, same primitive: take Hypercore's output (the grounded, provenance-preserving fact graph) and make it a portable specification — a compiled domain. Re-architect Modulum's inference engine to natively load and query that specification at near-zero latency, treating it as a first-class component of attention itself. Not a feature. A new substrate. The Persistent Domain Schema is the memory OS for transformers.
01
The single highest-ROI compose
A standardized, portable, machine-readable specification for the output of the Hypercore Engine. Modulum re-architected to natively load and query this PDS at near-zero latency, treating it as a first-class component of attention. Both Grok and Gemini independently identified this composition.
The Move · Persistent Domain Schema (PDS)
Compile a domain once. Query it at clock-cycle speed forever.
The PDS is not just data — it is a compiled domain: a serialized graph of entities, verified facts, confidence scores, source provenance, and reranked embeddings. Hypercore produces it (intake → workflows → agent → confidence → consistency → stream). Modulum consumes it natively (attention queries route through the PDS block; vocabulary output restricts to grounded terms; KV cache recycles around its facts). Same primitive — Gemini calls it a Persistent Domain Schema (PDS); Grok calls it the Persistent Fact Inference Stack. Both unlock the same follow-on product surface.
Convergent · Grok + Gemini
Follow-on · 01
Domain-as-a-Service
Hypernym sells pre-built, high-value PDSs as standalone SKUs. "PDS: FDA Clinical Trials", "PDS: US Case Law", "PDS: Materials Patents 1995–2025". Customers don't build the corpus — they license a compiled domain.
Follow-on · 02
Locus-in-a-Box
A Modulum-powered server that loads any PDS. Self-hosted in customer VPC. Replaces RAG infrastructure entirely — domain expertise as a query surface, not a retrieval pipeline.
Follow-on · 03
The Anneal Marketplace
Model providers don't just sell models — they sell models pre-loaded with a certified PDS. Instant domain expertise. New revenue line for Mistral, Cohere, hyperscalers; new sales channel for Hypernym.
Follow-on · 04
Living Documents · the World Model foundation
An agent that mutates a PDS based on new evidence. Self-updating knowledge base. Continuously improves the model's downstream performance without retraining. This is the foundation Hypernym needs to ship vertical world models per the partner deck (page 15).
02
The Persistent Domain Schema · spec sketch
What a PDS actually is, in concrete terms. The schema serializes everything Hypercore produces and shapes it for Modulum's inference-time consumption.
PDS · what it carries
Six fields, every entry · all four pillars preserved
entity — the resolved entity this entry describes, as produced by Hypercore's intake and entity resolution; the key every other field hangs off.
facts[] — verified claims about the entity, each carrying its source database, query, agent turn, and validation against the actual result.
confidence — mechanical score from Hypercore: source_type × grounding × corroboration, 0.0–1.0, math visible.
provenance — chain of custody from query to claim. Citation links validated against PubMed / SEC / case law / the corpus.
embedding — fact-based reranking vector from HyperRemember. Cleaner than RAG chunks, drift-resistant.
vocab_window — domain-specific output token mask for Modulum's vocabulary restriction. Eliminates out-of-domain hallucination at inference.
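Serialized, an entry with these fields might look like the following TypeScript sketch. All names (`PdsEntry`, `Fact`, the field types) are illustrative assumptions, not Hypernym's shipped schema; the point is that every field above maps to a concrete, machine-checkable slot:

```typescript
// Illustrative shapes only — assumed names and types, not the actual PDS format.
interface Fact {
  claim: string;        // the verified claim text
  sourceDb: string;     // which of the 21+ databases it came from
  query: string;        // the query the agent issued
  agentTurn: number;    // which agent turn produced it
  validated: boolean;   // checked against the actual query result
}

interface PdsEntry {
  entity: string;             // the resolved entity this entry describes
  facts: Fact[];              // verified claims with per-fact provenance
  confidence: number;         // 0.0–1.0, mechanical (see below)
  provenance: string[];       // chain of custody: query → claim citation links
  embedding: number[];        // fact-based reranking vector (HyperRemember)
  vocabWindow: Set<string>;   // domain token mask for output restriction
}

// The mechanical confidence score: source_type × grounding × corroboration,
// clamped to [0, 1] so the math stays visible and auditable.
function confidence(sourceType: number, grounding: number, corroboration: number): number {
  return Math.min(1, Math.max(0, sourceType * grounding * corroboration));
}
```

Because the score is a plain product, a reviewer can recompute it from the three inputs by hand — no hidden weighting, which is what "math visible" buys.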
PDS · what it enables in Modulum
Three new primitives at inference time
PDS-aware attention — when the transformer attends, instead of full dot-product over a dense KV cache, it issues a query to the PDS block. Returns only the ~25% signal facts (the "75% noise" finding applied to memory, not just attention).
Persistent expertise across sessions — load a PDS once, the model recalls it forever. Process restarts don't wipe state. The cache intelligently recycles around the PDS.
Vocabulary output restriction — the model's output is constrained to tokens consistent with the PDS's vocab window. No domain hallucination. No catastrophic forgetting.
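Of the three primitives, vocabulary restriction is the easiest to make concrete. A minimal sketch, assuming logits indexed by token id and a PDS-supplied set of allowed ids (both assumptions — Modulum's real decode interface is not described here): any out-of-window logit is forced to negative infinity, so softmax assigns it zero probability and the model cannot emit an out-of-domain token.

```typescript
// Drive every logit outside the PDS vocab window to -Infinity.
// Token ids and the allowed-set representation are assumed for illustration.
function restrictLogits(logits: number[], allowed: Set<number>): number[] {
  return logits.map((logit, tokenId) => (allowed.has(tokenId) ? logit : -Infinity));
}

// Greedy decode step over the masked logits.
function argmax(xs: number[]): number {
  let best = 0;
  for (let i = 1; i < xs.length; i++) if (xs[i] > xs[best]) best = i;
  return best;
}
```

Under greedy decoding, the highest unmasked logit wins even if a masked token originally scored higher — which is exactly the "no domain hallucination" guarantee, enforced mechanically rather than by prompting.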
03
How Hypercore + Modulum compose around the PDS
The architecture that the PDS enables. Each component already exists in the Hypernym product surface — the new primitive is the schema between them.
PDS · the compile + query architecture
Legend · Hypercore (compile) · Modulum (query) · PDS (substrate)
flowchart LR
subgraph HC ["HYPERCORE · compile the domain"]
direction TB
HC1["Intake YAML config · 21+ DBs"]
HC2["Workflows deterministic pre-agent"]
HC3["Agent writes its own SQL"]
HC4["Confidence source × grounding × corrob"]
HC5["Consistency flag references not queried"]
HC6["Stream delta-update facts"]
HC1 --> HC2 --> HC3 --> HC4 --> HC5 --> HC6
end
subgraph PDS ["PERSISTENT DOMAIN SCHEMA · the compiled domain"]
direction TB
P1["entity · facts · confidence provenance · embedding · vocab_window"]
end
subgraph MOD ["MODULUM · query at clock-cycle speed"]
direction TB
M1["PDS-aware attention ~25% signal · 75% noise pruned"]
M2["Persistent expertise survives restarts · fixed memory"]
M3["Vocab output restriction no domain hallucination"]
M4["3.04× decode speedup scales with context"]
M1 --> M2 --> M3 --> M4
end
HC6 -->|"serialize"| P1
P1 -->|"native load"| M1
classDef hc fill:#e8f0d8,stroke:#5e7a18,color:#0f1e30,font-weight:600;
classDef mo fill:#fdeae3,stroke:#b85925,color:#0f1e30,font-weight:600;
classDef pds fill:#ece1ea,stroke:#6e2966,color:#0f1e30,font-weight:600;
class HC1,HC2,HC3,HC4,HC5,HC6 hc;
class M1,M2,M3,M4 mo;
class P1 pds;
04
The Modulum chip · Attention as a Database Query
Both Grok and Gemini converged on the same chip implication: software-only Modulum prunes the 75% noise; the chip never computes or stores it in the first place. The PDS becomes hardware.
Modulum Substrate · the inference card
Hardware-level query against an on-chip PDS
A custom inference card with dedicated silicon for Modulum's 7 components, directly wired to on-chip memory holding the Persistent Domain Schema. When the transformer needs to attend, it issues a hardware-level query to the PDS block. The block returns only the handful of relevant facts — the ~25% signal — which the attention heads use. The full dot-product attention over a dense KV cache is replaced by what Gemini called "Attention as a Database Query": a single clock-cycle-level operation that fuses inference and RAG.
What software-only cannot do
Eliminate the CPU/GPU context-switching cost between inference and RAG lookups. Software must round-trip; the Substrate fuses them into one operation.
What it enables
Real-time grounded world models for robotics · autonomous vehicles · AR overlays — applications where millisecond-level latency for complex fact-based reasoning is a hard requirement.
No general-purpose GPU can do this. The chip's value is not throughput — it's the elimination of the inference/retrieval boundary. The PDS becomes the addressable memory of the model itself.
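In software terms, the operation the chip would execute in silicon can be sketched as a scored top-k lookup — similarity weighted by confidence, returning only the handful of signal facts instead of attending over everything. Names and the scoring rule are illustrative assumptions; the hardware point is that this lookup fuses with the attention read rather than round-tripping through a retrieval pipeline.

```typescript
// A software stand-in for "Attention as a Database Query": score each PDS fact
// by (query similarity × confidence) and return only the top k — the ~25% signal.
interface PdsFact {
  text: string;
  embedding: number[];
  confidence: number;
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function queryPds(queryVec: number[], facts: PdsFact[], k: number): PdsFact[] {
  return [...facts]
    .sort(
      (a, b) =>
        dot(queryVec, b.embedding) * b.confidence -
        dot(queryVec, a.embedding) * a.confidence,
    )
    .slice(0, k);
}
```

A GPU running this in software still pays the context-switch between inference and lookup; the Substrate's claim is that the same top-k returns in the attention head's own clock domain.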
05
The joint flagship · Sentient / EternalAgent
When Hypercore + Modulum compose around the PDS, Hypernym ships an outcome subscription, not a tool. Both Grok and Gemini converged on this product. The moat is the accumulated state.
Hypernym Sentient · the outcome subscription
The vertical agent that lives in your VPC and gets smarter every week
A fully managed, long-running, vertical-specific AI agent. Sold as a solution, not a tool. Customers subscribe to "Sentient Clinical Researcher" or "Sentient Underwriting Analyst" — not an API. Hypernym manages the entire stack: Hypercore continuously ingests new domain data, updating a live PDS; the Modulum-powered agent uses persistent memory to perform its function over weeks and months, becoming progressively more expert without retraining.
Who buys
C-suite or Head of Business Unit. They're buying an outcome ("accelerate drug discovery by 30%"), not an engineering project. Significant annual subscription — the value of a tireless, domain-expert digital employee.
The moat
The state. After one year, the Sentient agent's PDS — refined by continuous data ingestion and interaction — represents a unique, auditable, irreplaceable corporate asset. Switching cost is not migrating technology; it's abandoning a year's worth of accumulated, structured corporate memory.
Why this locks both platforms
Sentient cannot exist without Hypercore (continuous PDS production) AND Modulum (persistent inference, infinite context, vocab restriction). One platform alone gives you a chatbot or a fast model; together they give you a digital employee with audit-grade memory.
Pilot pattern
Sample (1–2 weeks) → Paid pilot (fixed-fee, scoped engagement, full domain stood up) → VPC deployment (containerized, runs in your cloud, your data never leaves). Per partner deck page 15 — three steps, low commitment to start.
06
Outliers preserved · Pivot mode
Per Pivot mode (FORGE.md §12 — diverge, surface outliers), single-model proposals are kept. The deepest creative moves often arrive without convergence on round one. Two outliers from the Hypernym-only round.
Gemini · outlier · Modulum Rosetta
The model MRI · diagnostic-as-a-service
An analysis service that ingests a customer's proprietary foundation model and a target corpus, then produces a "Structural Efficiency Map" — exactly which attention heads and layers are redundant or contribute only noise for that specific domain. Provides a surgical pruning/distillation strategy. Sells insight, not implementation.
Why zero-to-one
Productizes the "75% of attention is noise · 4 companies, 1 algebra" discovery as a unique diagnostic capability. Others sell black-box acceleration; Rosetta provides architects with a fundamental understanding of their own model's architecture.
Buyer · wedge
AI research labs at hyperscalers · sovereign wealth funds · large enterprises training their own foundation models (Bloomberg, Apple). Wedge: fixed-scope one-time engagement to analyze one model, deliver one report.
Falsify (2 weeks)
Customer uses the report's recommendations to create a pruned model that is ≥30% smaller and faster while retaining ≥99% of baseline performance on key benchmarks.
Grok · outlier · EchoCore
Streaming corpus → drift-free memory
Streams live corpus updates (real-time sensor data, market feeds, news, IoT) into Hypercore workflows for incremental entity resolution. Anneals Modulum inference to clean and persist evolving expertise without full re-inference. Outputs delta-updated fact graphs with confidence deltas tracked per stream event.
Why zero-to-one
Static transformers + RAG lose historical context on every update. EchoCore turns time-series domains into adaptive memory systems — drift-free, with deltas tracked back to the stream event.
Buyer · wedge
Energy (grid operators monitoring dynamic loads) · Finance (real-time risk, intraday positioning). Stream API pilot on synthetic grid data (1K events/hour), delivering updated graphs in <5min latency.
Falsify (4 weeks)
Simulate a 1-week stream on an energy corpus — verify <5% drift in recall accuracy, against the 25% loss of the periodic re-inference baseline.
07
Note on the Karpathy auto-research loop
Question that came up: do the R6 research-track upgrades modify the Karpathy-style autoresearch loop pattern? Short answer: no. The inner loop stays unchanged; the panel layer fires around it.
Auto-research loop · structural change summary
The Karpathy pattern stays. The panel adds an outer layer.
The current Forge research loop is a Karpathy-style autoresearch pattern: a fast local model (qwen3:8b) proposes parameter changes as structured JSON; a cloud executor (Codex / Gemini) reviews every Nth iteration for deeper reasoning; the local model iterates. This is the right pattern for parameter exploration over thousands of trials at low cost. It doesn't change.
What R6 adds is a parallel-panel layer at the outer FSM gates — strategy review, verdict, audit. Today's research FSM advances on Grok + 1 cross-model (count:1) for strategy, Grok-only for verdict. R6 changes that to panel_quorum + convergence_attested + outlier_preserved at the gates. The 600+ LOC of convergence machinery already in forge-core/src/convergence/ is reused — only the wiring is new (~250 LOC net new + ~80 LOC swap + ~30 LOC YAML, per the R6 plan filed in 09-r6-upgrade-plan.md).
Stays unchanged · Karpathy inner loop
The fast/deep alternation
Local fast model proposes parameter changes as structured JSON
Cloud executor reviews every Nth iteration for depth
Local iterates; ledger records iteration / model / score / metrics
research/harness/src/loop.ts needs zero changes per the R6 plan
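The unchanged inner loop has roughly this shape — an illustrative sketch, not the contents of research/harness/src/loop.ts; function names and types are assumed. A fast proposer runs every iteration, a deep reviewer intercepts every Nth, and a ledger row is recorded per trial:

```typescript
// Sketch of the Karpathy-style fast/deep alternation. All names are assumptions.
interface Proposal {
  params: Record<string, number>; // structured JSON parameter changes
}

interface LedgerRow {
  iteration: number;
  model: string; // which tier produced the accepted candidate
  score: number;
}

async function researchLoop(
  proposeFast: (best: Proposal) => Promise<Proposal>, // local fast model (qwen3:8b)
  reviewDeep: (p: Proposal) => Promise<Proposal>,     // cloud executor (Codex / Gemini)
  evaluate: (p: Proposal) => Promise<number>,
  iterations: number,
  reviewEvery: number,                                // "every Nth iteration"
): Promise<LedgerRow[]> {
  const ledger: LedgerRow[] = [];
  let best: Proposal = { params: {} };
  let bestScore = -Infinity;
  for (let i = 1; i <= iterations; i++) {
    let candidate = await proposeFast(best);
    const deep = i % reviewEvery === 0;
    if (deep) candidate = await reviewDeep(candidate); // deeper reasoning pass
    const score = await evaluate(candidate);
    ledger.push({ iteration: i, model: deep ? "cloud" : "local", score });
    if (score > bestScore) {
      bestScore = score;
      best = candidate; // local model iterates from the current best
    }
  }
  return ledger;
}
```

The panel layer R6 adds wraps calls like this at the FSM gates; nothing inside the loop body changes.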
Changes · outer panel layer (R6)
What gets new wiring
research.yaml — new CONVERGENCE_AUDIT state between STRATEGY_REVIEW and EXPERIMENT_RUNNING
swarma.ts reviewStrategy() — sequential reviewer loop replaced with single panel.execute() call (behind SWARMA_PANEL_V2 flag in Sprint A)
New research/harness/src/panel.ts (~150 LOC) — execute(), routeByMode(), preserveOutliers(), steerByCategory()
4 new engine guards: panel_quorum, convergence_attested (with scope param for sprint vs research), outlier_preserved, categorical_review_complete
Manifest schema bump: roles.review becomes string[] | PanelConfigPartial (Sprint B)
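The four guards reduce to small predicates over the panel's result. The shapes below are assumptions extrapolated from the R6 plan's naming, not the actual forge-core signatures — the real guards would hook the existing machinery in forge-core/src/convergence/:

```typescript
// Assumed result shape a panel.execute() call might return.
interface PanelResult {
  votes: Record<string, boolean>;                   // model → approved?
  attested: { sprint: boolean; research: boolean }; // convergence attestation per scope
  outliers: string[] | null;                        // preserved single-model proposals
}

// panel_quorum: enough panel models approved.
function panelQuorum(r: PanelResult, quorum: number): boolean {
  return Object.values(r.votes).filter(Boolean).length >= quorum;
}

// convergence_attested: attestation exists for the given scope (sprint vs research).
function convergenceAttested(r: PanelResult, scope: "sprint" | "research"): boolean {
  return r.attested[scope];
}

// outlier_preserved: the preserved list exists (may be empty) — outliers were
// recorded per Pivot mode, not silently dropped.
function outlierPreserved(r: PanelResult): boolean {
  return r.outliers !== null;
}

// categorical_review_complete: every required steering category got a review.
function categoricalReviewComplete(reviewed: Set<string>, required: string[]): boolean {
  return required.every((category) => reviewed.has(category));
}
```

Each predicate is pure and cheap, which is what lets them sit directly on the FSM gate between STRATEGY_REVIEW and EXPERIMENT_RUNNING.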
08
The single answer
"What's the zero-to-one product breakthrough Hypernym ships by composing Hypercore + Modulum?"
Hypernym · binding answer
Build the Persistent Domain Schema. Sell the Sentient subscription. Tape out the Substrate chip.
Three independent product analyses (Grok adversarial · Gemini synthesizer · the Hypercore + Modulum partner deck) converge on the same architecture: compile a domain into a portable schema (PDS) using Hypercore; query it natively at clock-cycle speed using Modulum; sell the result as a managed agent that lives in the customer's VPC and gets smarter every week (Sentient). The PDS is the load-bearing primitive — it unlocks Domain-as-a-Service, Locus-in-a-Box, the Anneal Marketplace, and Living Documents (the foundation for vertical world models). The chip is the multi-year hardware moat that turns the substrate into Attention as a Database Query. Outliers preserved: Modulum Rosetta (the model MRI as a diagnostic SKU) and EchoCore (streaming corpus → drift-free memory for time-series verticals). Per partner deck page 13, the chip is in development today; the PDS is pure software composition of components Hypernym already ships.