Memory System
Our system, our rules.
Design Principles
- Every word persists: full transcripts of every conversation, every tool call, every response. Nothing is ever deleted.
- Smart retrieval, not smart storage: store everything flat and use scoring to surface what matters.
- Local-first processing: Rivet Local (GERTY) handles embeddings and compaction. Memory has no cloud API dependency.
- Time-aware: recent context matters more than old context. Scoring combines Ebbinghaus-style decay with access frequency.
- Two memory layers: short-term (session injection) and long-term (searchable archive).
Architecture

    ┌─────────────────────────────────────────────────────────┐
    │  Agent Turn                                             │
    │                                                         │
    │  System Prompt = workspace files                        │
    │                + short-term memory (auto-injected)      │
    │                + relevant context (query-driven)        │
    ├─────────────────────────────────────────────────────────┤
    │  Short-Term Memory                                      │
    │                                                         │
    │  What:    Last N messages + recent summaries            │
    │  How:     Loaded on session create, updated each turn   │
    │  Scoring: recency-weighted, capped by token budget      │
    │  Source:  messages table + summaries table              │
    ├─────────────────────────────────────────────────────────┤
    │  Long-Term Memory                                       │
    │                                                         │
    │  What:    Full transcript archive + summary DAG         │
    │  How:     Agent tools (memory_search, memory_browse)    │
    │  Scoring: FTS + semantic + temporal decay               │
    │  Source:  messages + summaries + embeddings             │
    ├─────────────────────────────────────────────────────────┤
    │  Background Processing                                  │
    │                                                         │
    │  Embedder:  Rivet Local generates embeddings (async)    │
    │  Compactor: Rivet Local summarizes old messages (async) │
    │  Both run on timers, never block the message pipeline   │
    └─────────────────────────────────────────────────────────┘

Schema (ros_* prefix)
messages
The immutable transcript. Every message ever sent or received.

    CREATE TABLE ros_messages (
      id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
      conversation_id UUID NOT NULL,
      agent           TEXT NOT NULL,
      channel         TEXT NOT NULL,
      role            TEXT NOT NULL,
      content         TEXT NOT NULL DEFAULT '',
      tool_name       TEXT,
      tool_args       JSONB,
      tool_result     TEXT,
      metadata        JSONB DEFAULT '{}',
      embedding       halfvec(4000),
      content_tsv     tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED,
      created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
    );

conversations
Group messages into sessions.

    CREATE TABLE ros_conversations (
      id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
      session_key  TEXT NOT NULL,
      agent        TEXT NOT NULL,
      channel      TEXT NOT NULL,
      channel_id   TEXT,
      bot_identity TEXT,
      title        TEXT,
      settings     JSONB DEFAULT '{}',
      active       BOOLEAN DEFAULT true,
      created_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
      updated_at   TIMESTAMPTZ NOT NULL DEFAULT NOW()
    );

summaries
Compacted summaries of message groups. Forms a DAG for drill-down.

    CREATE TABLE ros_summaries (
      id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
      conversation_id UUID,
      parent_id       UUID REFERENCES ros_summaries(id),
      depth           INTEGER NOT NULL DEFAULT 0,
      content         TEXT NOT NULL,
      kind            TEXT NOT NULL DEFAULT 'leaf',
      message_count   INTEGER NOT NULL DEFAULT 0,
      earliest_at     TIMESTAMPTZ,
      latest_at       TIMESTAMPTZ,
      embedding       halfvec(4000),
      content_tsv     tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED,
      model           TEXT,
      created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
    );

summary_sources
Links summaries to their source messages.

    CREATE TABLE ros_summary_sources (
      summary_id UUID NOT NULL REFERENCES ros_summaries(id),
      message_id UUID NOT NULL REFERENCES ros_messages(id),
      ordinal    INTEGER NOT NULL DEFAULT 0,
      PRIMARY KEY (summary_id, message_id)
    );

Short-Term Memory (Session Injection)
What gets injected into the system prompt each turn:
- Workspace files: SOUL.md, IDENTITY.md, USER.md, AGENTS.md, TOOLS.md, MEMORY.md, today’s daily notes
- Recent conversation: last N messages from this session (via session history)
- Relevant context: hybrid-scored retrieval, ranked by:

      relevance = (fts_rank × 0.3) + (semantic_similarity × 0.3) + (temporal_score × 0.3) + (importance × 0.1)

      where:
        fts_rank            = BM25 full-text match (0-1)
        semantic_similarity = cosine similarity of embedding to query (0-1)
        temporal_score      = e^(-0.05 × days_since_access) × (1 + 0.02 × access_count)
        importance          = base importance by type (correction: 0.9, preference: 0.8, fact: 0.6, task: 0.5)

Token budget: ~4000 tokens for injected context. Results are added in descending relevance order until the budget is reached.
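The scoring formula and budget rule above can be sketched as follows. This is a minimal illustration, not the shipped implementation: the `Candidate` shape, the function names, and the per-item `tokens` estimate are assumptions, and stopping at the first result that does not fit is only one reading of "until the budget is reached".

```typescript
// Hypothetical shape of a retrieval candidate; field names are illustrative.
interface Candidate {
  id: string;
  ftsRank: number;            // full-text match score, normalized to 0-1
  semanticSimilarity: number; // cosine similarity to the query, 0-1
  daysSinceAccess: number;
  accessCount: number;
  kind: "correction" | "preference" | "fact" | "task";
  tokens: number;             // assumed token cost of injecting this item
}

// Base importance by type, as listed in the formula.
const IMPORTANCE: Record<Candidate["kind"], number> = {
  correction: 0.9,
  preference: 0.8,
  fact: 0.6,
  task: 0.5,
};

// temporal_score = e^(-0.05 × days_since_access) × (1 + 0.02 × access_count)
function temporalScore(daysSinceAccess: number, accessCount: number): number {
  return Math.exp(-0.05 * daysSinceAccess) * (1 + 0.02 * accessCount);
}

// Weighted sum with the 0.3 / 0.3 / 0.3 / 0.1 weights from the formula.
function relevance(c: Candidate): number {
  return (
    c.ftsRank * 0.3 +
    c.semanticSimilarity * 0.3 +
    temporalScore(c.daysSinceAccess, c.accessCount) * 0.3 +
    IMPORTANCE[c.kind] * 0.1
  );
}

// Greedy fill: take results in descending relevance until the next one
// would exceed the budget (assumption: stop there rather than skip ahead).
function selectForInjection(candidates: Candidate[], budgetTokens = 4000): Candidate[] {
  const ranked = [...candidates].sort((x, y) => relevance(y) - relevance(x));
  const selected: Candidate[] = [];
  let used = 0;
  for (const c of ranked) {
    if (used + c.tokens > budgetTokens) break;
    selected.push(c);
    used += c.tokens;
  }
  return selected;
}
```

With these constants, a memory accessed today with access_count = 10 has a temporal_score of 1.2, so unlike the other inputs this term is not capped at 1.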
Access frequency tracking:
When a message or summary is returned in a search result, increment its access count. Frequently-accessed memories decay slower (Ebbinghaus reinforcement).
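A minimal sketch of the access tracking described above, kept in memory for illustration. The schema shown earlier does not say where the count is stored, so the `accessLog` map, the `AccessRecord` shape, and both function names are assumptions; tracking a last-access timestamp is inferred from the days_since_access term in the scoring formula.

```typescript
// Hypothetical in-memory access log keyed by message or summary id.
interface AccessRecord {
  accessCount: number;
  lastAccessedAt: Date;
}

const accessLog = new Map<string, AccessRecord>();

// Call once per search with the ids that were actually returned.
function recordAccess(ids: string[], now: Date = new Date()): void {
  for (const id of ids) {
    const prev = accessLog.get(id);
    accessLog.set(id, {
      accessCount: (prev?.accessCount ?? 0) + 1,
      lastAccessedAt: now,
    });
  }
}

// Reinforcement multiplier from the temporal_score formula:
// 1 + 0.02 × access_count. Unknown ids have a multiplier of 1.
function reinforcement(id: string): number {
  return 1 + 0.02 * (accessLog.get(id)?.accessCount ?? 0);
}
```

Each returned hit raises the multiplier by 0.02, which is what makes frequently surfaced memories decay more slowly than untouched ones of the same age.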
Long-Term Memory (Agent Tools)
Consolidated Tool Surface (3 tools)

| Tool | Description |
|---|---|
| memory_search | Unified search + auto-expand. Searches messages + summaries, auto-expands top summary hits to children/source messages. Supports FTS/trigram/regex modes, agent/date filters, optional LLM synthesis. |
| memory_browse | Chronological message browsing. For reviewing sessions and catching up on activity. |
| memory_stats | System health diagnostics. Embedding queue depth, unsummarized message counts, compaction status, summary tree depth, embedding coverage. |

Consolidated from the original 6-tool design (which included memory_grep, memory_expand, memory_describe, and memory_expand_query) down to 3 tools that require less LLM orchestration.
Background Processing
Embedder
- Runs on a timer (configurable interval)
- Picks up messages with NULL embedding
- Calls embedding model on GERTY (Nemotron 8B)
- Batch processing with error recovery
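One embedder tick might look like the sketch below. It is illustrative only: `PendingMessage`, `EmbedFn`, `embedPending`, and the batch size of 16 are assumptions, and the real call to the embedding model on GERTY is replaced by an injected function so the block stays self-contained.

```typescript
// Hypothetical row shape: only the columns the embedder needs.
interface PendingMessage {
  id: string;
  content: string;
  embedding: number[] | null;
}

// Stand-in for the call to the embedding model.
type EmbedFn = (text: string) => Promise<number[]>;

// One tick: pick up rows with a NULL embedding, embed them one by one,
// and record failures without aborting the rest of the batch.
async function embedPending(
  rows: PendingMessage[],
  embed: EmbedFn,
  batchSize = 16, // assumed value
): Promise<{ embedded: number; failed: string[] }> {
  // Rows with empty content cannot be embedded, so they are skipped here.
  const batch = rows
    .filter((r) => r.embedding === null && r.content !== "")
    .slice(0, batchSize);

  let embedded = 0;
  const failed: string[] = [];
  for (const row of batch) {
    try {
      row.embedding = await embed(row.content);
      embedded++;
    } catch {
      // Leave the embedding NULL so the next tick picks the row up again.
      failed.push(row.id);
    }
  }
  return { embedded, failed };
}
```

Because a failed row keeps its NULL embedding, recovery needs no separate retry queue in this sketch: the next timer tick simply selects it again.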
Compactor
Periodically summarize old messages into the summary DAG:
- Trigger: Check for conversations with unsummarized messages exceeding threshold
- Batch: Take the oldest unsummarized messages from that conversation
- Summarize: Send to Rivet Local — preserve key decisions, technical details, action items, state changes
- Store: Insert summary with parent_id linking to the conversation’s latest summary
- Link: Insert summary_sources rows connecting the summary to its source messages
- Embed: Queue the summary for embedding
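The six steps above can be sketched against an in-memory store. Everything here is a hedged illustration: the `Store` shape, `compactOnce`, the id format, and the default threshold and batch size are assumptions, and `summarize` is a synchronous stand-in for the call to Rivet Local.

```typescript
interface Message { id: string; conversationId: string; content: string; createdAt: number; }
interface Summary { id: string; conversationId: string; parentId: string | null; depth: number; content: string; messageCount: number; }
interface SummarySource { summaryId: string; messageId: string; ordinal: number; }

// Hypothetical in-memory stand-in for the ros_* tables plus the embed queue.
interface Store {
  messages: Message[];
  summaries: Summary[];
  sources: SummarySource[];
  embedQueue: string[];
}

// Stand-in for the summarization call; the real one is a model request.
type Summarize = (texts: string[]) => string;

function compactOnce(
  store: Store,
  conversationId: string,
  summarize: Summarize,
  threshold = 20, // assumed value
  batchSize = 10, // assumed value
): Summary | null {
  // Trigger: unsummarized messages must exceed the threshold.
  const summarized = new Set(store.sources.map((s) => s.messageId));
  const pending = store.messages
    .filter((m) => m.conversationId === conversationId && !summarized.has(m.id))
    .sort((a, b) => a.createdAt - b.createdAt);
  if (pending.length <= threshold) return null;

  // Batch: the oldest unsummarized messages.
  const batch = pending.slice(0, batchSize);

  // Summarize.
  const content = summarize(batch.map((m) => m.content));

  // Store: parent_id links to the conversation's latest summary, if any.
  const previous = store.summaries.filter((s) => s.conversationId === conversationId);
  const latest = previous.length > 0 ? previous[previous.length - 1] : null;
  const summary: Summary = {
    id: `sum-${store.summaries.length + 1}`,
    conversationId,
    parentId: latest ? latest.id : null,
    depth: 0,
    content,
    messageCount: batch.length,
  };
  store.summaries.push(summary);

  // Link: one summary_sources row per source message, in order.
  batch.forEach((m, ordinal) => {
    store.sources.push({ summaryId: summary.id, messageId: m.id, ordinal });
  });

  // Embed: queue the new summary for the embedder.
  store.embedQueue.push(summary.id);
  return summary;
}
```

Calling `compactOnce` repeatedly drains a backlog one batch at a time and returns null once the conversation drops to the threshold or below.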
Compaction levels:
- Level 0 (leaf): messages → 1 summary
- Level 1 (branch): leaf summaries → 1 branch summary
- Level 2 (root): branch summaries → 1 root summary
This creates a tree: root → branches → leaves → source messages. The memory_search tool auto-expands this tree.
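As a rough sketch of what expanding this tree involves, the function below walks from any summary down to the ids of its source messages. The `SummaryNode` shape and `expandToSources` are hypothetical; how memory_search actually resolves children and sources from the tables is not shown in this document.

```typescript
// Hypothetical in-memory view of one summary and its descendants.
interface SummaryNode {
  id: string;
  kind: "leaf" | "branch" | "root";
  children: SummaryNode[];    // empty for leaves
  sourceMessageIds: string[]; // populated for leaves only
}

// Depth-first walk: root -> branches -> leaves -> source message ids.
function expandToSources(node: SummaryNode): string[] {
  if (node.kind === "leaf") return [...node.sourceMessageIds];
  const ids: string[] = [];
  for (const child of node.children) {
    ids.push(...expandToSources(child));
  }
  return ids;
}
```

Expanding a branch returns only the messages under that branch, so a search hit at any level can be drilled into without loading the whole conversation.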
v5 Memory-Quality Pipeline
The v5 pipeline (April 2026) replaces the original cloud-model-tuned compactor with a local-first, thinking-model architecture optimized for faithfulness and searchability.
What v5 changed vs v4
| Concern | v4 behavior | v5 behavior |
|---|---|---|
| Source message truncation | Hard-capped at 1,000 chars per message | No truncation — 128k context window handles full messages |
| Summary budget | 1k / 1.5k / 2k tokens (leaf/branch/root) | 7k / 14k / 20k — thinking mode needs real headroom |
| Thinking | Disabled | Enabled — model reasons before summarizing |
| Timestamps | Date-only at branch/root, absent at leaf | ISO-minute timestamps on every message and every layer — recency discrimination across same-day iterations |
| Agent attribution | Dropped | Preserved per message ([#01 2026-04-18T12:00Z opus/user]) |
| Conversation metadata | Absent | Preamble header with conv id, agent, channel, title, span, message count |
| System messages | Treated as redundant context | First-class — extracts PR numbers, commits, skill names, line counts |
| Tool-call rows (empty content) | Ignored — never embedded, never summarized | Synthesized content via async queue (see below) |
| HTTP client | Raw fetch, 60s timeout | Hardened undici Agent — no timeouts except AbortSignal, 3 retries with 5/10/15s backoff |
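The retry policy in the HTTP client row (3 retries with 5/10/15s backoff) can be illustrated with a small helper. This is a sketch under stated assumptions, not the hardened undici client itself: `withRetries`, the `Sleep` type, and the injected `sleep` parameter are hypothetical, and AbortSignal handling is left out.

```typescript
// Injected so callers (and tests) control how waiting happens.
type Sleep = (ms: number) => Promise<void>;

// Backoff schedule from the table: 5s, 10s, 15s.
const BACKOFF_MS: number[] = [5000, 10000, 15000];

// Runs op once, then retries once per backoff entry: 4 attempts in total.
async function withRetries<T>(
  op: () => Promise<T>,
  sleep: Sleep,
  backoffMs: number[] = BACKOFF_MS,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      const delay = backoffMs[attempt];
      if (delay === undefined) break; // retries exhausted
      await sleep(delay);
    }
  }
  throw lastError;
}
```

In production code `sleep` would typically be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`; injecting it keeps the schedule testable without real waiting.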
Prompt architecture
Three system prompts live in plugins/memory/postgres/src/compactor/types.ts:
- LEAF_SYSTEM_PROMPT: summarize raw messages
- BRANCH_SYSTEM_PROMPT: summarize leaves
- ROOT_SYSTEM_PROMPT: summarize branches
All three share a common rule set: exhaustiveness (cover every distinct topic), no outside context (never invent facts not in the source), system-messages-first-class (extract identifiers verbatim), and a LaTeX ban (plain Unicode only).
Tool-call content synthesis
Many assistant messages in the corpus contain only tool_name + tool_args with empty content. These rows cannot be embedded (empty text) and fall through every search path.
The v5 pipeline synthesizes natural-language content for them — a single past-tense sentence describing what was called — which makes them findable by both FTS and vector search.
Two synthesis paths:
- Async (live path): adapter.ts calls graphile_worker.add_job('synthesize-tool-call', …) on insert when content is empty and tool_name is set. The compaction-worker service consumes the job and writes content back. Non-blocking: inserts never fail on synthesis errors. Dedup via job_key, so duplicate enqueues coalesce.
- CLI backfill (historical): rivetos memory backfill-tool-synth enqueues a synthesize-tool-call job for each historical empty row. Idempotent: already-enqueued messages dedupe via job_key. Concurrency, retries, and rate limiting are handled by graphile-worker on the compaction-worker side.
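The job_key coalescing that both paths rely on can be modeled with a keyed map. This is an in-memory illustration only, not graphile-worker's API: `InMemoryQueue`, `enqueueSynthesis`, and the `synthesize-tool-call:<message id>` key format are assumptions made for the example.

```typescript
interface Job {
  taskIdentifier: string;
  payload: { messageId: string };
  jobKey: string;
}

// Toy queue: at most one pending job per job key.
class InMemoryQueue {
  private readonly jobs = new Map<string, Job>();

  // Returns true when a new job was created, false when an existing
  // job with the same key absorbed the request.
  addJob(taskIdentifier: string, payload: { messageId: string }, jobKey: string): boolean {
    const isNew = !this.jobs.has(jobKey);
    this.jobs.set(jobKey, { taskIdentifier, payload, jobKey });
    return isNew;
  }

  size(): number {
    return this.jobs.size;
  }
}

// Enqueue only for rows with empty content and a tool name, as described above.
function enqueueSynthesis(
  queue: InMemoryQueue,
  row: { id: string; content: string; toolName: string | null },
): boolean {
  if (row.content !== "" || row.toolName === null) return false;
  // Hypothetical key format: one key per message id.
  return queue.addJob("synthesize-tool-call", { messageId: row.id }, `synthesize-tool-call:${row.id}`);
}
```

Because the key is derived from the message id, the live path and a later backfill run can both enqueue the same row without producing a second job.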
The shared helper (synthesizeToolCallContent in plugins/memory/postgres/src/tool-synth.ts) uses a hardened undici client and the same prompt as the compactor. Model-agnostic — point TOOL_SYNTH_ENDPOINT / TOOL_SYNTH_MODEL at any OpenAI-compatible endpoint.
Operations

    # Show graphile-worker queue state for all RivetOS tasks
    rivetos memory queue-status

    # Enqueue all historical empty-content tool-call rows
    rivetos memory backfill-tool-synth

    # Plan only (count candidates, no enqueue)
    rivetos memory backfill-tool-synth --dry-run

Failed jobs (after max_attempts=3) remain in graphile-worker’s _private_jobs table with attempts=max_attempts. Use queue-status to surface them; operators can re-enqueue manually if needed.
What We’re NOT Building
- No vector database (pgvector in PostgreSQL is sufficient)
- No external embedding API (Nemotron on GERTY is local and free)
- No real-time streaming of memory updates
- No memory sharing between users (single-user system)
- No automatic forgetting/deletion (everything persists, scoring handles relevance)