ValkyrAI Conversational Memory Architecture

ValkyrAI previously relied on the userPreferences table as a crude scratchpad for conversation history. The new Conversational Memory subsystem introduces a purpose-built, vectorised memory graph so LLM interactions can retrieve rich context and persist high-quality memory artifacts without polluting preference storage.

Design Goals

  • Vector-aware storage – every memory node stores an embedding vector and keyword fingerprint so follow-up prompts can be matched semantically.
  • Lossless archival – raw content is persisted (after zlib compression) alongside metadata so we can always rehydrate the original exchange.
  • Deduplication – identical payloads within the same branch collapse into a single node with merged access statistics; both primitives are sketched after this list.
  • Branching & session isolation – chat sessions map to branches that can fork, allowing experimental conversations without contaminating primary memory.
  • Indexing for recall – lightweight keyword indexing accelerates hybrid retrieval (vector + lexical).
  • Composable service API – the rest of the LLM stack only speaks to ConversationMemoryService, which handles storage, retrieval, and summarised context assembly.
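
The archival and deduplication goals rest on two primitives: a SHA-256 hash of the canonical text (the content_hash column) and zlib compression of the raw payload (compressed_payload). A minimal sketch using only the JDK (class and method names here are illustrative, not the actual service internals; HexFormat requires Java 17+):

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.zip.Deflater;

final class MemoryCodec {

    /** SHA-256 over the canonical text; identical payloads in a branch map to one node. */
    static String contentHash(String canonicalText) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(canonicalText.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest); // 64 hex chars, matching CHAR(64)
    }

    /** zlib-compress the raw text for the compressed_payload column. */
    static byte[] compress(String rawText) {
        byte[] input = rawText.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }
}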

Data Model

conversation_branches
├─ id (UUID PK)
├─ principal_id (UUID)
├─ root_session_id (VARCHAR)
├─ label (VARCHAR)
├─ metadata (JSON)
└─ created_at / updated_at (TIMESTAMP)

conversation_memory_nodes
├─ id (UUID PK)
├─ branch_id (UUID FK → conversation_branches)
├─ chat_message_id (UUID, optional FK → chat_message)
├─ parent_node_id (UUID FK → conversation_memory_nodes)
├─ message_role (VARCHAR) -- system|user|assistant|summary
├─ session_id (VARCHAR)
├─ sequence_no (BIGINT) -- monotonically increasing insert order
├─ content_hash (CHAR(64)) -- SHA-256 of canonical text
├─ content_chars (INT)
├─ compressed_payload (MEDIUMBLOB) -- zlib compressed text
├─ embedding_vector (LONGBLOB) -- float32 array
├─ embedding_dimensions (INT)
├─ embedding_checksum (CHAR(64))
├─ keywords_text (TEXT) -- space separated keywords
├─ metadata (JSON)
├─ relevance_score (DOUBLE) -- rolling usefulness score
├─ archived (BOOLEAN)
└─ created_at / updated_at (TIMESTAMP)

Key indexes:

  • conversation_branches(principal_id, root_session_id)
  • conversation_memory_nodes(branch_id, sequence_no DESC)
  • conversation_memory_nodes(content_hash)
  • conversation_memory_nodes(session_id)
  • FULLTEXT(conversation_memory_nodes.keywords_text) (MySQL ≥ 8.0)
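
To make the mapping concrete, here is an illustrative JPA entity for the node table. This is a sketch assuming standard jakarta.persistence annotations; the class shape is an assumption rather than the actual ValkyrAI source, and the FULLTEXT index stays in a migration script because JPA cannot declare it:

import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.Index;
import jakarta.persistence.Lob;
import jakarta.persistence.Table;
import java.time.Instant;
import java.util.UUID;

@Entity
@Table(name = "conversation_memory_nodes", indexes = {
        @Index(columnList = "branch_id, sequence_no"),
        @Index(columnList = "content_hash"),
        @Index(columnList = "session_id")
        // FULLTEXT(keywords_text) is MySQL-specific; it lives in a migration script.
})
public class ConversationMemoryNode {

    @Id
    private UUID id;

    @Column(name = "branch_id", nullable = false)
    private UUID branchId;              // FK -> conversation_branches

    @Column(name = "parent_node_id")
    private UUID parentNodeId;          // self-reference; null for branch roots

    @Column(name = "message_role", length = 16)
    private String messageRole;         // system|user|assistant|summary

    @Column(name = "sequence_no")
    private long sequenceNo;            // monotonic insert order

    @Column(name = "content_hash", length = 64)
    private String contentHash;         // SHA-256 of canonical text

    @Lob
    @Column(name = "compressed_payload")
    private byte[] compressedPayload;   // zlib-compressed text

    @Lob
    @Column(name = "embedding_vector")
    private byte[] embeddingVector;     // packed float32 array

    @Column(name = "relevance_score")
    private double relevanceScore;      // rolling usefulness score

    private boolean archived;

    @Column(name = "created_at")
    private Instant createdAt;

    // getters/setters omitted for brevity
}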

Service Layer

ConversationMemoryService exposes three entry points, sketched below:

  • List<ChatMessage> buildContext(...) – hybrid retrieval that pulls the highest-scoring nodes (vector + keyword) and emits system messages ready to append to an LLM request.
  • void recordInteraction(...) – records both user and assistant turns, performs dedupe, compression, embedding, and keyword extraction.
  • UUID forkBranch(...) – optional branch creation for experiments and A/B workflows.
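
A minimal sketch of that interface, with parameter names assumed for illustration (the real signatures likely carry more options):

import java.util.List;
import java.util.UUID;

public interface ConversationMemoryService {

    // ChatMessage is the project's existing chat message type.

    /** Hybrid retrieval: top-scoring memory nodes rendered as system messages. */
    List<ChatMessage> buildContext(UUID principalId, String sessionId, String query, int maxNodes);

    /** Persist a user/assistant turn pair: dedupe, compress, embed, extract keywords. */
    void recordInteraction(UUID principalId, String sessionId, String userText, String assistantText);

    /** Fork the branch backing sessionId so experiments do not pollute primary memory. */
    UUID forkBranch(UUID principalId, String sessionId, String label);
}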

Embeddings are produced by MemoryEmbeddingService. The default implementation (LocalEmbeddingService) uses a deterministic hashing trick to create 384-dimensional float32 vectors without external dependencies; if an embedding provider is configured later, the service can be swapped via Spring configuration.
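
Concretely, the hashing trick maps each token to one of 384 buckets via a deterministic hash and L2-normalises the resulting counts, so equal text always yields equal vectors with no model download. A rough sketch of the idea (not the actual LocalEmbeddingService internals):

final class HashingEmbedder {

    static final int DIMS = 384;

    /** Deterministic bag-of-words embedding via the hashing trick. */
    static float[] embed(String text) {
        float[] vector = new float[DIMS];
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            int h = token.hashCode();          // spec-defined, stable across JVMs
            int bucket = Math.floorMod(h, DIMS);
            // A second hash decides the sign, reducing bucket-collision bias.
            float sign = (Math.floorMod(h >>> 16, 2) == 0) ? 1f : -1f;
            vector[bucket] += sign;
        }
        // L2-normalise so cosine similarity reduces to a dot product.
        double norm = 0;
        for (float v : vector) norm += v * v;
        norm = Math.sqrt(norm);
        if (norm > 0) for (int i = 0; i < DIMS; i++) vector[i] /= norm;
        return vector;
    }
}

Because the vectors come out unit length, the cosine similarity computed during retrieval reduces to a plain dot product.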

Retrieval Strategy

  1. Select recent candidates for the active branch (configurable window).
  2. Perform lexical filtering when a query is provided (using FULLTEXT index).
  3. Compute cosine similarity between candidate embeddings and the query embedding.
  4. Blend similarity, recency, and explicit relevance_score into a final score.
  5. Return the top N nodes (default 6) as bullet-point-style system memories.

This hybrid approach keeps latency predictable (the vector math happens in the JVM) while still surfacing semantically relevant memories.
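
The blend in step 4 can be as simple as a weighted sum. A sketch of one plausible scoring function; the weights and decay constant are assumptions, not ValkyrAI's tuned values:

final class MemoryScorer {

    /** Cosine similarity; over L2-normalised vectors this is just the dot product. */
    static double cosine(float[] a, float[] b) {
        double dot = 0;
        for (int i = 0; i < a.length; i++) dot += a[i] * b[i];
        return dot;
    }

    /**
     * Blend semantic similarity, recency, and the node's rolling relevance_score.
     * ageInTurns = how many sequence numbers behind the branch head this node sits.
     */
    static double score(double similarity, long ageInTurns, double relevanceScore) {
        double recency = Math.exp(-ageInTurns / 50.0); // half-life of roughly 35 turns
        return 0.6 * similarity + 0.25 * recency + 0.15 * relevanceScore;
    }
}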

Integration with LLMController

  • Prior to dispatching a chat completion request, ValkyrAI now injects the context messages returned by ConversationMemoryService.buildContext(...); the wiring is sketched after this list.
  • After receiving the assistant reply, ConversationMemoryService.recordInteraction(...) persists both sides of the exchange to the memory graph.
  • Legacy UserPreference fallbacks remain in place for environments that have not yet migrated, but they are no longer consulted once the memory service is available.
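
In code, the round trip looks roughly like the following; ChatResponse, ChatMessage.user(...), and callModel(...) are illustrative stand-ins, not the actual LLMController members:

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class LlmChatFlowSketch {

    private final ConversationMemoryService memory; // injected by Spring

    public LlmChatFlowSketch(ConversationMemoryService memory) {
        this.memory = memory;
    }

    public ChatResponse complete(UUID principalId, String sessionId, String userPrompt) {
        // 1. Inject hybrid-retrieved memories as system messages ahead of the prompt.
        List<ChatMessage> request = new ArrayList<>(
                memory.buildContext(principalId, sessionId, userPrompt, 6));
        request.add(ChatMessage.user(userPrompt)); // hypothetical factory for a user turn

        // 2. Dispatch the completion request to the configured provider.
        ChatResponse response = callModel(request);

        // 3. Persist both sides of the exchange back into the memory graph.
        memory.recordInteraction(principalId, sessionId, userPrompt, response.text());
        return response;
    }

    private ChatResponse callModel(List<ChatMessage> request) {
        throw new UnsupportedOperationException("provider call elided in this sketch");
    }
}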

Future Extensions

  • Swap in pgvector / Milvus when we need approximate-nearest-neighbour (ANN) scale.
  • Export/import branches for org-level knowledge sharing.
  • Periodic compression jobs that summarise older nodes back into the memory graph.