ValkyrAI Conversational Memory Architecture
ValkyrAI previously relied on the userPreferences table as a crude scratchpad for
conversation history. The new Conversational Memory subsystem introduces a
purpose-built, vectorised memory graph so LLM interactions can retrieve rich
context and persist high-quality memory artifacts without polluting preference
storage.
Design Goals
- Vector-aware storage – every memory node stores an embedding vector and keyword fingerprint so follow-up prompts can be matched semantically.
- Lossless archival – raw content is persisted (after compression) alongside metadata so we can always rehydrate the original exchange.
- Deduplication – identical payloads within the same branch collapse into a single node with merged access statistics (see the hashing sketch after this list).
- Branching & session isolation – chat sessions map to branches that can fork, allowing experimental conversations without contaminating primary memory.
- Indexing for recall – lightweight keyword indexing accelerates hybrid retrieval (vector + lexical).
- Composable service API – the rest of the LLM stack only speaks to ConversationMemoryService, which handles storage, retrieval, and summarised context assembly.
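To make the archival and dedupe goals concrete, here is a minimal sketch using only JDK classes. The class and method names are hypothetical; the real logic lives inside ConversationMemoryService.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.zip.Deflater;

// Hypothetical helper names; illustrates content_hash + compressed_payload.
final class MemoryPayloads {

    // SHA-256 over the canonical text; identical payloads within a branch
    // produce the same hash and collapse into a single node.
    static String contentHash(String canonicalText) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(canonicalText.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest);
    }

    // Deflater's default mode emits zlib-wrapped DEFLATE, so the raw
    // exchange can always be rehydrated from compressed_payload.
    static byte[] compress(String rawContent) {
        byte[] input = rawContent.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }
}
```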
Data Model
conversation_branches
├─ id (UUID PK)
├─ principal_id (UUID)
├─ root_session_id (VARCHAR)
├─ label (VARCHAR)
├─ metadata (JSON)
└─ created_at / updated_at (TIMESTAMP)
conversation_memory_nodes
├─ id (UUID PK)
├─ branch_id (UUID FK → conversation_branches)
├─ chat_message_id (UUID, optional FK → chat_message)
├─ parent_node_id (UUID FK → conversation_memory_nodes)
├─ message_role (VARCHAR) -- system|user|assistant|summary
├─ session_id (VARCHAR)
├─ sequence_no (BIGINT) -- monotonically increasing insert order
├─ content_hash (CHAR(64)) -- SHA-256 of canonical text
├─ content_chars (INT)
├─ compressed_payload (MEDIUMBLOB) -- zlib compressed text
├─ embedding_vector (LONGBLOB) -- float32 array
├─ embedding_dimensions (INT)
├─ embedding_checksum (CHAR(64))
├─ keywords_text (TEXT) -- space separated keywords
├─ metadata (JSON)
├─ relevance_score (DOUBLE) -- rolling usefulness score
├─ archived (BOOLEAN)
└─ created_at / updated_at (TIMESTAMP)
Key indexes:
- conversation_branches(principal_id, root_session_id)
- conversation_memory_nodes(branch_id, sequence_no DESC)
- conversation_memory_nodes(content_hash)
- conversation_memory_nodes(session_id)
- FULLTEXT(conversation_memory_nodes.keywords_text) (MySQL ≥ 8.0)
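For concreteness, a JPA mapping of the node table could look like the following. Only the column names come from the schema above; the annotation choices and Java types are assumptions, not the shipped entity.

```java
import jakarta.persistence.*;
import java.time.Instant;
import java.util.UUID;

// Illustrative mapping only; field list truncated to the core columns.
@Entity
@Table(name = "conversation_memory_nodes",
       indexes = {
           @Index(columnList = "branch_id, sequence_no DESC"),
           @Index(columnList = "content_hash"),
           @Index(columnList = "session_id")
       })
public class ConversationMemoryNode {

    @Id
    @GeneratedValue(strategy = GenerationType.UUID)
    private UUID id;

    @Column(name = "branch_id", nullable = false)
    private UUID branchId;

    @Column(name = "message_role")
    private String messageRole;          // system|user|assistant|summary

    @Column(name = "sequence_no")
    private long sequenceNo;             // monotonically increasing insert order

    @Column(name = "content_hash", length = 64)
    private String contentHash;          // SHA-256 of the canonical text

    @Lob
    @Column(name = "compressed_payload")
    private byte[] compressedPayload;    // zlib-compressed raw content

    @Lob
    @Column(name = "embedding_vector")
    private byte[] embeddingVector;      // serialised float32 array

    @Column(name = "embedding_dimensions")
    private Integer embeddingDimensions;

    @Column(name = "relevance_score")
    private Double relevanceScore;       // rolling usefulness score

    @Column(name = "archived")
    private boolean archived;

    @Column(name = "created_at")
    private Instant createdAt;

    // Remaining columns (parent_node_id, keywords_text, metadata, etc.)
    // follow the same pattern; getters/setters omitted.
}
```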
Service Layer
ConversationMemoryService exposes:
- List<ChatMessage> buildContext(...) – hybrid retrieval that pulls the highest scoring nodes (vector + keyword) and emits system messages ready to be appended to an LLM request.
- void recordInteraction(...) – records both user and assistant turns, performing dedupe, compression, embedding, and keyword extraction.
- UUID forkBranch(...) – optional branch creation for experiments and A/B workflows.
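The parameter lists are elided above, so the signatures in this sketch are hypothetical; they are included only to make the call shapes concrete.

```java
import java.util.List;
import java.util.UUID;

// ChatMessage is the project's existing chat DTO. The parameter lists
// below are hypothetical; the real signatures are elided ("...") above.
public interface ConversationMemoryService {

    // Hybrid retrieval: top-scoring nodes rendered as system messages
    // ready to prepend to an LLM request.
    List<ChatMessage> buildContext(UUID principalId, String sessionId, String query);

    // Persist a user/assistant turn pair; dedupe, compression, embedding,
    // and keyword extraction happen inside this call.
    void recordInteraction(UUID principalId, String sessionId,
                           String userContent, String assistantContent);

    // Fork a branch for experiments and A/B workflows.
    UUID forkBranch(UUID sourceBranchId, String label);
}
```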
Embeddings are produced by MemoryEmbeddingService. The default implementation
(LocalEmbeddingService) uses a deterministic hashing trick to create 384‑dim
float vectors without external dependencies; if an embedding provider is
configured later, the implementation can be swapped in via Spring configuration.
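A minimal sketch of the hashing trick, assuming simple non-word tokenisation; the shipped LocalEmbeddingService may tokenise and weight differently.

```java
import java.util.Locale;

// Deterministic hashing-trick embedder: no external model, 384 dims.
// Tokenisation and weighting here are assumptions for illustration.
final class HashingEmbedder {

    private static final int DIMS = 384;

    static float[] embed(String text) {
        float[] vector = new float[DIMS];
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) continue;
            int h = token.hashCode();
            int index = Math.floorMod(h, DIMS);
            // A second hash bit supplies the sign, reducing collision bias.
            float sign = (Math.floorMod(h >>> 16, 2) == 0) ? 1f : -1f;
            vector[index] += sign;
        }
        // L2-normalise so cosine similarity reduces to a dot product.
        double norm = 0;
        for (float v : vector) norm += v * v;
        norm = Math.sqrt(norm);
        if (norm > 0) {
            for (int i = 0; i < DIMS; i++) vector[i] /= norm;
        }
        return vector;
    }
}
```

Because the output only depends on the input text, re-embedding a deduplicated node always reproduces the stored vector, which is what makes embedding_checksum verifiable.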
Retrieval Strategy
- Select recent candidates for the active branch (configurable window).
- Perform lexical filtering when a query is provided (using the FULLTEXT index).
- Compute cosine similarity between candidate embeddings and the query embedding.
- Blend similarity, recency, and the explicit relevance_score into a final score.
- Return the top N nodes (default 6) as bullet-point style system memories.
This hybrid approach keeps latency predictable (the vector math happens in the JVM) while providing semantically relevant recall; the scoring sketch below makes the blend concrete.
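Note that the 0.6/0.25/0.15 weights here are assumptions; the document only states that the three signals are combined.

```java
// Hypothetical blend of the three retrieval signals.
final class MemoryScoring {

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0)
                ? 0
                : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // recency in [0,1]: 1 for the newest candidate, decaying with age.
    // Weights are illustrative, not the tuned production values.
    static double blend(double similarity, double recency, double relevanceScore) {
        return 0.6 * similarity + 0.25 * recency + 0.15 * relevanceScore;
    }
}
```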
Integration with LLMController
- Prior to dispatching a chat completion request, ValkyrAI now injects the context messages returned by ConversationMemoryService.buildContext(...).
- After receiving the assistant reply, ConversationMemoryService.recordInteraction(...) persists both sides of the exchange to the memory graph.
- Legacy UserPreference fallbacks remain in place for environments where the new subsystem is not yet migrated, but they are no longer consulted when the memory service is available.
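How the wiring could look around a single turn, again with hypothetical signatures; the controller's actual structure may differ.

```java
import java.util.List;
import java.util.UUID;

// Illustrative flow only; dispatchCompletion stands in for the
// existing LLM dispatch path.
public class ChatTurnFlow {

    private final ConversationMemoryService memoryService;

    public ChatTurnFlow(ConversationMemoryService memoryService) {
        this.memoryService = memoryService;
    }

    public void handleTurn(UUID principalId, String sessionId, String userPrompt) {
        // 1. Inject memory context before dispatching the completion request.
        List<ChatMessage> context =
                memoryService.buildContext(principalId, sessionId, userPrompt);

        String assistantReply = dispatchCompletion(context, userPrompt);

        // 2. Persist both sides of the exchange to the memory graph.
        memoryService.recordInteraction(principalId, sessionId,
                                        userPrompt, assistantReply);
    }

    private String dispatchCompletion(List<ChatMessage> context, String prompt) {
        // Placeholder for the existing chat completion call.
        return "...";
    }
}
```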
Future Extensions
- Swap in pgvector / Milvus when we need ANN scale.
- Export/import branches for org-level knowledge sharing.
- Periodic compression jobs that summarise older nodes back into the memory graph.