ValkyrAI Conversational Memory Architecture

ValkyrAI previously relied on the userPreferences table as a crude scratchpad for conversation history. The new Conversational Memory subsystem introduces a purpose-built, vectorised memory graph so LLM interactions can retrieve rich context and persist high-quality memory artifacts without polluting preference storage.

Design Goals

  • Vector-aware storage – every memory node stores an embedding vector and keyword fingerprint so follow-up prompts can be matched semantically.
  • Lossless archival – raw content is persisted (after zlib compression) alongside metadata so we can always rehydrate the original exchange.
  • Deduplication – identical payloads within the same branch collapse into a single node with merged access statistics; both primitives are sketched after this list.
  • Branching & session isolation – chat sessions map to branches that can fork, allowing experimental conversations without contaminating primary memory.
  • Indexing for recall – lightweight keyword indexing accelerates hybrid retrieval (vector + lexical).
  • Composable service API – the rest of the LLM stack only speaks to ConversationMemoryService, which handles storage, retrieval, and summarised context assembly.
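
The archival and deduplication goals rest on two primitives: a SHA-256 hash of the canonical text (the content_hash column) and zlib compression of the raw payload (compressed_payload). A minimal sketch using only the JDK (class and method names here are illustrative, not the actual service internals; HexFormat requires Java 17+):

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.zip.Deflater;

final class MemoryCodec {

    /** SHA-256 over the canonical text; identical payloads in a branch map to one node. */
    static String contentHash(String canonicalText) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(canonicalText.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(digest); // 64 hex chars, matching CHAR(64)
    }

    /** zlib-compress the raw text for the compressed_payload column. */
    static byte[] compress(String rawText) {
        byte[] input = rawText.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }
}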

Data Model

conversation_branches
├─ id (UUID PK)
├─ principal_id (UUID)
├─ root_session_id (VARCHAR)
├─ label (VARCHAR)
├─ metadata (JSON)
└─ created_at / updated_at (TIMESTAMP)

conversation_memory_nodes
├─ id (UUID PK)
├─ branch_id (UUID FK → conversation_branches)
├─ chat_message_id (UUID, optional FK → chat_message)
├─ parent_node_id (UUID FK → conversation_memory_nodes)
├─ message_role (VARCHAR) -- system|user|assistant|summary
├─ session_id (VARCHAR)
├─ sequence_no (BIGINT) -- monotonically increasing insert order
├─ content_hash (CHAR(64)) -- SHA-256 of canonical text
├─ content_chars (INT)
├─ compressed_payload (MEDIUMBLOB) -- zlib compressed text
├─ embedding_vector (LONGBLOB) -- float32 array
├─ embedding_dimensions (INT)
├─ embedding_checksum (CHAR(64))
├─ keywords_text (TEXT) -- space separated keywords
├─ metadata (JSON)
├─ relevance_score (DOUBLE) -- rolling usefulness score
├─ archived (BOOLEAN)
└─ created_at / updated_at (TIMESTAMP)

Key indexes:

  • conversation_branches(principal_id, root_session_id)
  • conversation_memory_nodes(branch_id, sequence_no DESC)
  • conversation_memory_nodes(content_hash)
  • conversation_memory_nodes(session_id)
  • FULLTEXT(conversation_memory_nodes.keywords_text) (MySQL ≥ 8.0)
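
To make the mapping concrete, here is an illustrative JPA entity for the node table. This is a sketch assuming standard jakarta.persistence annotations; the class shape is an assumption rather than the actual ValkyrAI source, and the FULLTEXT index stays in a migration script because JPA cannot declare it:

import jakarta.persistence.Column;
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.persistence.Index;
import jakarta.persistence.Lob;
import jakarta.persistence.Table;
import java.time.Instant;
import java.util.UUID;

@Entity
@Table(name = "conversation_memory_nodes", indexes = {
        @Index(columnList = "branch_id, sequence_no"),
        @Index(columnList = "content_hash"),
        @Index(columnList = "session_id")
        // FULLTEXT(keywords_text) is MySQL-specific; it lives in a migration script.
})
public class ConversationMemoryNode {

    @Id
    private UUID id;

    @Column(name = "branch_id", nullable = false)
    private UUID branchId;              // FK -> conversation_branches

    @Column(name = "parent_node_id")
    private UUID parentNodeId;          // self-reference; null for branch roots

    @Column(name = "message_role", length = 16)
    private String messageRole;         // system|user|assistant|summary

    @Column(name = "sequence_no")
    private long sequenceNo;            // monotonic insert order

    @Column(name = "content_hash", length = 64)
    private String contentHash;         // SHA-256 of canonical text

    @Lob
    @Column(name = "compressed_payload")
    private byte[] compressedPayload;   // zlib-compressed text

    @Lob
    @Column(name = "embedding_vector")
    private byte[] embeddingVector;     // packed float32 array

    @Column(name = "relevance_score")
    private double relevanceScore;      // rolling usefulness score

    private boolean archived;

    @Column(name = "created_at")
    private Instant createdAt;

    // getters/setters omitted for brevity
}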

Service Layer

ConversationMemoryService exposes three entry points, sketched below:

  • List<ChatMessage> buildContext(...) – hybrid retrieval that pulls the highest-scoring nodes (vector + keyword) and emits system messages ready to append to an LLM request.
  • void recordInteraction(...) – records both user and assistant turns, performs dedupe, compression, embedding, and keyword extraction.
  • UUID forkBranch(...) – optional branch creation for experiments and A/B workflows.
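
A minimal sketch of that interface, with parameter names assumed for illustration (the real signatures likely carry more options):

import java.util.List;
import java.util.UUID;

public interface ConversationMemoryService {

    // ChatMessage is the project's existing chat message type.

    /** Hybrid retrieval: top-scoring memory nodes rendered as system messages. */
    List<ChatMessage> buildContext(UUID principalId, String sessionId, String query, int maxNodes);

    /** Persist a user/assistant turn pair: dedupe, compress, embed, extract keywords. */
    void recordInteraction(UUID principalId, String sessionId, String userText, String assistantText);

    /** Fork the branch backing sessionId so experiments do not pollute primary memory. */
    UUID forkBranch(UUID principalId, String sessionId, String label);
}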

Embeddings are produced by MemoryEmbeddingService. The default implementation (LocalEmbeddingService) uses a deterministic hashing trick to create 384-dimensional float32 vectors without external dependencies; if an embedding provider is configured later, the service can be swapped via Spring configuration.
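
Concretely, the hashing trick maps each token to one of 384 buckets via a deterministic hash and L2-normalises the resulting counts, so equal text always yields equal vectors with no model download. A rough sketch of the idea (not the actual LocalEmbeddingService internals):

final class HashingEmbedder {

    static final int DIMS = 384;

    /** Deterministic bag-of-words embedding via the hashing trick. */
    static float[] embed(String text) {
        float[] vector = new float[DIMS];
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue;
            int h = token.hashCode();          // spec-defined, stable across JVMs
            int bucket = Math.floorMod(h, DIMS);
            // A second hash decides the sign, reducing bucket-collision bias.
            float sign = (Math.floorMod(h >>> 16, 2) == 0) ? 1f : -1f;
            vector[bucket] += sign;
        }
        // L2-normalise so cosine similarity reduces to a dot product.
        double norm = 0;
        for (float v : vector) norm += v * v;
        norm = Math.sqrt(norm);
        if (norm > 0) for (int i = 0; i < DIMS; i++) vector[i] /= norm;
        return vector;
    }
}

Because the vectors come out unit length, the cosine similarity computed during retrieval reduces to a plain dot product.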

Retrieval Strategy

  1. Select recent candidates for the active branch (configurable window).
  2. Perform lexical filtering when a query is provided (using FULLTEXT index).
  3. Compute cosine similarity between candidate embeddings and the query embedding.
  4. Blend similarity, recency, and explicit relevance_score into a final score.
  5. Return the top N nodes (default 6) as bullet-point-style system memories.

This hybrid approach keeps latency predictable (the vector math happens in the JVM) while still surfacing semantically relevant memories.
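
The blend in step 4 can be as simple as a weighted sum. A sketch of one plausible scoring function; the weights and decay constant are assumptions, not ValkyrAI's tuned values:

final class MemoryScorer {

    /** Cosine similarity; over L2-normalised vectors this is just the dot product. */
    static double cosine(float[] a, float[] b) {
        double dot = 0;
        for (int i = 0; i < a.length; i++) dot += a[i] * b[i];
        return dot;
    }

    /**
     * Blend semantic similarity, recency, and the node's rolling relevance_score.
     * ageInTurns = how many sequence numbers behind the branch head this node sits.
     */
    static double score(double similarity, long ageInTurns, double relevanceScore) {
        double recency = Math.exp(-ageInTurns / 50.0); // half-life of roughly 35 turns
        return 0.6 * similarity + 0.25 * recency + 0.15 * relevanceScore;
    }
}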

Integration with LLMController

  • Prior to dispatching a chat completion request, ValkyrAI now injects the context messages returned by ConversationMemoryService.buildContext(...); the wiring is sketched after this list.
  • After receiving the assistant reply, ConversationMemoryService.recordInteraction(...) persists both sides of the exchange to the memory graph.
  • Legacy UserPreference fallbacks remain in place for environments that have not yet migrated, but they are no longer consulted once the memory service is available.
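
In code, the round trip looks roughly like the following; ChatResponse, ChatMessage.user(...), and callModel(...) are illustrative stand-ins, not the actual LLMController members:

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class LlmChatFlowSketch {

    private final ConversationMemoryService memory; // injected by Spring

    public LlmChatFlowSketch(ConversationMemoryService memory) {
        this.memory = memory;
    }

    public ChatResponse complete(UUID principalId, String sessionId, String userPrompt) {
        // 1. Inject hybrid-retrieved memories as system messages ahead of the prompt.
        List<ChatMessage> request = new ArrayList<>(
                memory.buildContext(principalId, sessionId, userPrompt, 6));
        request.add(ChatMessage.user(userPrompt)); // hypothetical factory for a user turn

        // 2. Dispatch the completion request to the configured provider.
        ChatResponse response = callModel(request);

        // 3. Persist both sides of the exchange back into the memory graph.
        memory.recordInteraction(principalId, sessionId, userPrompt, response.text());
        return response;
    }

    private ChatResponse callModel(List<ChatMessage> request) {
        throw new UnsupportedOperationException("provider call elided in this sketch");
    }
}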

Future Extensions

  • Swap in pgvector / Milvus when we need approximate-nearest-neighbour (ANN) scale.
  • Export/import branches for org-level knowledge sharing.
  • Periodic compression jobs that summarise older nodes back into the memory graph.