Unified Conversation Memory

Sourced from docs/voice/unified-memory.md

Voice Mode v4.1 introduces unified conversation memory that maintains context across voice and text interactions, enabling seamless mode switching.

Overview

The unified memory system provides:

  • Cross-modal context: Conversation history shared between voice and text
  • Language switching events: Tracks when users switch languages
  • Mode transition handling: Preserves context when switching voice ↔ text
  • Session persistence: Maintains memory across browser refreshes
  • Privacy controls: User-controlled memory retention
┌─────────────────────────────────────────────────────────────────┐
│                    Unified Memory Store                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐                      ┌──────────────┐         │
│  │  Voice Mode  │◄────── Shared ──────►│  Text Mode   │         │
│  │              │        Memory        │              │         │
│  └──────────────┘                      └──────────────┘         │
│         │                                     │                  │
│         ▼                                     ▼                  │
│  ┌──────────────────────────────────────────────────┐           │
│  │             Conversation Context                  │           │
│  ├──────────────────────────────────────────────────┤           │
│  │ • Message history (last 50 messages)             │           │
│  │ • Language preferences & switches                │           │
│  │ • RAG context (retrieved passages)               │           │
│  │ • User preferences                               │           │
│  │ • Session metadata                               │           │
│  └──────────────────────────────────────────────────┘           │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
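The shared-store idea in the diagram can be sketched with a single capped history that both modes write to. This is a minimal illustration, not the service's actual implementation: `InMemoryConversation` and `Message` are hypothetical names, and only the 50-message cap and the voice/text mode tag come from the text above.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Literal

Mode = Literal["voice", "text"]


@dataclass
class Message:
    role: str
    content: str
    mode: Mode


class InMemoryConversation:
    """Hypothetical stand-in for the unified store: one history for both modes."""

    def __init__(self, max_messages: int = 50) -> None:
        # deque(maxlen=...) drops the oldest message once the cap is reached
        self._messages: Deque[Message] = deque(maxlen=max_messages)

    def add(self, role: str, content: str, mode: Mode) -> None:
        self._messages.append(Message(role, content, mode))

    def context(self, max_messages: int = 10) -> List[Message]:
        # Most recent messages, regardless of which mode produced them
        return list(self._messages)[-max_messages:]


conversation = InMemoryConversation()
conversation.add("user", "What is metformin used for?", mode="voice")
conversation.add("assistant", "Metformin is used to treat type 2 diabetes.", mode="voice")
conversation.add("user", "And the usual side effects?", mode="text")
recent = conversation.context()
```

Because the history is not partitioned by mode, the text follow-up sees the earlier voice turns without any copy step.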

Thinker-Talker Pipeline Integration

    sequenceDiagram
        participant User
        participant Frontend
        participant Memory as Unified Memory
        participant Thinker
        participant RAG
        participant Talker

        User->>Frontend: Voice or Text input
        Frontend->>Memory: add_entry(role="user", mode, content)
        Note over Memory: Store with mode tag<br/>(voice/text)
        Memory->>Thinker: get_context(max_messages=10)
        Thinker->>RAG: retrieve_passages(query)
        RAG-->>Thinker: relevant_passages
        Note over Thinker: Build LLM context<br/>with history + RAG
        Thinker-->>Memory: add_entry(role="assistant")
        Thinker-->>Talker: response_stream
        Talker-->>Frontend: audio_chunks
        Note over Memory: Context preserved<br/>across mode switches

Memory Flow on Mode Switch

    flowchart TD
        subgraph Voice Mode
            VA[🎤 Voice Input]
            VT[Voice Transcript]
            VM[Voice Message Entry]
        end
        subgraph Text Mode
            TA[⌨️ Text Input]
            TM[Text Message Entry]
        end
        subgraph Unified Memory
            MC[Message Context]
            LC[Language Events]
            RC[RAG Context]
            ME[Mode Events]
        end
        subgraph Thinker-Talker
            TH[Thinker LLM]
            TK[Talker TTS]
        end

        VA --> VT --> VM --> MC
        TA --> TM --> MC
        VM --> ME
        TM --> ME
        MC --> TH
        LC --> TH
        RC --> TH
        TH --> TK

        style MC fill:#FFD700

Environment Variable for Data Directory

When customizing lexicon paths, use the _resolve_data_dir() helper:

    from app.core.config import _resolve_data_dir

    # Returns VOICEASSIST_DATA_DIR env var or default ./data
    data_dir = _resolve_data_dir()

    # Lexicon paths relative to data dir
    lexicons_path = data_dir / "lexicons" / "medical_terms.txt"

Environment variable: VOICEASSIST_DATA_DIR=/path/to/data

If not set, defaults to ./data relative to the working directory.
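The lookup order described above can be illustrated with a standalone function. This is a sketch of the behavior only: `resolve_data_dir` is a hypothetical equivalent, not the body of the real `_resolve_data_dir()` helper, and the example path is made up.

```python
import os
from pathlib import Path


def resolve_data_dir(default: str = "./data") -> Path:
    """Hypothetical equivalent of _resolve_data_dir(): env var first, then default."""
    # "or default" also covers the variable being set to an empty string
    return Path(os.environ.get("VOICEASSIST_DATA_DIR") or default)


# Illustrative value; any writable directory would do
os.environ["VOICEASSIST_DATA_DIR"] = "/srv/voiceassist/data"
lexicons_path = resolve_data_dir() / "lexicons" / "medical_terms.txt"
```

A relative default such as `./data` resolves against the process working directory, so services started from different directories would read different lexicons unless the variable is set.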

Memory Architecture

Memory Layers

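| Layer      | Scope           | Retention    | Storage    |
|------------|-----------------|--------------|------------|
| Session    | Current session | Until close  | Redis      |
| Short-term | Last 24 hours   | 24h TTL      | Redis      |
| Long-term  | User history    | Configurable | PostgreSQL |
| Episodic   | Key moments     | Indefinite   | PostgreSQL |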
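One way to read the table is as a set of retention limits checked against an entry's age. The sketch below is an assumption-laden illustration: `RETENTION` and `layers_holding` are hypothetical names, and the 30-day long-term limit simply mirrors the `retention_days` default shown under Privacy Controls, since the table only says "Configurable".

```python
from datetime import timedelta
from typing import Dict, List, Optional

# None means "no time limit" (the Episodic layer is kept indefinitely)
RETENTION: Dict[str, Optional[timedelta]] = {
    "short_term": timedelta(hours=24),
    "long_term": timedelta(days=30),  # assumed value for "Configurable"
    "episodic": None,
}


def layers_holding(age: timedelta, in_current_session: bool) -> List[str]:
    """Return the layers that would still hold an entry of the given age."""
    # The session layer is bounded by session close, not by elapsed time
    layers = ["session"] if in_current_session else []
    for name, limit in RETENTION.items():
        if limit is None or age <= limit:
            layers.append(name)
    return layers


fresh = layers_holding(timedelta(hours=2), in_current_session=True)
old = layers_holding(timedelta(days=90), in_current_session=False)
```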

Memory Entry Structure

    from dataclasses import dataclass
    from datetime import datetime
    from typing import Dict, List, Literal, Optional


    @dataclass
    class MemoryEntry:
        """Single memory entry in the conversation."""

        id: str
        session_id: str
        user_id: str
        timestamp: datetime

        # Content
        role: Literal["user", "assistant", "system"]
        content: str
        mode: Literal["voice", "text"]

        # Context
        language: str
        detected_language: str
        language_switched: bool

        # RAG context
        retrieved_passages: List[str]
        sources: List[Dict]

        # Metadata
        latency_ms: Optional[float]
        degradations: List[str]
        phi_detected: bool
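The Redis store later in this guide calls `to_dict()` and `from_dict()` on entries, which the structure above does not define. A minimal sketch of how such helpers might look follows, using a reduced, hypothetical `EntrySketch` class rather than the full `MemoryEntry`. The main point is that `datetime` has to be converted for JSON.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class EntrySketch:
    """Reduced, hypothetical version of MemoryEntry with JSON helpers."""

    role: str
    content: str
    mode: str
    language: str
    timestamp: datetime
    retrieved_passages: List[str] = field(default_factory=list)

    def to_dict(self) -> Dict[str, Any]:
        data = asdict(self)
        # datetime is not JSON-serializable, so store it as an ISO 8601 string
        data["timestamp"] = self.timestamp.isoformat()
        return data

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "EntrySketch":
        values = dict(data)  # copy so the caller's dict is left untouched
        values["timestamp"] = datetime.fromisoformat(values["timestamp"])
        return cls(**values)


entry = EntrySketch(
    role="user",
    content="What is metformin used for?",
    mode="voice",
    language="en",
    timestamp=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
restored = EntrySketch.from_dict(json.loads(json.dumps(entry.to_dict())))
```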

Implementation

UnifiedMemoryService

    from app.services.unified_memory import UnifiedMemoryService

    memory_service = UnifiedMemoryService()

    # Add voice message to memory
    await memory_service.add_entry(
        session_id="session_123",
        user_id="user_456",
        entry=MemoryEntry(
            role="user",
            content="What is metformin used for?",
            mode="voice",
            language="en",
            detected_language="en",
            language_switched=False
        )
    )

    # Get context for LLM
    context = await memory_service.get_context(
        session_id="session_123",
        max_messages=10,
        include_rag=True
    )

Cross-Modal Context

When switching from voice to text (or vice versa):

    from datetime import datetime, timezone


    async def handle_mode_switch(
        session_id: str,
        from_mode: str,
        to_mode: str
    ) -> ConversationContext:
        """Handle mode switch while preserving context."""
        # Get existing conversation context
        context = await memory_service.get_context(session_id)

        # Add mode switch event
        await memory_service.add_event(
            session_id=session_id,
            event_type="mode_switch",
            data={
                "from_mode": from_mode,
                "to_mode": to_mode,
                # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12
                "timestamp": datetime.now(timezone.utc).isoformat()
            }
        )

        # Return context for new mode
        return context

Language Switching Events

Track language changes for multilingual users:

    from datetime import datetime, timezone


    async def track_language_switch(
        session_id: str,
        from_language: str,
        to_language: str,
        trigger: str,  # "user_request" | "auto_detected" | "explicit_setting"
    ) -> None:
        """Track when user switches languages."""
        await memory_service.add_event(
            session_id=session_id,
            event_type="language_switch",
            data={
                "from_language": from_language,
                "to_language": to_language,
                "trigger": trigger,
                # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12
                "timestamp": datetime.now(timezone.utc).isoformat()
            }
        )

        # Update session language preference
        await session_service.update_language(
            session_id=session_id,
            language=to_language
        )
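The tracking function above needs something to decide that a switch happened. One possible check for the `auto_detected` trigger is sketched here; `detect_language_switch` is a hypothetical helper, and comparing only the primary language subtag is an assumption about what should count as a switch.

```python
from typing import Optional, Tuple


def detect_language_switch(
    session_language: str, detected_language: Optional[str]
) -> Optional[Tuple[str, str]]:
    """Return (from_language, to_language) when the language changed, else None."""
    if not detected_language:
        # Detection failed or was skipped: treat as no switch
        return None
    # Compare primary subtags only, so "en-US" vs "en" is not a switch
    current = session_language.split("-")[0].lower()
    detected = detected_language.split("-")[0].lower()
    if current == detected:
        return None
    return (session_language, detected_language)


switch = detect_language_switch("en", "ar")
```

A non-`None` result would supply the `from_language` and `to_language` arguments, and its presence could set `language_switched` on the corresponding memory entry.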

Context Building

Building LLM Context

    async def build_llm_context(
        session_id: str,
        current_query: str,
        rag_results: List[Dict]
    ) -> List[Dict]:
        """Build context for LLM including memory."""
        # Get conversation history
        history = await memory_service.get_history(
            session_id=session_id,
            max_messages=10
        )

        # Get language switches (for context awareness)
        language_events = await memory_service.get_events(
            session_id=session_id,
            event_type="language_switch",
            limit=5
        )

        # Build messages array
        messages = []

        # System prompt with context
        system_prompt = build_system_prompt(
            language_history=language_events,
            rag_context=rag_results
        )
        messages.append({"role": "system", "content": system_prompt})

        # Add conversation history
        for entry in history:
            messages.append({
                "role": entry.role,
                "content": entry.content
            })

        # Add current query
        messages.append({
            "role": "user",
            "content": current_query
        })

        return messages
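`build_system_prompt` is called above but not shown. The sketch below is one plausible shape for it, under stated assumptions: the prompt wording is invented, language events are assumed to be dicts ordered oldest first with `from_language` and `to_language` keys, and each RAG result is assumed to carry its passage under a `text` key.

```python
from typing import Dict, List


def build_system_prompt(language_history: List[Dict], rag_context: List[Dict]) -> str:
    """Hypothetical prompt builder; the wording and dict keys are assumptions."""
    lines = ["You are a helpful assistant for voice and text conversations."]

    if language_history:
        # Assumes events are ordered oldest first, so the last one is current
        latest = language_history[-1]
        lines.append(
            f"The user switched from {latest['from_language']} "
            f"to {latest['to_language']}; reply in {latest['to_language']}."
        )

    if rag_context:
        lines.append("Use these retrieved passages when relevant:")
        for index, passage in enumerate(rag_context, start=1):
            # Assumes each RAG result carries its text under a "text" key
            lines.append(f"[{index}] {passage['text']}")

    return "\n".join(lines)


prompt = build_system_prompt(
    language_history=[{"from_language": "en", "to_language": "ar"}],
    rag_context=[{"text": "Metformin is a first-line treatment for type 2 diabetes."}],
)
```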

Context Truncation

When context exceeds token limits:

    async def truncate_context(
        messages: List[Dict],
        max_tokens: int = 4000
    ) -> List[Dict]:
        """Truncate context while preserving important information."""
        # Nothing to do while the context fits the budget
        if count_tokens(messages) <= max_tokens:
            return messages

        # Always keep: system prompt, last 3 messages
        middle = messages[1:-3]

        # With four or fewer messages there is nothing to summarize, and
        # messages[:1] + messages[-3:] would duplicate entries
        if not middle:
            return messages

        # Summarize middle messages
        summary = await summarize_messages(middle)
        summary_message = {
            "role": "system",
            "content": f"[Previous conversation summary: {summary}]"
        }
        return [messages[0], summary_message] + messages[-3:]
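The truncation routine depends on `count_tokens` and `summarize_messages`, neither of which is defined in this guide. To make the behavior concrete, here is a runnable sketch with stand-ins for both: the four-characters-per-token estimate is a crude assumption, not a real tokenizer, and the summarizer is a placeholder where an LLM call would normally go.

```python
import asyncio
from typing import Dict, List


def count_tokens(messages: List[Dict]) -> int:
    # Crude stand-in: roughly 4 characters per token, plus 1 per message
    return sum(len(m["content"]) // 4 + 1 for m in messages)


async def summarize_messages(messages: List[Dict]) -> str:
    # Placeholder; a real implementation would ask an LLM for a summary
    return f"{len(messages)} earlier messages omitted"


async def truncate_context(messages: List[Dict], max_tokens: int = 4000) -> List[Dict]:
    middle = messages[1:-3]
    if count_tokens(messages) <= max_tokens or not middle:
        return messages
    summary = await summarize_messages(middle)
    note = {"role": "system", "content": f"[Previous conversation summary: {summary}]"}
    # Keep the system prompt, the summary, and the last three messages
    return [messages[0], note] + messages[-3:]


messages = [{"role": "system", "content": "You are helpful."}] + [
    {"role": "user" if i % 2 == 0 else "assistant", "content": "x" * 400}
    for i in range(10)
]
trimmed = asyncio.run(truncate_context(messages, max_tokens=500))
```

Note that this sketch, like the routine it mirrors, does not re-check the token count after summarizing, so a long summary or three long trailing messages can still exceed the budget.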

Session Persistence

Redis Session Storage

    import json
    from typing import List

    from redis.asyncio import Redis  # assumes redis-py's asyncio client


    class RedisMemoryStore:
        """Redis-backed memory store for sessions."""

        def __init__(self, redis_client: Redis):
            self.redis = redis_client
            self.ttl = 86400  # 24 hours

        async def save_session(
            self,
            session_id: str,
            memory: List[MemoryEntry]
        ):
            key = f"memory:{session_id}"
            data = json.dumps([entry.to_dict() for entry in memory])
            # ex= sets the key's expiry in seconds
            await self.redis.set(key, data, ex=self.ttl)

        async def load_session(
            self,
            session_id: str
        ) -> List[MemoryEntry]:
            key = f"memory:{session_id}"
            data = await self.redis.get(key)
            if data:
                entries = json.loads(data)
                return [MemoryEntry.from_dict(e) for e in entries]
            return []

        async def extend_ttl(self, session_id: str):
            # Reset the expiry to a full TTL, e.g. on user activity
            key = f"memory:{session_id}"
            await self.redis.expire(key, self.ttl)
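The store only uses three Redis calls (`set` with `ex`, `get`, and `expire`), so its save, extend, and load cycle can be exercised without a server. The sketch below uses a hypothetical in-memory `FakeRedis` that imitates just those calls. It is meant for illustration and tests, and is not a substitute for the real client.

```python
import asyncio
import json
import time
from typing import Dict, List, Optional, Tuple


class FakeRedis:
    """Hypothetical in-memory stand-in exposing the three calls the store uses."""

    def __init__(self) -> None:
        # key -> (value, monotonic time at which the key expires)
        self._data: Dict[str, Tuple[str, float]] = {}

    async def set(self, key: str, value: str, ex: int) -> None:
        self._data[key] = (value, time.monotonic() + ex)

    async def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None or item[1] <= time.monotonic():
            self._data.pop(key, None)  # lazily evict expired keys
            return None
        return item[0]

    async def expire(self, key: str, seconds: int) -> bool:
        value = await self.get(key)
        if value is None:
            return False
        self._data[key] = (value, time.monotonic() + seconds)
        return True


async def demo() -> List[dict]:
    redis = FakeRedis()
    key = "memory:session_123"
    entries = [{"role": "user", "content": "What is metformin used for?", "mode": "voice"}]
    await redis.set(key, json.dumps(entries), ex=86400)
    await redis.expire(key, 86400)  # what extend_ttl does on activity
    raw = await redis.get(key)
    return json.loads(raw) if raw else []


loaded = asyncio.run(demo())
```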

Long-term Storage

For persistent memory across sessions:

    class PostgresMemoryStore:
        """PostgreSQL-backed long-term memory store."""

        def __init__(self, db):
            # Assumption: db is an asyncpg-style connection
            # (transaction(), fetchval(), execute() with $n placeholders)
            self.db = db

        async def save_conversation(
            self,
            user_id: str,
            session_id: str,
            entries: List[MemoryEntry]
        ):
            """Save conversation to long-term storage."""
            async with self.db.transaction():
                # Save conversation record. fetchval() returns the RETURNING
                # value; with asyncpg, execute() returns only a status string.
                conversation_id = await self.db.fetchval(
                    """
                    INSERT INTO conversations (user_id, session_id, created_at)
                    VALUES ($1, $2, NOW())
                    RETURNING id
                    """,
                    user_id, session_id
                )

                # Save entries
                for entry in entries:
                    await self.db.execute(
                        """
                        INSERT INTO conversation_entries
                        (conversation_id, role, content, mode, language, timestamp)
                        VALUES ($1, $2, $3, $4, $5, $6)
                        """,
                        conversation_id, entry.role, entry.content,
                        entry.mode, entry.language, entry.timestamp
                    )

Privacy Controls

User Memory Settings

    from dataclasses import dataclass


    @dataclass
    class MemorySettings:
        """User's memory and privacy preferences."""

        enabled: bool = True
        retention_days: int = 30
        cross_session: bool = True
        save_voice_transcripts: bool = True
        save_rag_context: bool = True
        anonymize_phi: bool = True
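How these settings are enforced is not shown in this guide. As a sketch of one possible reading, the function below filters entries before they are persisted. `StoredEntry` and `entries_to_keep` are hypothetical names, and the interpretation of each flag (for example, that disabling `save_voice_transcripts` drops voice entries entirely) is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class MemorySettings:
    enabled: bool = True
    retention_days: int = 30
    save_voice_transcripts: bool = True


@dataclass
class StoredEntry:
    content: str
    mode: str
    timestamp: datetime


def entries_to_keep(
    entries: List[StoredEntry], settings: MemorySettings, now: datetime
) -> List[StoredEntry]:
    """Filter entries according to the user's settings (assumed semantics)."""
    if not settings.enabled:
        return []
    cutoff = now - timedelta(days=settings.retention_days)
    kept = []
    for entry in entries:
        if entry.timestamp < cutoff:
            continue  # older than the retention window
        if entry.mode == "voice" and not settings.save_voice_transcripts:
            continue  # user opted out of storing voice transcripts
        kept.append(entry)
    return kept


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
entries = [
    StoredEntry("old text", "text", now - timedelta(days=45)),
    StoredEntry("recent voice", "voice", now - timedelta(days=1)),
    StoredEntry("recent text", "text", now - timedelta(hours=1)),
]
kept = entries_to_keep(entries, MemorySettings(save_voice_transcripts=False), now)
```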

Memory Deletion

    async def delete_user_memory(
        user_id: str,
        scope: Literal["session", "day", "all"]
    ):
        """Delete user's conversation memory."""
        if scope == "session":
            await redis_store.delete_session(user_id)
        elif scope == "day":
            await postgres_store.delete_today(user_id)
        elif scope == "all":
            await redis_store.delete_all(user_id)
            await postgres_store.delete_all(user_id)

        logger.info(f"Deleted memory for user {user_id}, scope: {scope}")

Frontend Integration

Memory Hook

    import { useUnifiedMemory } from "@/hooks/useUnifiedMemory";

    const ChatContainer = () => {
      const { messages, addMessage, clearMemory, mode, switchMode } = useUnifiedMemory();

      const handleSend = async (content: string) => {
        // Add to unified memory
        await addMessage({
          role: "user",
          content,
          mode: mode, // "voice" or "text"
          language: currentLanguage,
        });

        // Get AI response
        const response = await fetchResponse(content);

        // Add response to memory
        await addMessage({
          role: "assistant",
          content: response.text,
          mode: mode,
          language: response.language,
        });
      };

      return (
        <div>
          <ChatHistory messages={messages} />
          <ModeSwitch mode={mode} onSwitch={switchMode} />
          <ChatInput onSend={handleSend} mode={mode} />
        </div>
      );
    };

Mode Switch UI

    const ModeSwitch: React.FC<{ mode: Mode; onSwitch: (m: Mode) => void }> = ({ mode, onSwitch }) => {
      return (
        <div className="flex gap-2 p-2 bg-gray-100 rounded-lg">
          <button
            className={cn("px-4 py-2 rounded", mode === "text" ? "bg-white shadow" : "text-gray-600")}
            onClick={() => onSwitch("text")}
            aria-pressed={mode === "text"}
          >
            💬 Text
          </button>
          <button
            className={cn("px-4 py-2 rounded", mode === "voice" ? "bg-white shadow" : "text-gray-600")}
            onClick={() => onSwitch("voice")}
            aria-pressed={mode === "voice"}
          >
            🎤 Voice
          </button>
        </div>
      );
    };

Testing

Unit Tests

    import pytest


    @pytest.mark.asyncio
    async def test_cross_modal_context():
        """Test context preservation across voice/text modes."""
        memory = UnifiedMemoryService()

        # Add voice message
        await memory.add_entry(
            session_id="s1",
            entry=MemoryEntry(
                role="user",
                content="What is diabetes?",
                mode="voice",
                language="en"
            )
        )

        # Switch to text mode
        await memory.add_event(
            session_id="s1",
            event_type="mode_switch",
            data={"from_mode": "voice", "to_mode": "text"}
        )

        # Get context for text mode
        context = await memory.get_context("s1")

        assert len(context.messages) == 1
        assert context.messages[0].content == "What is diabetes?"
        assert context.messages[0].mode == "voice"


    @pytest.mark.asyncio
    async def test_language_switch_tracking():
        """Test language switch event tracking."""
        memory = UnifiedMemoryService()

        await memory.track_language_switch(
            session_id="s1",
            from_language="en",
            to_language="ar",
            trigger="auto_detected"
        )

        events = await memory.get_events("s1", "language_switch")

        assert len(events) == 1
        assert events[0]["from_language"] == "en"
        assert events[0]["to_language"] == "ar"