# Unified Conversation Memory

Voice Mode v4.1 introduces unified conversation memory that maintains context across voice and text interactions, enabling seamless mode switching.

## Overview

The unified memory system provides:

- **Cross-modal context**: Conversation history shared between voice and text
- **Language switching events**: Tracks when users switch languages
- **Mode transition handling**: Preserves context when switching voice ↔ text
- **Session persistence**: Maintains memory across browser refreshes
- **Privacy controls**: User-controlled memory retention

```
┌─────────────────────────────────────────────────────────────────┐
│                      Unified Memory Store                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   ┌──────────────┐                      ┌──────────────┐        │
│   │  Voice Mode  │◄────── Shared ──────►│  Text Mode   │        │
│   │              │        Memory        │              │        │
│   └──────────────┘                      └──────────────┘        │
│          │                                     │                │
│          ▼                                     ▼                │
│   ┌──────────────────────────────────────────────────┐          │
│   │              Conversation Context                │          │
│   ├──────────────────────────────────────────────────┤          │
│   │ • Message history (last 50 messages)             │          │
│   │ • Language preferences & switches                │          │
│   │ • RAG context (retrieved passages)               │          │
│   │ • User preferences                               │          │
│   │ • Session metadata                               │          │
│   └──────────────────────────────────────────────────┘          │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
```

### Thinker-Talker Pipeline Integration

```mermaid
sequenceDiagram
    participant User
    participant Frontend
    participant Memory as Unified Memory
    participant Thinker
    participant RAG
    participant Talker

    User->>Frontend: Voice or Text input
    Frontend->>Memory: add_entry(role="user", mode, content)
    Note over Memory: Store with mode tag<br/>(voice/text)
    Memory->>Thinker: get_context(max_messages=10)
    Thinker->>RAG: retrieve_passages(query)
    RAG-->>Thinker: relevant_passages
    Note over Thinker: Build LLM context<br/>with history + RAG
    Thinker-->>Memory: add_entry(role="assistant")
    Thinker-->>Talker: response_stream
    Talker-->>Frontend: audio_chunks
    Note over Memory: Context preserved<br/>across mode switches
```

### Memory Flow on Mode Switch

```mermaid
flowchart TD
    subgraph Voice Mode
        VA[🎤 Voice Input]
        VT[Voice Transcript]
        VM[Voice Message Entry]
    end

    subgraph Text Mode
        TA[⌨️ Text Input]
        TM[Text Message Entry]
    end

    subgraph Unified Memory
        MC[Message Context]
        LC[Language Events]
        RC[RAG Context]
        ME[Mode Events]
    end

    subgraph Thinker-Talker
        TH[Thinker LLM]
        TK[Talker TTS]
    end

    VA --> VT --> VM --> MC
    TA --> TM --> MC
    VM --> ME
    TM --> ME
    MC --> TH
    LC --> TH
    RC --> TH
    TH --> TK

    style MC fill:#FFD700
```

### Environment Variable for Data Directory

When customizing lexicon paths, use the `_resolve_data_dir()` helper:

```python
from app.core.config import _resolve_data_dir

# Returns VOICEASSIST_DATA_DIR env var or default ./data
data_dir = _resolve_data_dir()

# Lexicon paths relative to data dir
lexicons_path = data_dir / "lexicons" / "medical_terms.txt"
```

**Environment variable**: `VOICEASSIST_DATA_DIR=/path/to/data`

If not set, defaults to `./data` relative to the working directory.
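For reference, a minimal sketch of what `_resolve_data_dir()` could look like, assuming it only reads `VOICEASSIST_DATA_DIR` and falls back to `./data`; the real helper in `app.core.config` may add validation or create the directory:

```python
import os
from pathlib import Path


def _resolve_data_dir() -> Path:
    """Resolve the data directory from VOICEASSIST_DATA_DIR, defaulting to ./data."""
    raw = os.environ.get("VOICEASSIST_DATA_DIR", "./data")
    return Path(raw).expanduser().resolve()
```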
## Memory Architecture

### Memory Layers

| Layer      | Scope           | Retention    | Storage    |
| ---------- | --------------- | ------------ | ---------- |
| Session    | Current session | Until close  | Redis      |
| Short-term | Last 24 hours   | 24h TTL      | Redis      |
| Long-term  | User history    | Configurable | PostgreSQL |
| Episodic   | Key moments     | Indefinite   | PostgreSQL |
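Read as a routing rule, the table says session-scoped and short-term entries live in Redis while long-term and episodic entries live in PostgreSQL. A small sketch of that mapping; the `MemoryLayer` enum and `backend_for()` helper are illustrative names, not part of the documented API:

```python
from enum import Enum


class MemoryLayer(str, Enum):
    SESSION = "session"        # current session, kept until close
    SHORT_TERM = "short_term"  # last 24 hours, 24h TTL
    LONG_TERM = "long_term"    # user history, configurable retention
    EPISODIC = "episodic"      # key moments, kept indefinitely


# Illustrative mapping from memory layer to backing store
_LAYER_BACKEND = {
    MemoryLayer.SESSION: "redis",
    MemoryLayer.SHORT_TERM: "redis",
    MemoryLayer.LONG_TERM: "postgresql",
    MemoryLayer.EPISODIC: "postgresql",
}


def backend_for(layer: MemoryLayer) -> str:
    """Return the storage backend responsible for a memory layer."""
    return _LAYER_BACKEND[layer]
```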
### Memory Entry Structure

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Literal, Optional


@dataclass
class MemoryEntry:
    """Single memory entry in the conversation."""

    id: str
    session_id: str
    user_id: str
    timestamp: datetime

    # Content
    role: Literal["user", "assistant", "system"]
    content: str
    mode: Literal["voice", "text"]

    # Context
    language: str
    detected_language: str
    language_switched: bool

    # RAG context
    retrieved_passages: List[str]
    sources: List[Dict]

    # Metadata
    latency_ms: Optional[float]
    degradations: List[str]
    phi_detected: bool
```
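The Redis store under Session Persistence serializes entries with `to_dict()` / `from_dict()`, which are not shown in this document. A minimal sketch of how they might work, assuming a flat JSON-friendly dict with an ISO-8601 timestamp (shown as standalone functions; in practice they would be methods on `MemoryEntry`):

```python
from dataclasses import asdict
from datetime import datetime
from typing import Dict


def to_dict(entry: "MemoryEntry") -> Dict:
    """Convert a MemoryEntry into a JSON-serializable dict."""
    data = asdict(entry)
    data["timestamp"] = entry.timestamp.isoformat()
    return data


def from_dict(data: Dict) -> "MemoryEntry":
    """Rebuild a MemoryEntry from a dict produced by to_dict()."""
    data = dict(data)  # copy so the caller's dict is untouched
    data["timestamp"] = datetime.fromisoformat(data["timestamp"])
    return MemoryEntry(**data)
```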
## Implementation

### UnifiedMemoryService

```python
from app.services.unified_memory import UnifiedMemoryService

memory_service = UnifiedMemoryService()

# Add voice message to memory
await memory_service.add_entry(
    session_id="session_123",
    user_id="user_456",
    entry=MemoryEntry(
        role="user",
        content="What is metformin used for?",
        mode="voice",
        language="en",
        detected_language="en",
        language_switched=False
    )
)

# Get context for LLM
context = await memory_service.get_context(
    session_id="session_123",
    max_messages=10,
    include_rag=True
)
```

### Cross-Modal Context

When switching from voice to text (or vice versa):

```python
async def handle_mode_switch(
    session_id: str,
    from_mode: str,
    to_mode: str
) -> ConversationContext:
    """Handle mode switch while preserving context."""

    # Get existing conversation context
    context = await memory_service.get_context(session_id)

    # Add mode switch event
    await memory_service.add_event(
        session_id=session_id,
        event_type="mode_switch",
        data={
            "from_mode": from_mode,
            "to_mode": to_mode,
            "timestamp": datetime.utcnow().isoformat()
        }
    )

    # Return context for new mode
    return context
```

### Language Switching Events

Track language changes for multilingual users:

```python
async def track_language_switch(
    session_id: str,
    from_language: str,
    to_language: str,
    trigger: str  # "user_request" | "auto_detected" | "explicit_setting"
):
    """Track when user switches languages."""

    await memory_service.add_event(
        session_id=session_id,
        event_type="language_switch",
        data={
            "from_language": from_language,
            "to_language": to_language,
            "trigger": trigger,
            "timestamp": datetime.utcnow().isoformat()
        }
    )

    # Update session language preference
    await session_service.update_language(
        session_id=session_id,
        language=to_language
    )
```

## Context Building

### Building LLM Context

```python
async def build_llm_context(
    session_id: str,
    current_query: str,
    rag_results: List[Dict]
) -> List[Dict]:
    """Build context for LLM including memory."""

    # Get conversation history
    history = await memory_service.get_history(
        session_id=session_id,
        max_messages=10
    )

    # Get language switches (for context awareness)
    language_events = await memory_service.get_events(
        session_id=session_id,
        event_type="language_switch",
        limit=5
    )

    # Build messages array
    messages = []

    # System prompt with context
    system_prompt = build_system_prompt(
        language_history=language_events,
        rag_context=rag_results
    )
    messages.append({"role": "system", "content": system_prompt})

    # Add conversation history
    for entry in history:
        messages.append({
            "role": entry.role,
            "content": entry.content
        })

    # Add current query
    messages.append({
        "role": "user",
        "content": current_query
    })

    return messages
```
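`build_system_prompt()` is referenced above but not defined in this document. A hedged sketch of what it might do, assuming the language events carry `from_language` / `to_language` keys (as in the unit tests below) and that each RAG result exposes a `text` field:

```python
from typing import Dict, List


def build_system_prompt(
    language_history: List[Dict],
    rag_context: List[Dict]
) -> str:
    """Illustrative system-prompt builder; the production helper may differ."""
    parts = ["You are VoiceAssist, a medical voice and text assistant."]

    # Surface the most recent language switch so replies stay in the active language
    if language_history:
        last = language_history[0]
        parts.append(
            f"The user recently switched from {last['from_language']} "
            f"to {last['to_language']}; respond in {last['to_language']} "
            "unless asked otherwise."
        )

    # Inline retrieved passages as grounding context
    if rag_context:
        passages = "\n".join(f"- {r.get('text', '')}" for r in rag_context)
        parts.append("Use the following retrieved passages when relevant:\n" + passages)

    return "\n\n".join(parts)
```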
### Context Truncation

When context exceeds token limits:

```python
async def truncate_context(
    messages: List[Dict],
    max_tokens: int = 4000
) -> List[Dict]:
    """Truncate context while preserving important information."""

    # Always keep: system prompt, last 3 messages
    protected = messages[:1] + messages[-3:]
    middle = messages[1:-3]

    # Count tokens
    total_tokens = count_tokens(messages)
    if total_tokens <= max_tokens:
        return messages

    # Summarize middle messages
    if middle:
        summary = await summarize_messages(middle)
        summary_message = {
            "role": "system",
            "content": f"[Previous conversation summary: {summary}]"
        }
        return [messages[0], summary_message] + messages[-3:]

    return protected
```
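`count_tokens()` and `summarize_messages()` are assumed helpers here. One possible sketch, using `tiktoken` for an approximate count and a deliberately naive placeholder summary; the real implementation would more likely ask the Thinker LLM for the summary:

```python
from typing import Dict, List

import tiktoken

_ENCODING = tiktoken.get_encoding("cl100k_base")


def count_tokens(messages: List[Dict]) -> int:
    """Approximate token count across an OpenAI-style messages array."""
    return sum(len(_ENCODING.encode(m["content"])) for m in messages)


async def summarize_messages(messages: List[Dict]) -> str:
    """Placeholder summary of the truncated middle of the conversation."""
    user_turns = [m["content"][:80] for m in messages if m["role"] == "user"]
    return "The user previously asked about: " + "; ".join(user_turns)
```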
## Session Persistence

### Redis Session Storage

```python
class RedisMemoryStore:
    """Redis-backed memory store for sessions."""

    def __init__(self, redis_client: Redis):
        self.redis = redis_client
        self.ttl = 86400  # 24 hours

    async def save_session(
        self,
        session_id: str,
        memory: List[MemoryEntry]
    ):
        key = f"memory:{session_id}"
        data = json.dumps([entry.to_dict() for entry in memory])
        await self.redis.set(key, data, ex=self.ttl)

    async def load_session(
        self,
        session_id: str
    ) -> List[MemoryEntry]:
        key = f"memory:{session_id}"
        data = await self.redis.get(key)
        if data:
            entries = json.loads(data)
            return [MemoryEntry.from_dict(e) for e in entries]
        return []

    async def extend_ttl(self, session_id: str):
        key = f"memory:{session_id}"
        await self.redis.expire(key, self.ttl)
```
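A usage sketch for the store, assuming redis-py's asyncio client (`redis.asyncio.Redis`) and a list of `MemoryEntry` objects already in hand:

```python
from typing import List

from redis.asyncio import Redis


async def persist_and_restore(session_id: str, entries: List[MemoryEntry]) -> None:
    """Round-trip a session's entries through the Redis store."""
    store = RedisMemoryStore(Redis(host="localhost", port=6379))

    await store.save_session(session_id, entries)    # stored as JSON with a 24h TTL
    restored = await store.load_session(session_id)  # [] once the key expires
    assert len(restored) == len(entries)

    # Keep an active session alive past the default TTL window
    await store.extend_ttl(session_id)
```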
### Long-term Storage

For persistent memory across sessions:

```python
class PostgresMemoryStore:
    """PostgreSQL-backed long-term memory store."""

    async def save_conversation(
        self,
        user_id: str,
        session_id: str,
        entries: List[MemoryEntry]
    ):
        """Save conversation to long-term storage."""

        async with self.db.transaction():
            # Save conversation record
            conversation = await self.db.execute(
                """
                INSERT INTO conversations (user_id, session_id, created_at)
                VALUES ($1, $2, NOW())
                RETURNING id
                """,
                user_id, session_id
            )

            # Save entries
            for entry in entries:
                await self.db.execute(
                    """
                    INSERT INTO conversation_entries
                        (conversation_id, role, content, mode, language, timestamp)
                    VALUES ($1, $2, $3, $4, $5, $6)
                    """,
                    conversation.id, entry.role, entry.content,
                    entry.mode, entry.language, entry.timestamp
                )
```

## Privacy Controls

### User Memory Settings

```python
@dataclass
class MemorySettings:
    """User's memory and privacy preferences."""

    enabled: bool = True
    retention_days: int = 30
    cross_session: bool = True
    save_voice_transcripts: bool = True
    save_rag_context: bool = True
    anonymize_phi: bool = True
```

### Memory Deletion

```python
async def delete_user_memory(
    user_id: str,
    scope: Literal["session", "day", "all"]
):
    """Delete user's conversation memory."""

    if scope == "session":
        await redis_store.delete_session(user_id)
    elif scope == "day":
        await postgres_store.delete_today(user_id)
    elif scope == "all":
        await redis_store.delete_all(user_id)
        await postgres_store.delete_all(user_id)

    logger.info(f"Deleted memory for user {user_id}, scope: {scope}")
```
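How `retention_days` is enforced is not shown in this document; one possible shape for a periodic sweep, where `delete_entries_before()` is a hypothetical `PostgresMemoryStore` method:

```python
from datetime import datetime, timedelta, timezone


async def enforce_retention(user_id: str, settings: MemorySettings) -> None:
    """Illustrative retention sweep driven by the user's MemorySettings."""
    if not settings.enabled:
        # Memory disabled entirely: treat it as a full deletion request
        await delete_user_memory(user_id, scope="all")
        return

    cutoff = datetime.now(timezone.utc) - timedelta(days=settings.retention_days)
    # Hypothetical store method: remove long-term entries older than the cutoff
    await postgres_store.delete_entries_before(user_id=user_id, cutoff=cutoff)
```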
## Frontend Integration

### Memory Hook

```tsx
import { useUnifiedMemory } from "@/hooks/useUnifiedMemory";

const ChatContainer = () => {
  const { messages, addMessage, clearMemory, mode, switchMode } =
    useUnifiedMemory();

  const handleSend = async (content: string) => {
    // Add to unified memory
    await addMessage({
      role: "user",
      content,
      mode: mode, // "voice" or "text"
      language: currentLanguage,
    });

    // Get AI response
    const response = await fetchResponse(content);

    // Add response to memory
    await addMessage({
      role: "assistant",
      content: response.text,
      mode: mode,
      language: response.language,
    });
  };

  return (
  );
};
```

### Mode Switch UI

```tsx
const ModeSwitch: React.FC<{ mode: Mode; onSwitch: (m: Mode) => void }> = ({
  mode,
  onSwitch,
}) => {
  return (
  );
};
```

## Testing

### Unit Tests

```python
@pytest.mark.asyncio
async def test_cross_modal_context():
    """Test context preservation across voice/text modes."""
    memory = UnifiedMemoryService()

    # Add voice message
    await memory.add_entry(
        session_id="s1",
        entry=MemoryEntry(
            role="user",
            content="What is diabetes?",
            mode="voice",
            language="en"
        )
    )

    # Switch to text mode
    await memory.add_event(
        session_id="s1",
        event_type="mode_switch",
        data={"from_mode": "voice", "to_mode": "text"}
    )

    # Get context for text mode
    context = await memory.get_context("s1")

    assert len(context.messages) == 1
    assert context.messages[0].content == "What is diabetes?"
    assert context.messages[0].mode == "voice"


@pytest.mark.asyncio
async def test_language_switch_tracking():
    """Test language switch event tracking."""
    memory = UnifiedMemoryService()

    await memory.track_language_switch(
        session_id="s1",
        from_language="en",
        to_language="ar",
        trigger="auto_detected"
    )

    events = await memory.get_events("s1", "language_switch")
    assert len(events) == 1
    assert events[0]["from_language"] == "en"
    assert events[0]["to_language"] == "ar"
```

## Related Documentation

- [Voice Mode v4.1 Overview](./voice-mode-v4-overview.md)
- [Multilingual RAG Architecture](./multilingual-rag-architecture.md)
- [Latency Budgets Guide](./latency-budgets-guide.md)