# VoiceAssist Voice State - November 28, 2025

## Summary

Voice mode has been significantly improved with barge-in support, audio overlap prevention, and graceful error handling. The system now properly handles user interruptions during AI responses.

**Voice Mode Overhaul (2025-11-29)**: Added per-user voice preferences persistence, context-aware voice style detection, advanced TTS controls, and aggressive latency optimizations (200ms VAD, 256-sample chunks, 300ms reconnect).

## Changes Since Last Update (2025-11-25)

### New Features

| Feature                  | Status   | Description                                        |
| ------------------------ | -------- | -------------------------------------------------- |
| Barge-in support         | **Live** | User can interrupt the AI while it is speaking     |
| Audio overlap prevention | **Live** | Prevents multiple responses playing simultaneously |
| Benign error handling    | **Live** | Gracefully handles cancellation failures           |
| Audio playback tracking  | **Live** | Tracks the current audio element for cleanup       |

### Voice Mode Overhaul (2025-11-29)

| Feature                          | Status   | Description                                             |
| -------------------------------- | -------- | ------------------------------------------------------- |
| User voice preferences (backend) | **Live** | Per-user TTS settings stored in the database            |
| Context-aware voice styles       | **Live** | Auto-detects CALM/URGENT/EMPATHETIC/INSTRUCTIONAL tones |
| Advanced TTS controls            | **Live** | Stability, clarity, expressiveness sliders in the UI    |
| Aggressive VAD tuning            | **Live** | 200ms silence, 150ms prefix, 256-sample chunks          |
| Faster reconnection              | **Live** | 300ms base delay (was 1000ms)                           |
| Backend preferences sync         | **Live** | Cross-device settings via `/api/voice/preferences`      |

### Technical Implementation

#### 1. Barge-in Flow

When the user starts speaking while the AI is responding:

```
input_audio_buffer.speech_started
        ↓
Check activeResponseIdRef
        ↓
Send response.cancel to OpenAI
        ↓
Call onSpeechStarted() callback
        ↓
VoiceModePanel.stopCurrentAudio()
        ↓
Audio stops, queue cleared, response ID incremented
```
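A minimal sketch of how this flow might be wired inside the hook, assuming a `sendEvent` helper that writes JSON events to the Realtime WebSocket. Only `activeResponseIdRef`, `onSpeechStarted`, and the event names come from this document; the hook name `useBargeIn` and the rest of the shape are illustrative:

```typescript
import { useCallback, useRef } from "react";

// Hypothetical extraction of the barge-in branch from useRealtimeVoiceSession.ts.
// `sendEvent` stands in for whatever the hook uses to write to the socket.
export function useBargeIn(
  sendEvent: (event: { type: string }) => void,
  onSpeechStarted?: () => void,
) {
  // Tracks the OpenAI response currently being generated, if any.
  const activeResponseIdRef = useRef<string | null>(null);

  // Invoked when the server reports input_audio_buffer.speech_started.
  const handleSpeechStarted = useCallback(() => {
    // Only cancel when OpenAI is mid-response; the ref itself is cleared
    // later, when response.done or response.cancelled arrives.
    if (activeResponseIdRef.current !== null) {
      sendEvent({ type: "response.cancel" });
    }
    // Let VoiceModePanel stop local playback and clear its queue.
    onSpeechStarted?.();
  }, [sendEvent, onSpeechStarted]);

  return { activeResponseIdRef, handleSpeechStarted };
}
```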
#### 2. Audio Playback Management

New refs added to `VoiceModePanel.tsx`:

```typescript
// Track the currently playing Audio element for stopping on barge-in
const currentAudioRef = useRef<HTMLAudioElement | null>(null);

// Prevent overlapping response processing
const isProcessingResponseRef = useRef(false);

// Response ID to invalidate stale responses
const currentResponseIdRef = useRef(0);
```
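A minimal sketch, under the same caveat, of how `stopCurrentAudio()` might use these refs; only the ref names and the function name come from this document, and the hook wrapper `useAudioPlayback` is hypothetical:

```typescript
import { useCallback, useRef } from "react";

// Hypothetical sketch of the playback-side cleanup in VoiceModePanel.tsx.
export function useAudioPlayback() {
  const currentAudioRef = useRef<HTMLAudioElement | null>(null);
  const isProcessingResponseRef = useRef(false);
  const currentResponseIdRef = useRef(0);

  const stopCurrentAudio = useCallback(() => {
    // Halt whatever is playing right now and release the element.
    const audio = currentAudioRef.current;
    if (audio) {
      audio.pause();
      audio.src = "";
      currentAudioRef.current = null;
    }
    // Allow the next response to be processed...
    isProcessingResponseRef.current = false;
    // ...and bump the ID so chunks still in flight from the old
    // response are recognized as stale and dropped.
    currentResponseIdRef.current += 1;
  }, []);

  return {
    currentAudioRef,
    isProcessingResponseRef,
    currentResponseIdRef,
    stopCurrentAudio,
  };
}
```

Bumping `currentResponseIdRef` is what lets `onRelayResult` compare the ID it captured when a response started against the current value and discard stale audio.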
#### 3. Response Tracking

New refs added to `useRealtimeVoiceSession.ts`:

```typescript
// Track the active response ID for cancellation
const activeResponseIdRef = useRef<string | null>(null);
```

Handled message types:

- `response.created` - track the new response ID
- `response.done` - clear the response ID
- `response.cancelled` - clear the response ID

#### 4. Benign Error Handling

Errors like "Cancellation failed: no active response found" are now handled gracefully:

```typescript
case "error": {
  const errorMessage = message.error?.message || "Realtime API error";
  const errorCode = message.error?.code;
  if (
    errorMessage.includes("Cancellation failed") ||
    errorMessage.includes("no active response") ||
    errorCode === "cancellation_failed"
  ) {
    voiceLog.debug(`Ignoring benign error: ${errorMessage}`);
    break;
  }
  handleError(new Error(errorMessage));
  break;
}
```

### Files Modified

| File                                                   | Changes                                                                                                            |
| ------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------ |
| `apps/web-app/src/hooks/useRealtimeVoiceSession.ts`    | Added `activeResponseIdRef`, `onSpeechStarted` callback, response tracking, barge-in logic, benign error handling  |
| `apps/web-app/src/components/voice/VoiceModePanel.tsx` | Added audio tracking refs, `stopCurrentAudio()`, overlap prevention in `onRelayResult`                             |

## Current Test Status

| Test Suite                                 | Tests                       | Status |
| ------------------------------------------ | --------------------------- | ------ |
| Backend: test_openai_config.py             | 17 passed, 3 skipped (live) | ✅     |
| Backend: test_voice_metrics.py             | 11 passed                   | ✅     |
| Frontend: useRealtimeVoiceSession          | 22 passed                   | ✅     |
| Frontend: voiceSettingsStore               | 17 passed                   | ✅     |
| Frontend: VoiceModeSettings                | 25 passed                   | ✅     |
| Frontend: useChatSession-voice-integration | 8 passed                    | ✅     |

## Known Issues

1. **First audio chunk silent**: the first audio chunk may show `-Infinity dB`; this is expected before the mic produces audio
2. **WebSocket errors on page navigation**: expected when switching conversations; handled gracefully

## Architecture Overview

```
Frontend (dev.asimo.io)
├── VoiceModePanel (UI component)
│   ├── stopCurrentAudio() - stops playback on barge-in
│   ├── currentAudioRef - tracks playing audio
│   ├── isProcessingResponseRef - prevents overlaps
│   └── currentResponseIdRef - invalidates stale responses
│
├── useRealtimeVoiceSession (hook)
│   ├── activeResponseIdRef - tracks OpenAI response
│   ├── onSpeechStarted callback - notifies panel
│   ├── response.cancel - sends to OpenAI
│   └── Benign error handling
│
└── voiceSettingsStore (Zustand)
    └── Persists: voice, language, vadSensitivity
```

## Quick Commands

```bash
# Run backend voice tests
cd /home/asimo/VoiceAssist/services/api-gateway
source venv/bin/activate && export PYTHONPATH=.
python -m pytest tests/integration/test_openai_config.py tests/integration/test_voice_metrics.py -v

# Run frontend voice tests
cd /home/asimo/VoiceAssist/apps/web-app
export NODE_OPTIONS="--max-old-space-size=768"
npx vitest run src/hooks/__tests__/useRealtimeVoiceSession.test.ts \
  src/stores/__tests__/voiceSettingsStore.test.ts \
  src/components/voice/__tests__/VoiceModeSettings.test.tsx

# Build web app
cd /home/asimo/VoiceAssist/apps/web-app
pnpm build
```

## TODOs for Future Work

### Voice UX Features

- [ ] Audio level visualization during recording
- [x] Per-user voice preferences persistence (backend) ✅ Implemented 2025-11-29
- [ ] Voice activity visualization improvements
- [ ] Multi-language auto-detection
- [ ] Session resumption on reconnect

### Testing

- [ ] E2E tests for barge-in functionality
- [ ] Test voice→chat transcript content in the chat timeline
- [ ] Performance baseline (connection <2s, STT <500ms)

### Infrastructure

- [ ] Configure Prometheus scrapes for voice metrics
- [ ] Set up Grafana dashboards for voice SLOs
- [ ] Configure Sentry alerts for voice SLO violations

## Related Documentation

- [VOICE_MODE_PIPELINE.md](./VOICE_MODE_PIPELINE.md) - Full pipeline architecture
- [VOICE_MODE_SETTINGS_GUIDE.md](./VOICE_MODE_SETTINGS_GUIDE.md) - User settings
- [VOICE_READY_STATE_2025-11-25.md](./VOICE_READY_STATE_2025-11-25.md) - Previous state

---

_Last updated: 2025-11-28 by Claude_
dark:hover:bg-gray-700","children":"Home"}]]}]]}],null],null],null]},[null,["$","$L5",null,{"parallelRouterKey":"children","segmentPath":["children","docs","children","$6","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L7",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[null,["$","$L5",null,{"parallelRouterKey":"children","segmentPath":["children","docs","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L7",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/_next/static/css/7f586cdbbaa33ff7.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","className":"h-full","children":["$","body",null,{"className":"__className_f367f3 h-full bg-white dark:bg-gray-900","children":[["$","a",null,{"href":"#main-content","className":"skip-to-content","children":"Skip to main content"}],["$","$L8",null,{"children":[["$","$L9",null,{}],["$","$La",null,{}],["$","main",null,{"id":"main-content","className":"lg:pl-64","role":"main","aria-label":"Documentation content","children":["$","$Lb",null,{"children":["$","$L5",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L7",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]]}]]}]}]],null],null],["$Lc",null]]]] c:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"Voice State 2025-11-29 | Docs | VoiceAssist Docs"}],["$","meta","3",{"name":"description","content":"Voice mode now includes barge-in support, audio overlap prevention, user preferences persistence, context-aware styles, and aggressive latency optimizations."}],["$","meta","4",{"name":"keywords","content":"VoiceAssist,documentation,medical AI,voice assistant,healthcare,HIPAA,API"}],["$","meta","5",{"name":"robots","content":"index, follow"}],["$","meta","6",{"name":"googlebot","content":"index, 
follow"}],["$","link","7",{"rel":"canonical","href":"https://assistdocs.asimo.io"}],["$","meta","8",{"property":"og:title","content":"VoiceAssist Documentation"}],["$","meta","9",{"property":"og:description","content":"Comprehensive documentation for VoiceAssist - Enterprise Medical AI Assistant"}],["$","meta","10",{"property":"og:url","content":"https://assistdocs.asimo.io"}],["$","meta","11",{"property":"og:site_name","content":"VoiceAssist Docs"}],["$","meta","12",{"property":"og:type","content":"website"}],["$","meta","13",{"name":"twitter:card","content":"summary"}],["$","meta","14",{"name":"twitter:title","content":"VoiceAssist Documentation"}],["$","meta","15",{"name":"twitter:description","content":"Comprehensive documentation for VoiceAssist - Enterprise Medical AI Assistant"}],["$","meta","16",{"name":"next-size-adjust"}]] 1:null