# Voice Mode v4.1 Overview

Voice Mode v4.1 introduces significant enhancements to the VoiceAssist voice pipeline, focusing on multilingual support, performance safeguards, and improved user feedback.

## Key Features

### 1. Multilingual RAG with Translation Fallback

The multilingual RAG service enables voice interactions in multiple languages:

- **Translate-then-retrieve pattern**: Non-English queries are translated to English for RAG retrieval, then responses are translated back
- **Multi-provider translation**: Primary provider (Google Translate) with automatic DeepL fallback
- **Code-switching detection**: Handles bilingual speakers mixing languages
- **Graceful degradation**: Falls back to the original query when translation fails

See [Multilingual RAG Architecture](./multilingual-rag-architecture.md) for technical details.

### 2. Medical Pronunciation Lexicons

Language-specific pronunciation lexicons for accurate TTS:

- **15 languages supported**: EN, ES, FR, DE, IT, PT, AR, ZH, HI, UR, JA, KO, RU, PL, TR
- **Medical terminology**: Drug names, conditions, procedures, anatomy
- **G2P fallback**: espeak-ng for terms not in lexicons
- **Shared drug names**: 100 common medications with IPA pronunciations

See [Lexicon Service Guide](./lexicon-service-guide.md) for usage.

### 3. Latency-Aware Orchestration

Performance safeguards to maintain sub-700ms end-to-end latency:

| Stage              | Budget    | Degradation Action       |
| ------------------ | --------- | ------------------------ |
| Audio capture      | 50ms      | Log warning              |
| STT                | 200ms     | Use cached partial       |
| Language detection | 50ms      | Default to user language |
| Translation        | 200ms     | Skip translation         |
| RAG retrieval      | 300ms     | Return top-1 only        |
| LLM first token    | 300ms     | Shorten context          |
| TTS first chunk    | 150ms     | Use cached greeting      |
| **Total E2E**      | **700ms** | **Feature degradation**  |

See [Latency Budgets Guide](./latency-budgets-guide.md) for implementation.

### 4. Thinking Feedback UX

Multi-modal feedback during AI processing:

- **Audio tones**: Gentle beep, soft chime, subtle tick (Web Audio API)
- **Visual indicators**: Animated dots, pulsing circle, spinner, progress bar
- **Haptic feedback**: Gentle pulse, rhythmic patterns (mobile)
- **User controls**: Volume, style selection, enable/disable per modality

See [Thinking Tone Settings](./thinking-tone-settings.md) for configuration.
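To make the translate-then-retrieve pattern from feature 1 concrete, here is a minimal sketch. `translator`, `rag`, and `llm` are stand-ins for the actual services, and the real pipeline (see the Multilingual RAG Architecture doc) differs in detail:

```python
from dataclasses import dataclass, field


@dataclass
class VoiceTurnResult:
    """Result of one voice turn, including any degradations applied."""
    answer: str
    degradation_applied: list[str] = field(default_factory=list)


async def answer_query(query: str, user_language: str,
                       translator, rag, llm) -> VoiceTurnResult:
    """Translate the query to English, run RAG, then translate the answer back.

    `translator`, `rag`, and `llm` are hypothetical injected interfaces used
    only for illustration.
    """
    degradations: list[str] = []

    # 1. Translate the query to English (skip if it is already English).
    english_query = query
    if user_language != "en":
        try:
            english_query = await translator.translate(
                query, source=user_language, target="en"
            )
        except Exception:
            # Graceful degradation: fall back to the original query.
            degradations.append("TRANSLATION_FAILED")

    # 2. Retrieve context and generate an answer in English.
    context = await rag.retrieve(english_query)
    english_answer = await llm.generate(english_query, context)

    # 3. Translate the answer back to the user's language
    #    (left as generated when the query translation already failed).
    answer = english_answer
    if user_language != "en" and "TRANSLATION_FAILED" not in degradations:
        try:
            answer = await translator.translate(
                english_answer, source="en", target=user_language
            )
        except Exception:
            degradations.append("TRANSLATION_FAILED")

    return VoiceTurnResult(answer=answer, degradation_applied=degradations)
```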
## Feature Flags

All v4.1 features are gated behind feature flags for safe rollout. Flags are grouped by workstream for independent deployment.

### Workstream 1: Translation & Multilingual RAG

| Flag                                     | Description                              | Default | Docs                                                   |
| ---------------------------------------- | ---------------------------------------- | ------- | ------------------------------------------------------ |
| `backend.voice_v4_translation_fallback`  | Multi-provider translation with caching  | Off     | [Multilingual RAG](./multilingual-rag-architecture.md) |
| `backend.voice_v4_multilingual_rag`      | Translate-then-retrieve pipeline         | Off     | [Multilingual RAG](./multilingual-rag-architecture.md) |

### Workstream 2: Audio & Speech Processing

| Flag                               | Description                      | Default | Docs                                          |
| ---------------------------------- | -------------------------------- | ------- | --------------------------------------------- |
| `backend.voice_v4_lexicon_service` | Medical pronunciation lexicons   | Off     | [Lexicon Service](./lexicon-service-guide.md) |
| `backend.voice_v4_phi_routing`     | PHI-aware STT with local Whisper | Off     | [PHI-Aware STT](./phi-aware-stt-routing.md)   |
| `backend.voice_v4_adaptive_vad`    | User-tunable VAD presets         | Off     | [Adaptive VAD](./adaptive-vad-presets.md)     |

### Workstream 3: Performance & Orchestration

| Flag                               | Description                  | Default | Docs                                          |
| ---------------------------------- | ---------------------------- | ------- | --------------------------------------------- |
| `backend.voice_v4_latency_budgets` | Latency-aware orchestration  | Off     | [Latency Budgets](./latency-budgets-guide.md) |
| `backend.voice_v4_thinking_tones`  | Backend thinking tone events | Off     | [Thinking Tones](./thinking-tone-settings.md) |

### Workstream 4: Internationalization

| Flag                           | Description                      | Default | Docs                                  |
| ------------------------------ | -------------------------------- | ------- | ------------------------------------- |
| `backend.voice_v4_rtl_support` | RTL text rendering (Arabic/Urdu) | Off     | [RTL Support](./rtl-support-guide.md) |

### Workstream 5: UI Enhancements

| Flag                                  | Description                          | Default | Docs                                                |
| ------------------------------------- | ------------------------------------ | ------- | --------------------------------------------------- |
| `ui.voice_v4_voice_first_ui`          | Voice-first unified input bar        | Off     | [Voice First Input Bar](./voice-first-input-bar.md) |
| `ui.voice_v4_streaming_text`          | Streaming text display during TTS    | Off     | [Streaming Text](./streaming-text-display.md)       |
| `ui.voice_v4_latency_indicator`       | Latency status with degradation info | Off     | [Latency Budgets](./latency-budgets-guide.md)       |
| `ui.voice_v4_thinking_feedback_panel` | Audio/visual/haptic feedback         | Off     | [Thinking Feedback](./thinking-tone-settings.md)    |

### Flag Dependencies

```
voice_v4_multilingual_rag
└── voice_v4_translation_fallback (required)

voice_v4_thinking_feedback_panel (UI)
└── voice_v4_thinking_tones (backend)

voice_v4_rtl_support
└── voice_v4_multilingual_rag (recommended)
```

## Environment Variables

### Core Configuration

```bash
# Lexicon data directory (optional, auto-detected if not set)
VOICEASSIST_DATA_DIR=/path/to/voiceassist/data

# Translation providers (encrypted in production)
GOOGLE_TRANSLATE_API_KEY=your-key
DEEPL_API_KEY=your-key

# Feature flag defaults
VOICE_V4_ENABLED=false  # Master toggle

# Latency budget overrides (milliseconds)
VOICE_LATENCY_BUDGET_TOTAL=700
VOICE_LATENCY_BUDGET_STT=200
VOICE_LATENCY_BUDGET_TRANSLATION=200
VOICE_LATENCY_BUDGET_RAG=300
```

### VOICEASSIST_DATA_DIR Configuration

The `VOICEASSIST_DATA_DIR` environment variable specifies the root directory for lexicon files and other data assets.

**Setting the variable:**

```bash
# Linux/macOS - add to ~/.bashrc or systemd service
export VOICEASSIST_DATA_DIR=/opt/voiceassist/data

# Docker - add to docker-compose.yml
environment:
  - VOICEASSIST_DATA_DIR=/app/data

# Kubernetes - add to deployment.yaml
env:
  - name: VOICEASSIST_DATA_DIR
    value: /app/data

# CI/CD - set in pipeline configuration
env:
  VOICEASSIST_DATA_DIR: ${{ github.workspace }}/data
```

**Expected directory structure:**

```
$VOICEASSIST_DATA_DIR/
├── lexicons/
│   ├── shared/
│   │   └── drug_names.json
│   ├── en/
│   │   └── medical_phonemes.json
│   ├── es/
│   │   └── medical_phonemes.json
│   └── ... (other languages)
└── models/
    └── whisper/ (for local PHI-aware STT)
```

### Data Directory Resolution

The `_resolve_data_dir()` function in `lexicon_service.py` automatically locates the lexicon data directory using this priority:

1. **Environment variable**: If `VOICEASSIST_DATA_DIR` is set and the path exists, use it
2. **Repository root**: Walk up from the service file to find `data/lexicons/`
3. **Current working directory**: Check `./data/` relative to cwd
4. **Fallback**: Use relative path from the service file

This allows the lexicon service to work in development, CI, and production without manual configuration.
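A minimal sketch of that resolution order is shown below. It is simplified for illustration; the actual `_resolve_data_dir()` in `lexicon_service.py` may differ (for example, it likely derives the starting path from its own module file rather than taking a parameter):

```python
import os
from pathlib import Path


def _resolve_data_dir(service_file: Path) -> Path:
    """Locate the lexicon data directory using the documented priority order."""
    # 1. Explicit override via VOICEASSIST_DATA_DIR, if the path exists.
    env_dir = os.getenv("VOICEASSIST_DATA_DIR")
    if env_dir and Path(env_dir).is_dir():
        return Path(env_dir)

    # 2. Repository root: walk up from the service file looking for data/lexicons/.
    for parent in service_file.resolve().parents:
        candidate = parent / "data"
        if (candidate / "lexicons").is_dir():
            return candidate

    # 3. Current working directory: ./data relative to cwd.
    cwd_candidate = Path.cwd() / "data"
    if cwd_candidate.is_dir():
        return cwd_candidate

    # 4. Fallback: a relative path next to the service file.
    return service_file.resolve().parent / "data"
```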
## Error Propagation

Voice Mode v4.1 implements structured error propagation to ensure graceful degradation without silent failures.

### Translation Errors

When translation fails, the system raises `TranslationFailedError` and applies graceful degradation:

```python
from app.services.latency_aware_orchestrator import (
    TranslationFailedError,
    DegradationType,
)

try:
    result = await orchestrator.process_with_budgets(
        audio_data=audio_bytes,
        user_language="es",
    )
except TranslationFailedError as e:
    # Degradation already applied: uses original query
    logger.warning(f"Translation failed: {e}")
    # DegradationType.TRANSLATION_FAILED is in result.degradation_applied
```

### Error → Degradation Mapping

| Error                      | Degradation Type                        | Fallback Behavior                    |
| -------------------------- | --------------------------------------- | ------------------------------------ |
| `TranslationFailedError`   | `TRANSLATION_FAILED`                     | Use original query, inform user      |
| Translation timeout        | `TRANSLATION_BUDGET_EXCEEDED`            | Skip translation, use original       |
| Language detection timeout | `LANGUAGE_DETECTION_SKIPPED`             | Default to user's preferred language |
| RAG retrieval slow         | `RAG_LIMITED_TO_1` / `RAG_LIMITED_TO_3`  | Return fewer results                 |
| LLM context overflow       | `LLM_CONTEXT_SHORTENED`                  | Truncate context history             |
| TTS cold start             | `TTS_USED_CACHED_GREETING`               | Play cached audio while warming up   |

### UI Error Surfacing

Degradation events are propagated to the frontend via WebSocket:

```typescript
// Frontend receives degradation events
socket.on("voice:degradation", (event: DegradationEvent) => {
  if (event.type === "TRANSLATION_FAILED") {
    showToast("Translation unavailable, using original language", "warning");
  }
});

// LatencyIndicator component shows degradation tooltips
```

### Logging and Monitoring

All errors emit structured logs and Prometheus metrics:

```python
# Structured logging
logger.warning("Translation failed", extra={
    "error_type": "TranslationFailedError",
    "source_language": "es",
    "degradation": "TRANSLATION_FAILED",
    "fallback": "original_query",
})

# Prometheus metrics
voice_degradation_total.labels(type="translation_failed").inc()
voice_error_total.labels(stage="translation", error="timeout").inc()
```
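To tie the latency budgets above to these degradation types, here is a minimal sketch of running one stage under its budget. The helper and its signature are illustrative only, not the actual API of `latency_aware_orchestrator.py`:

```python
import asyncio


async def translate_within_budget(
    translate,            # hypothetical async callable: translate(text, source, target) -> str
    text: str,
    source: str,
    target: str,
    budget_ms: int,
    degradations: list[str],
) -> str:
    """Run translation under its latency budget.

    On timeout, record TRANSLATION_BUDGET_EXCEEDED and fall back to the
    original text, matching the "Skip translation, use original" behavior.
    """
    try:
        return await asyncio.wait_for(
            translate(text, source, target),
            timeout=budget_ms / 1000,
        )
    except asyncio.TimeoutError:
        degradations.append("TRANSLATION_BUDGET_EXCEEDED")
        return text
```

A real orchestrator would apply the same pattern to each stage, using the per-stage budgets from the table in the Latency-Aware Orchestration section (for example, the 200ms translation budget or its `VOICE_LATENCY_BUDGET_TRANSLATION` override).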
error="timeout").inc() ``` ## Migration Guide ### Upgrading from v3 1. **Feature flags**: All v4 features are disabled by default 2. **Lexicons**: Automatically loaded from `data/lexicons/` directory 3. **Translation**: Enable `voice_v4_translation_fallback` flag 4. **UI components**: Import from `@/components/voice/` ### Testing ```bash # Run v4 service tests cd services/api-gateway pytest tests/services/test_voice_v4_services.py -v # Validate lexicons python -c " from app.services.lexicon_service import get_lexicon_service import asyncio service = get_lexicon_service() reports = asyncio.run(service.validate_all_lexicons()) for lang, report in reports.items(): print(f'{lang}: {report.term_count} terms ({report.status})') " ``` ## Related Documentation - [Multilingual RAG Architecture](./multilingual-rag-architecture.md) - [Lexicon Service Guide](./lexicon-service-guide.md) - [Latency Budgets Guide](./latency-budgets-guide.md) - [Thinking Tone Settings](./thinking-tone-settings.md) - [Voice Pipeline Architecture](../VOICE_MODE_PIPELINE.md) 6:["slug","voice/voice-mode-v4-overview","c"] 0:["X7oMT3VrOffzp0qvbeOas",[[["",{"children":["docs",{"children":[["slug","voice/voice-mode-v4-overview","c"],{"children":["__PAGE__?{\"slug\":[\"voice\",\"voice-mode-v4-overview\"]}",{}]}]}]},"$undefined","$undefined",true],["",{"children":["docs",{"children":[["slug","voice/voice-mode-v4-overview","c"],{"children":["__PAGE__",{},[["$L1",["$","div",null,{"children":[["$","div",null,{"className":"mb-6 flex items-center justify-between gap-4","children":[["$","div",null,{"children":[["$","p",null,{"className":"text-sm text-gray-500 dark:text-gray-400","children":"Docs / Raw"}],["$","h1",null,{"className":"text-3xl font-bold text-gray-900 dark:text-white","children":"Voice Mode v4.1 Overview"}],["$","p",null,{"className":"text-sm text-gray-600 dark:text-gray-400","children":["Sourced from"," ",["$","code",null,{"className":"font-mono text-xs","children":["docs/","voice/voice-mode-v4-overview.md"]}]]}]]}],["$","a",null,{"href":"https://github.com/mohammednazmy/VoiceAssist/edit/main/docs/voice/voice-mode-v4-overview.md","target":"_blank","rel":"noreferrer","className":"inline-flex items-center gap-2 rounded-md border border-gray-200 dark:border-gray-700 px-3 py-1.5 text-sm text-gray-700 dark:text-gray-200 hover:border-primary-500 dark:hover:border-primary-400 hover:text-primary-700 dark:hover:text-primary-300","children":"Edit on GitHub"}]]}],["$","div",null,{"className":"rounded-lg border border-gray-200 dark:border-gray-800 bg-white dark:bg-gray-900 p-6","children":["$","$L2",null,{"content":"$3"}]}],["$","div",null,{"className":"mt-6 flex flex-wrap gap-2 text-sm","children":[["$","$L4",null,{"href":"/reference/all-docs","className":"inline-flex items-center gap-1 rounded-md bg-gray-100 px-3 py-1 text-gray-700 hover:bg-gray-200 dark:bg-gray-800 dark:text-gray-200 dark:hover:bg-gray-700","children":"← All documentation"}],["$","$L4",null,{"href":"/","className":"inline-flex items-center gap-1 rounded-md bg-gray-100 px-3 py-1 text-gray-700 hover:bg-gray-200 dark:bg-gray-800 dark:text-gray-200 