Voice Mode v4.1 Overview
Voice Mode v4.1 introduces significant enhancements to the VoiceAssist voice pipeline, focusing on multilingual support, performance safeguards, and improved user feedback.
Key Features
1. Multilingual RAG with Translation Fallback
The multilingual RAG service enables voice interactions in multiple languages:
- Translate-then-retrieve pattern: Non-English queries are translated to English for RAG retrieval, then responses are translated back into the query language
- Multi-provider translation: Primary provider (Google Translate) with automatic DeepL fallback
- Code-switching detection: Handles bilingual speakers mixing languages
- Graceful degradation: Falls back to original query when translation fails
See Multilingual RAG Architecture for technical details.
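The translate-then-retrieve flow with provider fallback can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: `Translator`, `RagAnswer`, `translate_with_fallback`, and `answer_query` are hypothetical names, providers are modelled as plain callables that raise on failure, and returning the English answer when back-translation fails is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

# Hypothetical provider interface: (text, source_lang, target_lang) -> translated text.
# A provider signals failure by raising an exception.
Translator = Callable[[str, str, str], str]


@dataclass
class RagAnswer:
    text: str
    degradations: List[str] = field(default_factory=list)


def translate_with_fallback(
    text: str, source: str, target: str, providers: Sequence[Translator]
) -> Optional[str]:
    """Try each provider in order; return None if every provider fails."""
    for provider in providers:
        try:
            return provider(text, source, target)
        except Exception:
            continue  # move on to the next provider
    return None


def answer_query(
    query: str,
    user_language: str,
    providers: Sequence[Translator],
    retrieve_and_answer: Callable[[str], str],
) -> RagAnswer:
    """Translate to English, run RAG, translate the answer back."""
    if user_language == "en":
        return RagAnswer(text=retrieve_and_answer(query))

    english_query = translate_with_fallback(query, user_language, "en", providers)
    if english_query is None:
        # Graceful degradation: fall back to the original query.
        return RagAnswer(
            text=retrieve_and_answer(query),
            degradations=["TRANSLATION_FAILED"],
        )

    english_answer = retrieve_and_answer(english_query)
    localized = translate_with_fallback(english_answer, "en", user_language, providers)
    if localized is None:
        # Assumption: return the untranslated answer rather than failing.
        return RagAnswer(text=english_answer, degradations=["TRANSLATION_FAILED"])
    return RagAnswer(text=localized)
```

Passing the providers as an ordered sequence keeps the primary/fallback order (Google Translate, then DeepL) a matter of configuration rather than control flow.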
2. Medical Pronunciation Lexicons
Language-specific pronunciation lexicons for accurate TTS:
- 15 languages supported: EN, ES, FR, DE, IT, PT, AR, ZH, HI, UR, JA, KO, RU, PL, TR
- Medical terminology: Drug names, conditions, procedures, anatomy
- G2P fallback: espeak-ng for terms not in lexicons
- Shared drug names: 100 common medications with IPA pronunciations
See Lexicon Service Guide for usage.
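One way to picture the lookup order is the sketch below. It assumes the language-specific lexicon is consulted first, then the shared drug-name lexicon, then a G2P callable; that ordering, the function name `lookup_pronunciation`, and the in-memory dictionaries are all illustrative assumptions rather than the lexicon service's real API. Pronunciation values are placeholders, not real IPA.

```python
from typing import Callable, Dict, Optional

# Hypothetical G2P interface: (term, language) -> phoneme string.
G2P = Callable[[str, str], str]


def lookup_pronunciation(
    term: str,
    language: str,
    language_lexicons: Dict[str, Dict[str, str]],
    shared_lexicon: Dict[str, str],
    g2p_fallback: Optional[G2P] = None,
) -> Optional[str]:
    """Return a pronunciation for `term`, or None if nothing can produce one."""
    key = term.strip().lower()

    # 1. Language-specific medical lexicon (assumed to take precedence).
    lexicon = language_lexicons.get(language, {})
    if key in lexicon:
        return lexicon[key]

    # 2. Shared drug names.
    if key in shared_lexicon:
        return shared_lexicon[key]

    # 3. G2P fallback (espeak-ng in the real service; any callable here).
    if g2p_fallback is not None:
        return g2p_fallback(term, language)
    return None


# Placeholder data for illustration only.
lexicons = {"es": {"hipertensión": "<es pronunciation>"}}
shared = {"metformin": "<shared pronunciation>"}

print(lookup_pronunciation("Metformin", "es", lexicons, shared))
```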
3. Latency-Aware Orchestration
Performance safeguards to maintain sub-700ms end-to-end latency:
| Stage | Budget | Degradation Action |
|---|---|---|
| Audio capture | 50ms | Log warning |
| STT | 200ms | Use cached partial |
| Language detection | 50ms | Default to user language |
| Translation | 200ms | Skip translation |
| RAG retrieval | 300ms | Return top-1 only |
| LLM first token | 300ms | Shorten context |
| TTS first chunk | 150ms | Use cached greeting |
| Total E2E | 700ms | Feature degradation |
See Latency Budgets Guide for implementation.
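The budget table can be expressed as data and checked per stage, as in the sketch below. The stage keys, `STAGE_BUDGETS`, and `degradations_for` are hypothetical names, and treating the total as the sum of measured stage timings assumes the stages run sequentially, which the real orchestrator may not do.

```python
from typing import Dict, List, Tuple

# (budget in ms, degradation action), copied from the table above.
STAGE_BUDGETS: Dict[str, Tuple[int, str]] = {
    "audio_capture": (50, "Log warning"),
    "stt": (200, "Use cached partial"),
    "language_detection": (50, "Default to user language"),
    "translation": (200, "Skip translation"),
    "rag_retrieval": (300, "Return top-1 only"),
    "llm_first_token": (300, "Shorten context"),
    "tts_first_chunk": (150, "Use cached greeting"),
}
TOTAL_BUDGET_MS = 700


def degradations_for(timings_ms: Dict[str, float]) -> List[str]:
    """Return the degradation actions triggered by the measured timings."""
    actions: List[str] = []
    for stage, (budget, action) in STAGE_BUDGETS.items():
        elapsed = timings_ms.get(stage)
        if elapsed is not None and elapsed > budget:
            actions.append(f"{stage}: {action}")
    # Assumption: stages are sequential, so the total is the sum of timings.
    if sum(timings_ms.values()) > TOTAL_BUDGET_MS:
        actions.append("total: Feature degradation")
    return actions


print(degradations_for({"stt": 250, "translation": 120}))
```

The per-stage budgets in the table add up to 1250 ms, which is more than the 700 ms total, so a turn can stay within every stage budget and still exceed the end-to-end budget. That is why the sketch checks the total separately.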
4. Thinking Feedback UX
Multi-modal feedback during AI processing:
- Audio tones: Gentle beep, soft chime, subtle tick (Web Audio API)
- Visual indicators: Animated dots, pulsing circle, spinner, progress bar
- Haptic feedback: Gentle pulse, rhythmic patterns (mobile)
- User controls: Volume, style selection, enable/disable per modality
See Thinking Tone Settings for configuration.
Feature Flags
All v4.1 features are gated behind feature flags for safe rollout. Flags are grouped by workstream for independent deployment.
Workstream 1: Translation & Multilingual RAG
| Flag | Description | Default | Docs |
|---|---|---|---|
| `backend.voice_v4_translation_fallback` | Multi-provider translation with caching | Off | Multilingual RAG |
| `backend.voice_v4_multilingual_rag` | Translate-then-retrieve pipeline | Off | Multilingual RAG |
Workstream 2: Audio & Speech Processing
| Flag | Description | Default | Docs |
|---|---|---|---|
| `backend.voice_v4_lexicon_service` | Medical pronunciation lexicons | Off | Lexicon Service |
| `backend.voice_v4_phi_routing` | PHI-aware STT with local Whisper | Off | PHI-Aware STT |
| `backend.voice_v4_adaptive_vad` | User-tunable VAD presets | Off | Adaptive VAD |
Workstream 3: Performance & Orchestration
| Flag | Description | Default | Docs |
|---|---|---|---|
| `backend.voice_v4_latency_budgets` | Latency-aware orchestration | Off | Latency Budgets |
| `backend.voice_v4_thinking_tones` | Backend thinking tone events | Off | Thinking Tones |
Workstream 4: Internationalization
| Flag | Description | Default | Docs |
|---|---|---|---|
| `backend.voice_v4_rtl_support` | RTL text rendering (Arabic/Urdu) | Off | RTL Support |
Workstream 5: UI Enhancements
| Flag | Description | Default | Docs |
|---|---|---|---|
| `ui.voice_v4_voice_first_ui` | Voice-first unified input bar | Off | Voice First Input Bar |
| `ui.voice_v4_streaming_text` | Streaming text display during TTS | Off | Streaming Text |
| `ui.voice_v4_latency_indicator` | Latency status with degradation info | Off | Latency Budgets |
| `ui.voice_v4_thinking_feedback_panel` | Audio/visual/haptic feedback | Off | Thinking Feedback |
Flag Dependencies
```
voice_v4_multilingual_rag
└── voice_v4_translation_fallback (required)

voice_v4_thinking_feedback_panel (UI)
└── voice_v4_thinking_tones (backend)

voice_v4_rtl_support
└── voice_v4_multilingual_rag (recommended)
```
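A rollout script could validate these dependencies before enabling flags. The sketch below is illustrative only: `REQUIRED_DEPS`, `RECOMMENDED_DEPS`, and `check_flag_dependencies` are hypothetical names, the flag names use the prefixed form from the tables, and treating the thinking-tones dependency as required is an assumption, since the tree only labels it "(backend)".

```python
from typing import Dict, Iterable, List, Tuple

REQUIRED_DEPS: Dict[str, List[str]] = {
    "backend.voice_v4_multilingual_rag": ["backend.voice_v4_translation_fallback"],
    # Assumption: the UI panel needs the backend tone events to be useful.
    "ui.voice_v4_thinking_feedback_panel": ["backend.voice_v4_thinking_tones"],
}
RECOMMENDED_DEPS: Dict[str, List[str]] = {
    "backend.voice_v4_rtl_support": ["backend.voice_v4_multilingual_rag"],
}


def check_flag_dependencies(enabled: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for the given set of enabled flags."""
    active = set(enabled)
    errors: List[str] = []
    warnings: List[str] = []
    for flag, deps in REQUIRED_DEPS.items():
        if flag in active:
            errors += [f"{flag} requires {dep}" for dep in deps if dep not in active]
    for flag, deps in RECOMMENDED_DEPS.items():
        if flag in active:
            warnings += [f"{flag} works best with {dep}" for dep in deps if dep not in active]
    return errors, warnings


errors, warnings = check_flag_dependencies(["backend.voice_v4_rtl_support"])
print(errors, warnings)
```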
Environment Variables
Core Configuration
```bash
# Lexicon data directory (optional, auto-detected if not set)
VOICEASSIST_DATA_DIR=/path/to/voiceassist/data

# Translation providers (encrypted in production)
GOOGLE_TRANSLATE_API_KEY=your-key
DEEPL_API_KEY=your-key

# Feature flag defaults
VOICE_V4_ENABLED=false  # Master toggle

# Latency budget overrides (milliseconds)
VOICE_LATENCY_BUDGET_TOTAL=700
VOICE_LATENCY_BUDGET_STT=200
VOICE_LATENCY_BUDGET_TRANSLATION=200
VOICE_LATENCY_BUDGET_RAG=300
```
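As a rough sketch of how the latency budget overrides might be consumed, the snippet below reads the four `VOICE_LATENCY_BUDGET_*` variables listed above and falls back to the documented values. The function name `load_latency_budgets` and the rule that non-integer or non-positive values are ignored are assumptions, not documented behaviour.

```python
import os
from typing import Dict, Mapping

# Defaults taken from the values shown in the configuration above.
DEFAULT_BUDGETS_MS: Dict[str, int] = {
    "TOTAL": 700,
    "STT": 200,
    "TRANSLATION": 200,
    "RAG": 300,
}


def load_latency_budgets(env: Mapping[str, str] = os.environ) -> Dict[str, int]:
    """Read VOICE_LATENCY_BUDGET_* overrides, keeping defaults for bad values."""
    budgets: Dict[str, int] = {}
    for name, default in DEFAULT_BUDGETS_MS.items():
        raw = env.get(f"VOICE_LATENCY_BUDGET_{name}")
        try:
            value = int(raw) if raw is not None else default
        except ValueError:
            value = default  # assumption: ignore unparsable overrides
        budgets[name] = value if value > 0 else default
    return budgets


print(load_latency_budgets({"VOICE_LATENCY_BUDGET_STT": "250"}))
```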
VOICEASSIST_DATA_DIR Configuration
The VOICEASSIST_DATA_DIR environment variable specifies the root directory for lexicon files and other data assets.
Setting the variable:
```bash
# Linux/macOS - add to ~/.bashrc or systemd service
export VOICEASSIST_DATA_DIR=/opt/voiceassist/data

# Docker - add to docker-compose.yml
environment:
  - VOICEASSIST_DATA_DIR=/app/data

# Kubernetes - add to deployment.yaml
env:
  - name: VOICEASSIST_DATA_DIR
    value: /app/data

# CI/CD - set in pipeline configuration
env:
  VOICEASSIST_DATA_DIR: ${{ github.workspace }}/data
```
Expected directory structure:
```
$VOICEASSIST_DATA_DIR/
├── lexicons/
│   ├── shared/
│   │   └── drug_names.json
│   ├── en/
│   │   └── medical_phonemes.json
│   ├── es/
│   │   └── medical_phonemes.json
│   └── ... (other languages)
└── models/
    └── whisper/          (for local PHI-aware STT)
```
Data Directory Resolution
The _resolve_data_dir() function in lexicon_service.py automatically locates the lexicon data directory using this priority:
- Environment variable: If `VOICEASSIST_DATA_DIR` is set and the path exists, use it
- Repository root: Walk up from the service file to find `data/lexicons/`
- Current working directory: Check `./data/` relative to cwd
- Fallback: Use relative path from the service file
This allows the lexicon service to work in development, CI, and production without manual configuration.
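The priority order above can be sketched as follows. This is not the actual `_resolve_data_dir()` implementation: the function name `resolve_data_dir`, its parameters, and in particular the final fallback (a `data` directory next to the service file) are assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(
    service_file: Path,
    cwd: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Locate the data directory using the four-step priority order."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else cwd

    # 1. Environment variable, only if the path exists.
    configured = env.get("VOICEASSIST_DATA_DIR")
    if configured and Path(configured).is_dir():
        return Path(configured)

    # 2. Walk up from the service file looking for data/lexicons/.
    for parent in service_file.resolve().parents:
        if (parent / "data" / "lexicons").is_dir():
            return parent / "data"

    # 3. ./data/ relative to the current working directory.
    if (cwd / "data").is_dir():
        return cwd / "data"

    # 4. Fallback relative to the service file (hypothetical location).
    return service_file.resolve().parent / "data"
```

Accepting `env` and `cwd` as parameters is a design choice that makes the resolution order easy to test without mutating process state.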
Error Propagation
Voice Mode v4.1 implements structured error propagation to ensure graceful degradation without silent failures.
Translation Errors
When translation fails, the system raises TranslationFailedError and applies graceful degradation:
```python
from app.services.latency_aware_orchestrator import (
    TranslationFailedError,
    DegradationType
)

try:
    result = await orchestrator.process_with_budgets(
        audio_data=audio_bytes,
        user_language="es"
    )
except TranslationFailedError as e:
    # Degradation already applied: uses original query
    logger.warning(f"Translation failed: {e}")
    # DegradationType.TRANSLATION_FAILED is in result.degradation_applied
```
Error → Degradation Mapping
| Error | Degradation Type | Fallback Behavior |
|---|---|---|
| `TranslationFailedError` | TRANSLATION_FAILED | Use original query, inform user |
| Translation timeout | TRANSLATION_BUDGET_EXCEEDED | Skip translation, use original |
| Language detection timeout | LANGUAGE_DETECTION_SKIPPED | Default to user's preferred language |
| RAG retrieval slow | RAG_LIMITED_TO_1 / RAG_LIMITED_TO_3 | Return fewer results |
| LLM context overflow | LLM_CONTEXT_SHORTENED | Truncate context history |
| TTS cold start | TTS_USED_CACHED_GREETING | Play cached audio while warming up |
UI Error Surfacing
Degradation events are propagated to the frontend via WebSocket:
```typescript
// Frontend receives degradation events
socket.on("voice:degradation", (event: DegradationEvent) => {
  if (event.type === "TRANSLATION_FAILED") {
    showToast("Translation unavailable, using original language", "warning");
  }
});

// LatencyIndicator component shows degradation tooltips
<LatencyIndicator
  latencyMs={result.total_latency_ms}
  degradations={result.degradation_applied}
  showDetails={true}
/>;
```
Logging and Monitoring
All errors emit structured logs and Prometheus metrics:
```python
# Structured logging
logger.warning("Translation failed", extra={
    "error_type": "TranslationFailedError",
    "source_language": "es",
    "degradation": "TRANSLATION_FAILED",
    "fallback": "original_query"
})

# Prometheus metrics
voice_degradation_total.labels(type="translation_failed").inc()
voice_error_total.labels(stage="translation", error="timeout").inc()
```
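With Python's standard `logging` module, keys passed through `extra` become attributes on the `LogRecord`, so a formatter has to pick them up explicitly for them to appear in the output. The sketch below shows one way to do that with a small JSON formatter. `JsonFormatter` and `STRUCTURED_FIELDS` are hypothetical names, and whether VoiceAssist uses a formatter like this is an assumption.

```python
import json
import logging

# Fields expected in `extra`, matching the logging call shown above.
STRUCTURED_FIELDS = ("error_type", "source_language", "degradation", "fallback")


class JsonFormatter(logging.Formatter):
    """Render a log record, including selected `extra` fields, as one JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        for name in STRUCTURED_FIELDS:
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("voice.sketch")
logger.addHandler(handler)

logger.warning("Translation failed", extra={
    "error_type": "TranslationFailedError",
    "source_language": "es",
    "degradation": "TRANSLATION_FAILED",
    "fallback": "original_query",
})
```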
Migration Guide
Upgrading from v3
- Feature flags: All v4 features are disabled by default
- Lexicons: Automatically loaded from `data/lexicons/` directory
- Translation: Enable `voice_v4_translation_fallback` flag
- UI components: Import from `@/components/voice/`
Testing
```bash
# Run v4 service tests
cd services/api-gateway
pytest tests/services/test_voice_v4_services.py -v

# Validate lexicons
python -c "
from app.services.lexicon_service import get_lexicon_service
import asyncio

service = get_lexicon_service()
reports = asyncio.run(service.validate_all_lexicons())
for lang, report in reports.items():
    print(f'{lang}: {report.term_count} terms ({report.status})')
"
```