Voice Mode v4.1 Overview

Sourced from docs/voice/voice-mode-v4-overview.md

Voice Mode v4.1 introduces significant enhancements to the VoiceAssist voice pipeline, focusing on multilingual support, performance safeguards, and improved user feedback.

Key Features

1. Multilingual RAG with Translation Fallback

The multilingual RAG service enables voice interactions in multiple languages:

  • Translate-then-retrieve pattern: Non-English queries are translated to English for RAG retrieval, then responses are translated back
  • Multi-provider translation: Primary provider (Google Translate) with automatic DeepL fallback
  • Code-switching detection: Handles bilingual speakers mixing languages
  • Graceful degradation: Falls back to original query when translation fails

See Multilingual RAG Architecture for technical details.
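The translate-then-retrieve flow with provider fallback can be sketched as follows. This is a minimal illustration, not the VoiceAssist implementation: the `Translator` signature and the names `translate_with_fallback`, `multilingual_answer`, and `retrieve_and_answer` are hypothetical, and returning the English answer when back-translation fails is an assumption.

```python
from typing import Callable, Optional, Sequence

# Hypothetical provider signature: (text, source_lang, target_lang) -> translation.
Translator = Callable[[str, str, str], str]


def translate_with_fallback(
    text: str, source: str, target: str, providers: Sequence[Translator]
) -> Optional[str]:
    """Try each provider in order; return None if every provider fails."""
    for provider in providers:
        try:
            return provider(text, source, target)
        except Exception:
            continue  # Fall through to the next provider in the list.
    return None


def multilingual_answer(
    query: str,
    language: str,
    providers: Sequence[Translator],
    retrieve_and_answer: Callable[[str], str],
) -> str:
    """Translate to English, run RAG, then translate the answer back."""
    if language == "en":
        return retrieve_and_answer(query)

    english_query = translate_with_fallback(query, language, "en", providers)
    if english_query is None:
        # Graceful degradation: fall back to the original query.
        return retrieve_and_answer(query)

    answer = retrieve_and_answer(english_query)
    translated = translate_with_fallback(answer, "en", language, providers)
    # Assumption: if back-translation fails, return the English answer.
    return translated if translated is not None else answer
```

Passing the providers as an ordered sequence keeps the primary/fallback relationship explicit and makes it straightforward to substitute stubs in tests.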

2. Medical Pronunciation Lexicons

Language-specific pronunciation lexicons for accurate TTS:

  • 15 languages supported: EN, ES, FR, DE, IT, PT, AR, ZH, HI, UR, JA, KO, RU, PL, TR
  • Medical terminology: Drug names, conditions, procedures, anatomy
  • G2P fallback: espeak-ng for terms not in lexicons
  • Shared drug names: 100 common medications with IPA pronunciations

See Lexicon Service Guide for usage.
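The lookup order implied above (language lexicon, shared drug names, then G2P) might look like the sketch below. The function name, the argument shapes, and the choice to check the language-specific lexicon before the shared drug list are assumptions; the `g2p` callable stands in for an espeak-ng wrapper and is not the Lexicon Service API.

```python
from typing import Callable, Mapping


def lookup_pronunciation(
    term: str,
    language: str,
    lexicons: Mapping[str, Mapping[str, str]],
    shared_drug_names: Mapping[str, str],
    g2p: Callable[[str, str], str],
) -> str:
    """Return a pronunciation for `term`, preferring curated entries over G2P."""
    key = term.lower()

    # 1. Language-specific medical lexicon.
    language_lexicon = lexicons.get(language, {})
    if key in language_lexicon:
        return language_lexicon[key]

    # 2. Shared drug names (used for every language).
    if key in shared_drug_names:
        return shared_drug_names[key]

    # 3. G2P fallback for terms missing from both lexicons.
    return g2p(term, language)
```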

3. Latency-Aware Orchestration

Performance safeguards to maintain sub-700ms end-to-end latency:

| Stage | Budget | Degradation Action |
| --- | --- | --- |
| Audio capture | 50ms | Log warning |
| STT | 200ms | Use cached partial |
| Language detection | 50ms | Default to user language |
| Translation | 200ms | Skip translation |
| RAG retrieval | 300ms | Return top-1 only |
| LLM first token | 300ms | Shorten context |
| TTS first chunk | 150ms | Use cached greeting |
| Total E2E | 700ms | Feature degradation |

See Latency Budgets Guide for implementation.
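One way to enforce a per-stage budget is to wrap each stage in a timeout and record a degradation when the budget is exceeded. The sketch below uses `asyncio.wait_for` from the standard library; `STAGE_BUDGETS_MS`, `run_with_budget`, and the stage keys are hypothetical names chosen for illustration, and the real orchestrator may work differently.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")

# Budgets in milliseconds, taken from the table in this section.
STAGE_BUDGETS_MS = {
    "audio_capture": 50,
    "stt": 200,
    "language_detection": 50,
    "translation": 200,
    "rag_retrieval": 300,
    "llm_first_token": 300,
    "tts_first_chunk": 150,
}


async def run_with_budget(
    stage: str,
    work: Callable[[], Awaitable[T]],
    fallback: Callable[[], T],
    degradations: List[str],
) -> T:
    """Run `work` within the stage budget; on timeout, record it and degrade."""
    budget_seconds = STAGE_BUDGETS_MS[stage] / 1000
    try:
        return await asyncio.wait_for(work(), timeout=budget_seconds)
    except asyncio.TimeoutError:
        degradations.append(stage)  # Remember which stage was degraded.
        return fallback()
```

For the translation stage, `fallback` would return the original query, which matches the "Skip translation" action in the table.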

4. Thinking Feedback UX

Multi-modal feedback during AI processing:

  • Audio tones: Gentle beep, soft chime, subtle tick (Web Audio API)
  • Visual indicators: Animated dots, pulsing circle, spinner, progress bar
  • Haptic feedback: Gentle pulse, rhythmic patterns (mobile)
  • User controls: Volume, style selection, enable/disable per modality

See Thinking Tone Settings for configuration.

Feature Flags

All v4.1 features are gated behind feature flags for safe rollout. Flags are grouped by workstream for independent deployment.

Workstream 1: Translation & Multilingual RAG

| Flag | Description | Default | Docs |
| --- | --- | --- | --- |
| backend.voice_v4_translation_fallback | Multi-provider translation with caching | Off | Multilingual RAG |
| backend.voice_v4_multilingual_rag | Translate-then-retrieve pipeline | Off | Multilingual RAG |

Workstream 2: Audio & Speech Processing

| Flag | Description | Default | Docs |
| --- | --- | --- | --- |
| backend.voice_v4_lexicon_service | Medical pronunciation lexicons | Off | Lexicon Service |
| backend.voice_v4_phi_routing | PHI-aware STT with local Whisper | Off | PHI-Aware STT |
| backend.voice_v4_adaptive_vad | User-tunable VAD presets | Off | Adaptive VAD |

Workstream 3: Performance & Orchestration

| Flag | Description | Default | Docs |
| --- | --- | --- | --- |
| backend.voice_v4_latency_budgets | Latency-aware orchestration | Off | Latency Budgets |
| backend.voice_v4_thinking_tones | Backend thinking tone events | Off | Thinking Tones |

Workstream 4: Internationalization

| Flag | Description | Default | Docs |
| --- | --- | --- | --- |
| backend.voice_v4_rtl_support | RTL text rendering (Arabic/Urdu) | Off | RTL Support |

Workstream 5: UI Enhancements

| Flag | Description | Default | Docs |
| --- | --- | --- | --- |
| ui.voice_v4_voice_first_ui | Voice-first unified input bar | Off | Voice First Input Bar |
| ui.voice_v4_streaming_text | Streaming text display during TTS | Off | Streaming Text |
| ui.voice_v4_latency_indicator | Latency status with degradation info | Off | Latency Budgets |
| ui.voice_v4_thinking_feedback_panel | Audio/visual/haptic feedback | Off | Thinking Feedback |

Flag Dependencies

voice_v4_multilingual_rag
    └── voice_v4_translation_fallback (required)

voice_v4_thinking_feedback_panel (UI)
    └── voice_v4_thinking_tones (backend)

voice_v4_rtl_support
    └── voice_v4_multilingual_rag (recommended)
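
The dependency tree can be checked mechanically before a rollout. The sketch below is illustrative only: `REQUIRED`, `RECOMMENDED`, and `check_flag_dependencies` are hypothetical names, and treating the thinking feedback panel's dependency on `voice_v4_thinking_tones` as required is an assumption, since the tree does not label it.

```python
from typing import Dict, List, Set

# Hard dependencies: the flag does not work without these.
REQUIRED: Dict[str, List[str]] = {
    "voice_v4_multilingual_rag": ["voice_v4_translation_fallback"],
    # Assumption: the tree does not say whether this one is required.
    "voice_v4_thinking_feedback_panel": ["voice_v4_thinking_tones"],
}

# Soft dependencies: the flag works, but is more useful with these enabled.
RECOMMENDED: Dict[str, List[str]] = {
    "voice_v4_rtl_support": ["voice_v4_multilingual_rag"],
}


def check_flag_dependencies(enabled: Set[str]) -> Dict[str, List[str]]:
    """Return missing required flags as errors and recommended ones as warnings."""
    problems: Dict[str, List[str]] = {"errors": [], "warnings": []}
    for flag in sorted(enabled):  # Sorted so the report order is stable.
        for dependency in REQUIRED.get(flag, []):
            if dependency not in enabled:
                problems["errors"].append(f"{flag} requires {dependency}")
        for dependency in RECOMMENDED.get(flag, []):
            if dependency not in enabled:
                problems["warnings"].append(f"{flag} works best with {dependency}")
    return problems
```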

Environment Variables

Core Configuration

# Lexicon data directory (optional, auto-detected if not set)
VOICEASSIST_DATA_DIR=/path/to/voiceassist/data

# Translation providers (encrypted in production)
GOOGLE_TRANSLATE_API_KEY=your-key
DEEPL_API_KEY=your-key

# Feature flag defaults
VOICE_V4_ENABLED=false  # Master toggle

# Latency budget overrides (milliseconds)
VOICE_LATENCY_BUDGET_TOTAL=700
VOICE_LATENCY_BUDGET_STT=200
VOICE_LATENCY_BUDGET_TRANSLATION=200
VOICE_LATENCY_BUDGET_RAG=300
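
As a rough sketch of how the latency budget overrides could be read, the function below parses the four `VOICE_LATENCY_BUDGET_*` variables and keeps the default when a value is missing, non-numeric, or not positive. The function name, the validation rules, and the use of the documented example values as defaults are assumptions, not the service's actual loader.

```python
import os
from typing import Dict, Mapping

# Defaults in milliseconds, mirroring the example values listed above.
DEFAULT_BUDGETS_MS: Dict[str, int] = {
    "TOTAL": 700,
    "STT": 200,
    "TRANSLATION": 200,
    "RAG": 300,
}


def load_latency_budgets(env: Mapping[str, str] = os.environ) -> Dict[str, int]:
    """Read VOICE_LATENCY_BUDGET_* overrides, ignoring invalid values."""
    budgets = dict(DEFAULT_BUDGETS_MS)
    for name in DEFAULT_BUDGETS_MS:
        raw = env.get(f"VOICE_LATENCY_BUDGET_{name}")
        if raw is None:
            continue
        try:
            value = int(raw)
        except ValueError:
            continue  # Non-numeric override: keep the default.
        if value > 0:
            budgets[name] = value
    return budgets
```

Accepting the environment as a parameter lets tests pass a plain dictionary instead of mutating `os.environ`.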

VOICEASSIST_DATA_DIR Configuration

The VOICEASSIST_DATA_DIR environment variable specifies the root directory for lexicon files and other data assets.

Setting the variable:

# Linux/macOS - add to ~/.bashrc or systemd service
export VOICEASSIST_DATA_DIR=/opt/voiceassist/data

# Docker - add to docker-compose.yml
environment:
  - VOICEASSIST_DATA_DIR=/app/data

# Kubernetes - add to deployment.yaml
env:
  - name: VOICEASSIST_DATA_DIR
    value: /app/data

# CI/CD - set in pipeline configuration
env:
  VOICEASSIST_DATA_DIR: ${{ github.workspace }}/data

Expected directory structure:

$VOICEASSIST_DATA_DIR/
├── lexicons/
│   ├── shared/
│   │   └── drug_names.json
│   ├── en/
│   │   └── medical_phonemes.json
│   ├── es/
│   │   └── medical_phonemes.json
│   └── ... (other languages)
└── models/
    └── whisper/  (for local PHI-aware STT)

Data Directory Resolution

The _resolve_data_dir() function in lexicon_service.py automatically locates the lexicon data directory using this priority:

  1. Environment variable: If VOICEASSIST_DATA_DIR is set and the path exists, use it
  2. Repository root: Walk up from the service file to find data/lexicons/
  3. Current working directory: Check ./data/ relative to cwd
  4. Fallback: Use relative path from the service file

This allows the lexicon service to work in development, CI, and production without manual configuration.
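
The four-step priority can be sketched as below. This is not the code of `_resolve_data_dir()` in `lexicon_service.py`: the signature, the `env` and `cwd` parameters, and the exact relative path used in the final fallback are assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(
    service_file: Path,
    env: Mapping[str, str] = os.environ,
    cwd: Optional[Path] = None,
) -> Path:
    """Locate the data directory using the documented priority order."""
    # 1. Environment variable, only if the path actually exists.
    configured = env.get("VOICEASSIST_DATA_DIR")
    if configured and Path(configured).exists():
        return Path(configured)

    # 2. Walk up from the service file looking for data/lexicons/.
    for parent in service_file.resolve().parents:
        if (parent / "data" / "lexicons").is_dir():
            return parent / "data"

    # 3. ./data relative to the current working directory.
    candidate = (cwd or Path.cwd()) / "data"
    if candidate.is_dir():
        return candidate

    # 4. Fallback: a path relative to the service file (depth is assumed).
    return service_file.resolve().parent / "data"
```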

Error Propagation

Voice Mode v4.1 implements structured error propagation to ensure graceful degradation without silent failures.

Translation Errors

When translation fails, the system raises TranslationFailedError and applies graceful degradation:

import logging

from app.services.latency_aware_orchestrator import (
    TranslationFailedError,
    DegradationType,
)

logger = logging.getLogger(__name__)


# Hypothetical wrapper so that the await runs inside a coroutine
async def process_audio(orchestrator, audio_bytes):
    try:
        result = await orchestrator.process_with_budgets(
            audio_data=audio_bytes,
            user_language="es",
        )
    except TranslationFailedError as e:
        # Degradation already applied: uses original query
        logger.warning(f"Translation failed: {e}")
        # DegradationType.TRANSLATION_FAILED is in result.degradation_applied

Error → Degradation Mapping

| Error | Degradation Type | Fallback Behavior |
| --- | --- | --- |
| TranslationFailedError | TRANSLATION_FAILED | Use original query, inform user |
| Translation timeout | TRANSLATION_BUDGET_EXCEEDED | Skip translation, use original |
| Language detection timeout | LANGUAGE_DETECTION_SKIPPED | Default to user's preferred language |
| RAG retrieval slow | RAG_LIMITED_TO_1 / RAG_LIMITED_TO_3 | Return fewer results |
| LLM context overflow | LLM_CONTEXT_SHORTENED | Truncate context history |
| TTS cold start | TTS_USED_CACHED_GREETING | Play cached audio while warming up |
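
For logging or tooltips, the mapping above can be kept as a simple lookup. The dictionary below copies the degradation type names and fallback descriptions from the table; `FALLBACK_BEHAVIOR` and `describe_degradations` are hypothetical names and are not part of the orchestrator's `DegradationType` API.

```python
from typing import Dict, Iterable, List

# Degradation type name -> fallback behavior, as listed in the table.
FALLBACK_BEHAVIOR: Dict[str, str] = {
    "TRANSLATION_FAILED": "Use original query, inform user",
    "TRANSLATION_BUDGET_EXCEEDED": "Skip translation, use original",
    "LANGUAGE_DETECTION_SKIPPED": "Default to user's preferred language",
    "RAG_LIMITED_TO_1": "Return fewer results",
    "RAG_LIMITED_TO_3": "Return fewer results",
    "LLM_CONTEXT_SHORTENED": "Truncate context history",
    "TTS_USED_CACHED_GREETING": "Play cached audio while warming up",
}


def describe_degradations(applied: Iterable[str]) -> List[str]:
    """Turn applied degradation names into human-readable lines."""
    return [
        f"{name}: {FALLBACK_BEHAVIOR.get(name, 'Unknown degradation')}"
        for name in applied
    ]
```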

UI Error Surfacing

Degradation events are propagated to the frontend via WebSocket:

// Frontend receives degradation events
socket.on("voice:degradation", (event: DegradationEvent) => {
  if (event.type === "TRANSLATION_FAILED") {
    showToast("Translation unavailable, using original language", "warning");
  }
});

// LatencyIndicator component shows degradation tooltips
<LatencyIndicator
  latencyMs={result.total_latency_ms}
  degradations={result.degradation_applied}
  showDetails={true}
/>;

Logging and Monitoring

All errors emit structured logs and Prometheus metrics:

# Structured logging
logger.warning("Translation failed", extra={
    "error_type": "TranslationFailedError",
    "source_language": "es",
    "degradation": "TRANSLATION_FAILED",
    "fallback": "original_query"
})

# Prometheus metrics
voice_degradation_total.labels(type="translation_failed").inc()
voice_error_total.labels(stage="translation", error="timeout").inc()

Migration Guide

Upgrading from v3

  1. Feature flags: All v4 features are disabled by default
  2. Lexicons: Loaded automatically from the data/lexicons/ directory, or from VOICEASSIST_DATA_DIR when it is set
  3. Translation: Enable the backend.voice_v4_translation_fallback flag
  4. UI components: Import from @/components/voice/

Testing

# Run v4 service tests
cd services/api-gateway
pytest tests/services/test_voice_v4_services.py -v

# Validate lexicons
python -c "
from app.services.lexicon_service import get_lexicon_service
import asyncio

service = get_lexicon_service()
reports = asyncio.run(service.validate_all_lexicons())
for lang, report in reports.items():
    print(f'{lang}: {report.term_count} terms ({report.status})')
"