# Voice Mode v4.1 Overview

Voice Mode v4.1 introduces significant enhancements to the VoiceAssist voice pipeline, focusing on multilingual support, performance safeguards, and improved user feedback.

## Key Features

### 1. Multilingual RAG with Translation Fallback

The multilingual RAG service enables voice interactions in multiple languages:

- **Translate-then-retrieve pattern**: Non-English queries are translated to English for RAG retrieval, then responses are translated back
- **Multi-provider translation**: Primary provider (Google Translate) with automatic DeepL fallback
- **Code-switching detection**: Handles bilingual speakers mixing languages
- **Graceful degradation**: Falls back to the original query when translation fails

See [Multilingual RAG Architecture](./multilingual-rag-architecture.md) for technical details.

### 2. Medical Pronunciation Lexicons

Language-specific pronunciation lexicons for accurate TTS:

- **15 languages supported**: EN, ES, FR, DE, IT, PT, AR, ZH, HI, UR, JA, KO, RU, PL, TR
- **Medical terminology**: Drug names, conditions, procedures, anatomy
- **G2P fallback**: espeak-ng for terms not in lexicons
- **Shared drug names**: 100 common medications with IPA pronunciations

See [Lexicon Service Guide](./lexicon-service-guide.md) for usage.

### 3. Latency-Aware Orchestration

Performance safeguards to maintain sub-700ms end-to-end latency:

| Stage              | Budget    | Degradation Action       |
| ------------------ | --------- | ------------------------ |
| Audio capture      | 50ms      | Log warning              |
| STT                | 200ms     | Use cached partial       |
| Language detection | 50ms      | Default to user language |
| Translation        | 200ms     | Skip translation         |
| RAG retrieval      | 300ms     | Return top-1 only        |
| LLM first token    | 300ms     | Shorten context          |
| TTS first chunk    | 150ms     | Use cached greeting      |
| **Total E2E**      | **700ms** | **Feature degradation**  |

See [Latency Budgets Guide](./latency-budgets-guide.md) for implementation.

### 4. Thinking Feedback UX

Multi-modal feedback during AI processing:

- **Audio tones**: Gentle beep, soft chime, subtle tick (Web Audio API)
- **Visual indicators**: Animated dots, pulsing circle, spinner, progress bar
- **Haptic feedback**: Gentle pulse, rhythmic patterns (mobile)
- **User controls**: Volume, style selection, enable/disable per modality

See [Thinking Tone Settings](./thinking-tone-settings.md) for configuration.
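To make the translate-then-retrieve pattern from feature 1 concrete, here is a minimal sketch. `translator`, `rag`, and `llm` are stand-ins for the actual services, and the real pipeline (see the Multilingual RAG Architecture doc) differs in detail:

```python
from dataclasses import dataclass, field


@dataclass
class VoiceTurnResult:
    """Result of one voice turn, including any degradations applied."""
    answer: str
    degradation_applied: list[str] = field(default_factory=list)


async def answer_query(query: str, user_language: str,
                       translator, rag, llm) -> VoiceTurnResult:
    """Translate the query to English, run RAG, then translate the answer back.

    `translator`, `rag`, and `llm` are hypothetical injected interfaces used
    only for illustration.
    """
    degradations: list[str] = []

    # 1. Translate the query to English (skip if it is already English).
    english_query = query
    if user_language != "en":
        try:
            english_query = await translator.translate(
                query, source=user_language, target="en"
            )
        except Exception:
            # Graceful degradation: fall back to the original query.
            degradations.append("TRANSLATION_FAILED")

    # 2. Retrieve context and generate an answer in English.
    context = await rag.retrieve(english_query)
    english_answer = await llm.generate(english_query, context)

    # 3. Translate the answer back to the user's language
    #    (left as generated when the query translation already failed).
    answer = english_answer
    if user_language != "en" and "TRANSLATION_FAILED" not in degradations:
        try:
            answer = await translator.translate(
                english_answer, source="en", target=user_language
            )
        except Exception:
            degradations.append("TRANSLATION_FAILED")

    return VoiceTurnResult(answer=answer, degradation_applied=degradations)
```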
## Feature Flags

All v4.1 features are gated behind feature flags for safe rollout. Flags are grouped by workstream for independent deployment.

### Workstream 1: Translation & Multilingual RAG

| Flag                                     | Description                              | Default | Docs                                                   |
| ---------------------------------------- | ---------------------------------------- | ------- | ------------------------------------------------------ |
| `backend.voice_v4_translation_fallback`  | Multi-provider translation with caching  | Off     | [Multilingual RAG](./multilingual-rag-architecture.md) |
| `backend.voice_v4_multilingual_rag`      | Translate-then-retrieve pipeline         | Off     | [Multilingual RAG](./multilingual-rag-architecture.md) |

### Workstream 2: Audio & Speech Processing

| Flag                               | Description                      | Default | Docs                                          |
| ---------------------------------- | -------------------------------- | ------- | --------------------------------------------- |
| `backend.voice_v4_lexicon_service` | Medical pronunciation lexicons   | Off     | [Lexicon Service](./lexicon-service-guide.md) |
| `backend.voice_v4_phi_routing`     | PHI-aware STT with local Whisper | Off     | [PHI-Aware STT](./phi-aware-stt-routing.md)   |
| `backend.voice_v4_adaptive_vad`    | User-tunable VAD presets         | Off     | [Adaptive VAD](./adaptive-vad-presets.md)     |

### Workstream 3: Performance & Orchestration

| Flag                               | Description                  | Default | Docs                                          |
| ---------------------------------- | ---------------------------- | ------- | --------------------------------------------- |
| `backend.voice_v4_latency_budgets` | Latency-aware orchestration  | Off     | [Latency Budgets](./latency-budgets-guide.md) |
| `backend.voice_v4_thinking_tones`  | Backend thinking tone events | Off     | [Thinking Tones](./thinking-tone-settings.md) |

### Workstream 4: Internationalization

| Flag                           | Description                      | Default | Docs                                  |
| ------------------------------ | -------------------------------- | ------- | ------------------------------------- |
| `backend.voice_v4_rtl_support` | RTL text rendering (Arabic/Urdu) | Off     | [RTL Support](./rtl-support-guide.md) |

### Workstream 5: UI Enhancements

| Flag                                  | Description                          | Default | Docs                                                |
| ------------------------------------- | ------------------------------------ | ------- | --------------------------------------------------- |
| `ui.voice_v4_voice_first_ui`          | Voice-first unified input bar        | Off     | [Voice First Input Bar](./voice-first-input-bar.md) |
| `ui.voice_v4_streaming_text`          | Streaming text display during TTS    | Off     | [Streaming Text](./streaming-text-display.md)       |
| `ui.voice_v4_latency_indicator`       | Latency status with degradation info | Off     | [Latency Budgets](./latency-budgets-guide.md)       |
| `ui.voice_v4_thinking_feedback_panel` | Audio/visual/haptic feedback         | Off     | [Thinking Feedback](./thinking-tone-settings.md)    |

### Flag Dependencies

```
voice_v4_multilingual_rag
└── voice_v4_translation_fallback (required)

voice_v4_thinking_feedback_panel (UI)
└── voice_v4_thinking_tones (backend)

voice_v4_rtl_support
└── voice_v4_multilingual_rag (recommended)
```

## Environment Variables

### Core Configuration

```bash
# Lexicon data directory (optional, auto-detected if not set)
VOICEASSIST_DATA_DIR=/path/to/voiceassist/data

# Translation providers (encrypted in production)
GOOGLE_TRANSLATE_API_KEY=your-key
DEEPL_API_KEY=your-key

# Feature flag defaults
VOICE_V4_ENABLED=false  # Master toggle

# Latency budget overrides (milliseconds)
VOICE_LATENCY_BUDGET_TOTAL=700
VOICE_LATENCY_BUDGET_STT=200
VOICE_LATENCY_BUDGET_TRANSLATION=200
VOICE_LATENCY_BUDGET_RAG=300
```

### VOICEASSIST_DATA_DIR Configuration

The `VOICEASSIST_DATA_DIR` environment variable specifies the root directory for lexicon files and other data assets.

**Setting the variable:**

```bash
# Linux/macOS - add to ~/.bashrc or systemd service
export VOICEASSIST_DATA_DIR=/opt/voiceassist/data

# Docker - add to docker-compose.yml
environment:
  - VOICEASSIST_DATA_DIR=/app/data

# Kubernetes - add to deployment.yaml
env:
  - name: VOICEASSIST_DATA_DIR
    value: /app/data

# CI/CD - set in pipeline configuration
env:
  VOICEASSIST_DATA_DIR: ${{ github.workspace }}/data
```

**Expected directory structure:**

```
$VOICEASSIST_DATA_DIR/
├── lexicons/
│   ├── shared/
│   │   └── drug_names.json
│   ├── en/
│   │   └── medical_phonemes.json
│   ├── es/
│   │   └── medical_phonemes.json
│   └── ... (other languages)
└── models/
    └── whisper/ (for local PHI-aware STT)
```

### Data Directory Resolution

The `_resolve_data_dir()` function in `lexicon_service.py` automatically locates the lexicon data directory using this priority:

1. **Environment variable**: If `VOICEASSIST_DATA_DIR` is set and the path exists, use it
2. **Repository root**: Walk up from the service file to find `data/lexicons/`
3. **Current working directory**: Check `./data/` relative to cwd
4. **Fallback**: Use relative path from the service file

This allows the lexicon service to work in development, CI, and production without manual configuration.
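A minimal sketch of that resolution order is shown below. It is simplified for illustration; the actual `_resolve_data_dir()` in `lexicon_service.py` may differ (for example, it likely derives the starting path from its own module file rather than taking a parameter):

```python
import os
from pathlib import Path


def _resolve_data_dir(service_file: Path) -> Path:
    """Locate the lexicon data directory using the documented priority order."""
    # 1. Explicit override via VOICEASSIST_DATA_DIR, if the path exists.
    env_dir = os.getenv("VOICEASSIST_DATA_DIR")
    if env_dir and Path(env_dir).is_dir():
        return Path(env_dir)

    # 2. Repository root: walk up from the service file looking for data/lexicons/.
    for parent in service_file.resolve().parents:
        candidate = parent / "data"
        if (candidate / "lexicons").is_dir():
            return candidate

    # 3. Current working directory: ./data relative to cwd.
    cwd_candidate = Path.cwd() / "data"
    if cwd_candidate.is_dir():
        return cwd_candidate

    # 4. Fallback: a relative path next to the service file.
    return service_file.resolve().parent / "data"
```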
## Error Propagation

Voice Mode v4.1 implements structured error propagation to ensure graceful degradation without silent failures.

### Translation Errors

When translation fails, the system raises `TranslationFailedError` and applies graceful degradation:

```python
from app.services.latency_aware_orchestrator import (
    TranslationFailedError,
    DegradationType,
)

try:
    result = await orchestrator.process_with_budgets(
        audio_data=audio_bytes,
        user_language="es",
    )
except TranslationFailedError as e:
    # Degradation already applied: uses original query
    logger.warning(f"Translation failed: {e}")
    # DegradationType.TRANSLATION_FAILED is in result.degradation_applied
```

### Error → Degradation Mapping

| Error                      | Degradation Type                        | Fallback Behavior                    |
| -------------------------- | --------------------------------------- | ------------------------------------ |
| `TranslationFailedError`   | `TRANSLATION_FAILED`                     | Use original query, inform user      |
| Translation timeout        | `TRANSLATION_BUDGET_EXCEEDED`            | Skip translation, use original       |
| Language detection timeout | `LANGUAGE_DETECTION_SKIPPED`             | Default to user's preferred language |
| RAG retrieval slow         | `RAG_LIMITED_TO_1` / `RAG_LIMITED_TO_3`  | Return fewer results                 |
| LLM context overflow       | `LLM_CONTEXT_SHORTENED`                  | Truncate context history             |
| TTS cold start             | `TTS_USED_CACHED_GREETING`               | Play cached audio while warming up   |

### UI Error Surfacing

Degradation events are propagated to the frontend via WebSocket:

```typescript
// Frontend receives degradation events
socket.on("voice:degradation", (event: DegradationEvent) => {
  if (event.type === "TRANSLATION_FAILED") {
    showToast("Translation unavailable, using original language", "warning");
  }
});

// LatencyIndicator component shows degradation tooltips
```

### Logging and Monitoring

All errors emit structured logs and Prometheus metrics:

```python
# Structured logging
logger.warning("Translation failed", extra={
    "error_type": "TranslationFailedError",
    "source_language": "es",
    "degradation": "TRANSLATION_FAILED",
    "fallback": "original_query",
})

# Prometheus metrics
voice_degradation_total.labels(type="translation_failed").inc()
voice_error_total.labels(stage="translation", error="timeout").inc()
```
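To tie the latency budgets above to these degradation types, here is a minimal sketch of running one stage under its budget. The helper and its signature are illustrative only, not the actual API of `latency_aware_orchestrator.py`:

```python
import asyncio


async def translate_within_budget(
    translate,            # hypothetical async callable: translate(text, source, target) -> str
    text: str,
    source: str,
    target: str,
    budget_ms: int,
    degradations: list[str],
) -> str:
    """Run translation under its latency budget.

    On timeout, record TRANSLATION_BUDGET_EXCEEDED and fall back to the
    original text, matching the "Skip translation, use original" behavior.
    """
    try:
        return await asyncio.wait_for(
            translate(text, source, target),
            timeout=budget_ms / 1000,
        )
    except asyncio.TimeoutError:
        degradations.append("TRANSLATION_BUDGET_EXCEEDED")
        return text
```

A real orchestrator would apply the same pattern to each stage, using the per-stage budgets from the table in the Latency-Aware Orchestration section (for example, the 200ms translation budget or its `VOICE_LATENCY_BUDGET_TRANSLATION` override).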
error="timeout").inc() ``` ## Migration Guide ### Upgrading from v3 1. **Feature flags**: All v4 features are disabled by default 2. **Lexicons**: Automatically loaded from `data/lexicons/` directory 3. **Translation**: Enable `voice_v4_translation_fallback` flag 4. **UI components**: Import from `@/components/voice/` ### Testing ```bash # Run v4 service tests cd services/api-gateway pytest tests/services/test_voice_v4_services.py -v # Validate lexicons python -c " from app.services.lexicon_service import get_lexicon_service import asyncio service = get_lexicon_service() reports = asyncio.run(service.validate_all_lexicons()) for lang, report in reports.items(): print(f'{lang}: {report.term_count} terms ({report.status})') " ``` ## Related Documentation - [Multilingual RAG Architecture](./multilingual-rag-architecture.md) - [Lexicon Service Guide](./lexicon-service-guide.md) - [Latency Budgets Guide](./latency-budgets-guide.md) - [Thinking Tone Settings](./thinking-tone-settings.md) - [Voice Pipeline Architecture](../VOICE_MODE_PIPELINE.md) 6:["slug","voice/voice-mode-v4-overview","c"] 0:["X7oMT3VrOffzp0qvbeOas",[[["",{"children":["docs",{"children":[["slug","voice/voice-mode-v4-overview","c"],{"children":["__PAGE__?{\"slug\":[\"voice\",\"voice-mode-v4-overview\"]}",{}]}]}]},"$undefined","$undefined",true],["",{"children":["docs",{"children":[["slug","voice/voice-mode-v4-overview","c"],{"children":["__PAGE__",{},[["$L1",["$","div",null,{"children":[["$","div",null,{"className":"mb-6 flex items-center justify-between gap-4","children":[["$","div",null,{"children":[["$","p",null,{"className":"text-sm text-gray-500 dark:text-gray-400","children":"Docs / Raw"}],["$","h1",null,{"className":"text-3xl font-bold text-gray-900 dark:text-white","children":"Voice Mode v4.1 Overview"}],["$","p",null,{"className":"text-sm text-gray-600 dark:text-gray-400","children":["Sourced from"," ",["$","code",null,{"className":"font-mono text-xs","children":["docs/","voice/voice-mode-v4-overview.md"]}]]}]]}],["$","a",null,{"href":"https://github.com/mohammednazmy/VoiceAssist/edit/main/docs/voice/voice-mode-v4-overview.md","target":"_blank","rel":"noreferrer","className":"inline-flex items-center gap-2 rounded-md border border-gray-200 dark:border-gray-700 px-3 py-1.5 text-sm text-gray-700 dark:text-gray-200 hover:border-primary-500 dark:hover:border-primary-400 hover:text-primary-700 dark:hover:text-primary-300","children":"Edit on GitHub"}]]}],["$","div",null,{"className":"rounded-lg border border-gray-200 dark:border-gray-800 bg-white dark:bg-gray-900 p-6","children":["$","$L2",null,{"content":"$3"}]}],["$","div",null,{"className":"mt-6 flex flex-wrap gap-2 text-sm","children":[["$","$L4",null,{"href":"/reference/all-docs","className":"inline-flex items-center gap-1 rounded-md bg-gray-100 px-3 py-1 text-gray-700 hover:bg-gray-200 dark:bg-gray-800 dark:text-gray-200 dark:hover:bg-gray-700","children":"← All documentation"}],["$","$L4",null,{"href":"/","className":"inline-flex items-center gap-1 rounded-md bg-gray-100 px-3 py-1 text-gray-700 hover:bg-gray-200 dark:bg-gray-800 dark:text-gray-200 