Voice Mode v4.1.2 Release Notes
Version: 4.1.2 (Feature Release)
Date: December 2025
Type: G2P enhancement and lexicon expansion
Summary
Voice Mode v4.1.2 delivers enhanced grapheme-to-phoneme (G2P) conversion with a multi-source fallback chain and significantly expands multi-language lexicon support with 1,384 total pronunciation entries.
New Features
EnhancedG2PService
A new G2P service with a prioritized fallback chain for accurate pronunciation generation:
Fallback Chain (in priority order):
1. Medical Pronunciation Cache (50+ terms, confidence: 0.95)
   - Pre-computed IPA for common medical terms
   - Includes drugs, conditions, and procedures
2. CMUdict (English, confidence: 0.9)
   - Carnegie Mellon Pronouncing Dictionary
   - 134,000+ English words with ARPABET phonemes
   - Automatic ARPABET-to-IPA conversion
3. gruut (Multi-language, confidence: 0.8)
   - Pure Python G2P for multiple languages
   - Supports: English, German, French, Spanish, Russian, Polish
4. espeak-ng (Fallback, confidence: 0.7)
   - System TTS fallback for unsupported terms
   - Broad language coverage including Arabic, CJK
5. Raw Term Fallback (Last resort, confidence: 0.3)
   - Returns term wrapped in slashes: /term/
   - Ensures no silent failures
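The priority order above can be illustrated with a minimal sketch. This is not the actual EnhancedG2PService code: the `generate` function, the `SOURCES` list, the stub lookups, and every source name other than `medical_cache` are assumptions made for illustration. Only the confidence values and the `/term/` last-resort format come from these notes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class G2PResult:
    term: str
    phonemes: str
    source: str
    confidence: float


# Hypothetical stand-in for the medical pronunciation cache.
MEDICAL_CACHE = {"metformin": "mɛtfɔːrmɪn"}


def medical_cache(term: str, language: str) -> Optional[str]:
    """Return cached IPA for English medical terms, else None."""
    return MEDICAL_CACHE.get(term.lower()) if language == "en" else None


def unavailable(term: str, language: str) -> Optional[str]:
    """Stub for a backend that has no answer (placeholder for a real lookup)."""
    return None


# (source name, confidence, lookup) in priority order. Names are assumptions.
SOURCES: list[tuple[str, float, Callable[[str, str], Optional[str]]]] = [
    ("medical_cache", 0.95, medical_cache),
    ("cmudict", 0.9, unavailable),
    ("gruut", 0.8, unavailable),
    ("espeak", 0.7, unavailable),
]


def generate(term: str, language: str = "en") -> G2PResult:
    """Try each source in order and return the first non-empty result."""
    for name, confidence, lookup in SOURCES:
        try:
            phonemes = lookup(term, language)
        except Exception:
            phonemes = None  # a failing backend should not abort the chain
        if phonemes:
            return G2PResult(term, phonemes, name, confidence)
    # Last resort: wrap the raw term in slashes so callers always get a result.
    return G2PResult(term, f"/{term}/", "raw", 0.3)
```

Because every source shares the same `(term, language)` signature, adding or reordering backends in this sketch only means editing the `SOURCES` list.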
Key Features:
- Runtime caching for repeated lookups
- Batch generation for multiple terms
- Language-aware processing
- Comprehensive statistics API
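As a rough illustration of how runtime caching, batch generation, and statistics can fit together, here is a small sketch. The `CachedG2P` class, its `generate_batch` and `stats` methods, and the `backend` callable are hypothetical names, not the service's real API.

```python
import asyncio


class CachedG2P:
    """Hypothetical caching wrapper around an async G2P backend."""

    def __init__(self, backend):
        self._backend = backend  # async callable: (term, language) -> phonemes
        self._cache: dict[tuple[str, str], str] = {}
        self.hits = 0
        self.misses = 0

    async def generate(self, term: str, language: str = "en") -> str:
        key = (term.lower(), language)  # language-aware, case-insensitive key
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        phonemes = await self._backend(term, language)
        self._cache[key] = phonemes
        return phonemes

    async def generate_batch(self, terms, language: str = "en") -> dict[str, str]:
        # Deduplicate while preserving order so each term is looked up once.
        unique = list(dict.fromkeys(terms))
        results = await asyncio.gather(*(self.generate(t, language) for t in unique))
        return dict(zip(unique, results))

    def stats(self) -> dict[str, int]:
        return {"hits": self.hits, "misses": self.misses, "cached": len(self._cache)}
```

Keying the cache on both the lowercased term and the language keeps a French and an English lookup of the same spelling from colliding.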
Usage:
from app.services.enhanced_g2p_service import EnhancedG2PService
g2p = EnhancedG2PService()
# generate() is awaited, so call it from inside an async function
result = await g2p.generate("metformin", "en")
# G2PResult(term='metformin', phonemes='mɛtfɔːrmɪn', source='medical_cache', confidence=0.95)
ARPABET-to-IPA Conversion
100+ phoneme mappings for converting CMUdict ARPABET to IPA. Representative entries:
| ARPABET | IPA | Example |
|---|---|---|
| AA | ɑ | father |
| AE | æ | cat |
| IY | i | beet |
| SH | ʃ | ship |
| TH | θ | think |
| AA1 | ˈɑ | (primary stress) |
| AA2 | ˌɑ | (secondary stress) |
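The conversion can be sketched using only the rows shown in the table. The `arpabet_to_ipa` function name and the choice to raise on unknown symbols are assumptions for illustration. The stress mark is placed directly before the vowel, as in the AA1 and AA2 rows.

```python
# Subset of mappings taken from the table; the full mapping set is larger.
ARPABET_TO_IPA = {"AA": "ɑ", "AE": "æ", "IY": "i", "SH": "ʃ", "TH": "θ"}

# CMUdict stress digits: 1 = primary, 2 = secondary, 0 = unstressed (no mark).
STRESS_MARKS = {"1": "ˈ", "2": "ˌ"}


def arpabet_to_ipa(phones: list[str]) -> str:
    """Convert a list of ARPABET symbols (e.g. ["SH", "IY1"]) to an IPA string."""
    out = []
    for phone in phones:
        stress = ""
        if phone and phone[-1].isdigit():
            # Split a trailing stress digit off the vowel symbol.
            phone, digit = phone[:-1], phone[-1]
            stress = STRESS_MARKS.get(digit, "")
        ipa = ARPABET_TO_IPA.get(phone)
        if ipa is None:
            raise KeyError(f"no IPA mapping for ARPABET symbol {phone!r}")
        out.append(stress + ipa)
    return "".join(out)
```

For example, `arpabet_to_ipa(["SH", "IY1"])` returns `"ʃˈi"`.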
Lexicon Expansion
Total coverage increased to 1,384 pronunciation entries:
| Language | Terms | Status | Notes |
|---|---|---|---|
| Arabic | 485 | Complete | Quranic vocabulary |
| English | 852+334 | Complete | General + Quranic transliteration |
| Spanish | 210 | Complete | Medical terminology |
| Chinese | 160 | Complete | Medical terminology |
| Japanese | 55 | Expanded | Medical + common terms |
| Korean | 55 | Expanded | Medical + common terms |
| Polish | 55 | Expanded | Medical + common terms |
| Russian | 55 | Expanded | Medical + common terms |
| Turkish | 55 | Expanded | Medical + common terms |
Documentation Updates
Getting Started Guide
Added comprehensive Getting Started section to What's New v4.1:
- Voice-First Input Bar usage
- VAD preset selection guide
- Quality badge and PHI indicator explanations
- Thinking feedback configuration
Screenshot Requirements
Created docs/voice/screenshots/README.md with:
- Capture status table for 5 required screenshots
- Annotation guidelines and tools
- Resolution and format requirements
VAD Preset Terminology
Aligned terminology across all documentation:
| Old Term | New Term |
|---|---|
| Quiet | Sensitive |
| Normal | Balanced |
| Noisy | Relaxed |
Bug Fixes
- Turkish lexicon typo: Fixed "zatürree" → "zatürre" (pneumonia)
- ElevenLabs test: Updated model count assertion for new Flash/Turbo v2.5 models
Test Results
======================== 869 passed, 34 skipped ========================
New Tests Added:
test_enhanced_g2p_service.py: 34 integration tests
- Medical cache tests (4)
- CMUdict tests (2)
- Multi-language tests (5)
- Fallback chain tests (7)
- Caching tests (2)
- Batch generation tests (2)
- ARPABET conversion tests (2)
- Edge case tests (5)
- Statistics tests (1)
- Result dataclass tests (4)
Dependencies
New dependencies added to requirements.txt:
cmudict>=1.0.12 # CMU Pronouncing Dictionary for English
gruut>=2.4.0 # Multi-language G2P with pure Python implementation
Installation
cd services/api-gateway
pip install -r requirements.txt
Upgrade Notes
This release is fully backward compatible with v4.1.1. No configuration changes required.
Optional Enhancements:
- Install the espeak-ng system package for broader language fallback support
- Configure the medical pronunciation cache for domain-specific terms
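To see which G2P backends are present on a machine after upgrading, a quick check along these lines may help. The `available_g2p_backends` helper is hypothetical, and it assumes the espeak-ng executable is found through `PATH`. It uses only the standard library and does not import the packages themselves.

```python
import importlib.util
import shutil


def available_g2p_backends() -> dict[str, bool]:
    """Report whether each G2P backend appears to be installed."""
    return {
        # Python packages listed in requirements.txt
        "cmudict": importlib.util.find_spec("cmudict") is not None,
        "gruut": importlib.util.find_spec("gruut") is not None,
        # Optional system package; assumed to be discoverable on PATH
        "espeak-ng": shutil.which("espeak-ng") is not None,
    }


if __name__ == "__main__":
    for name, present in available_g2p_backends().items():
        print(f"{name}: {'available' if present else 'missing'}")
```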
Related Links
Released: December 4, 2025
Commit: 7047d4d
PR: #165