
Voice Mode v4.1.2 Release Announcement

Sourced from docs/releases/v4.1.2-release-announcement.md


Voice Mode v4.1.2 Release Notes

Version: 4.1.2 (Feature Release)
Date: December 2025
Type: G2P enhancement and lexicon expansion


Summary

Voice Mode v4.1.2 delivers enhanced grapheme-to-phoneme (G2P) conversion with a multi-source fallback chain and significantly expands multi-language lexicon support with 1,384 total pronunciation entries.


New Features

EnhancedG2PService

A new G2P service that tries several pronunciation sources in turn and returns the first match, tagged with its source and a confidence score:

Fallback Chain (in priority order):

  1. Medical Pronunciation Cache (50+ terms, confidence: 0.95)

    • Pre-computed IPA for common medical terms
    • Includes drugs, conditions, and procedures
  2. CMUdict (English, confidence: 0.9)

    • Carnegie Mellon Pronouncing Dictionary
    • 134,000+ English words with ARPABET phonemes
    • Automatic ARPABET-to-IPA conversion
  3. gruut (Multi-language, confidence: 0.8)

    • Pure Python G2P for multiple languages
    • Supports: English, German, French, Spanish, Russian, Polish
  4. espeak-ng (Fallback, confidence: 0.7)

    • System TTS fallback for unsupported terms
    • Broad language coverage including Arabic, CJK
  5. Raw Term Fallback (Last resort, confidence: 0.3)

    • Returns term wrapped in slashes: /term/
    • Ensures no silent failures
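The priority order above can be sketched as a simple loop over sources. This is a minimal, synchronous illustration and not the service's actual implementation: the `SOURCES` table, the `unavailable` stub, and every source name except `medical_cache` are assumptions made for the example, and the cache holds only the single entry shown under Usage.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class G2PResult:
    term: str
    phonemes: str
    source: str
    confidence: float


# (source name, confidence, lookup). A lookup returns IPA or None.
Source = Tuple[str, float, Callable[[str, str], Optional[str]]]

# Hypothetical stand-in for the medical pronunciation cache.
MEDICAL_CACHE = {"metformin": "mɛtfɔːrmɪn"}


def medical_cache_lookup(term: str, language: str) -> Optional[str]:
    if language != "en":
        return None
    return MEDICAL_CACHE.get(term.lower())


def unavailable(term: str, language: str) -> Optional[str]:
    # Placeholder for CMUdict, gruut, and espeak-ng in this sketch.
    return None


SOURCES: Sequence[Source] = (
    ("medical_cache", 0.95, medical_cache_lookup),
    ("cmudict", 0.9, unavailable),   # name assumed
    ("gruut", 0.8, unavailable),     # name assumed
    ("espeak", 0.7, unavailable),    # name assumed
)


def generate(term: str, language: str = "en",
             sources: Sequence[Source] = SOURCES) -> G2PResult:
    for name, confidence, lookup in sources:
        try:
            phonemes = lookup(term, language)
        except Exception:
            phonemes = None  # a failing source falls through to the next one
        if phonemes:
            return G2PResult(term, phonemes, name, confidence)
    # Last resort: wrap the raw term in slashes so callers always get a result.
    return G2PResult(term, f"/{term}/", "raw", 0.3)
```

Because the last step always returns a value, callers can use the confidence score to decide whether a pronunciation is reliable enough to use.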

Key Features:

  • Runtime caching for repeated lookups
  • Batch generation for multiple terms
  • Language-aware processing
  • Comprehensive statistics API
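To illustrate how runtime caching, batch generation, and statistics can fit together, here is a small sketch. The class name `CachedG2P` and its methods are hypothetical and do not reflect the actual `EnhancedG2PService` API; the lookup passed in stands in for the whole fallback chain.

```python
from typing import Callable, Dict, Iterable, Tuple


class CachedG2P:
    """Wraps a (term, language) -> phonemes callable with an in-memory cache."""

    def __init__(self, lookup: Callable[[str, str], str]) -> None:
        self._lookup = lookup
        self._cache: Dict[Tuple[str, str], str] = {}
        self.hits = 0
        self.misses = 0

    def generate(self, term: str, language: str = "en") -> str:
        # The language is part of the key, so the same spelling can have
        # different pronunciations per language.
        key = (term.lower(), language)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        phonemes = self._lookup(term, language)
        self._cache[key] = phonemes
        return phonemes

    def generate_batch(self, terms: Iterable[str],
                       language: str = "en") -> Dict[str, str]:
        # Repeated terms in a batch are served from the cache.
        return {term: self.generate(term, language) for term in terms}

    def stats(self) -> Dict[str, int]:
        return {"hits": self.hits, "misses": self.misses,
                "cached": len(self._cache)}


def raw_fallback(term: str, language: str) -> str:
    # Stand-in lookup used only for this example.
    return f"/{term}/"


g2p = CachedG2P(raw_fallback)
batch = g2p.generate_batch(["aspirin", "insulin", "aspirin"])
```

In this run the second `aspirin` is a cache hit, so `g2p.stats()` reports one hit, two misses, and two cached entries.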

Usage:

from app.services.enhanced_g2p_service import EnhancedG2PService

g2p = EnhancedG2PService()
result = await g2p.generate("metformin", "en")
# G2PResult(term='metformin', phonemes='mɛtfɔːrmɪn', source='medical_cache', confidence=0.95)

ARPABET-to-IPA Conversion

100+ phoneme mappings for converting CMUdict ARPABET to IPA:

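| ARPABET | IPA | Example            |
|---------|-----|--------------------|
| AA      | ɑ   | father             |
| AE      | æ   | cat                |
| IY      | i   | beet               |
| SH      | ʃ   | ship               |
| TH      | θ   | think              |
| AA1     | ˈɑ  | (primary stress)   |
| AA2     | ˌɑ  | (secondary stress) |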
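The conversion can be sketched as a table lookup plus stress handling. The example below covers only the phonemes listed in the table, not the full 100+ mappings, and the names `ARPABET_TO_IPA`, `STRESS_MARKS`, and `arpabet_to_ipa` are illustrative rather than taken from the service. As in the table, a stress mark is placed directly before the stressed vowel.

```python
from typing import Iterable

# Subset of mappings from the table above.
ARPABET_TO_IPA = {"AA": "ɑ", "AE": "æ", "IY": "i", "SH": "ʃ", "TH": "θ"}

# CMUdict marks vowels with a trailing digit: 1 = primary, 2 = secondary,
# 0 = unstressed (no mark emitted).
STRESS_MARKS = {"1": "ˈ", "2": "ˌ"}


def arpabet_to_ipa(phonemes: Iterable[str]) -> str:
    out = []
    for symbol in phonemes:
        stress = ""
        if symbol and symbol[-1].isdigit():
            symbol, digit = symbol[:-1], symbol[-1]
            stress = STRESS_MARKS.get(digit, "")
        # Raises KeyError for symbols outside this small subset.
        out.append(stress + ARPABET_TO_IPA[symbol])
    return "".join(out)


she = arpabet_to_ipa(["SH", "IY1"])  # "ʃˈi"
```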

Lexicon Expansion

Total coverage increased to 1,384 pronunciation entries:

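| Language | Terms   | Status   | Notes                             |
|----------|---------|----------|-----------------------------------|
| Arabic   | 485     | Complete | Quranic vocabulary                |
| English  | 852+334 | Complete | General + Quranic transliteration |
| Spanish  | 210     | Complete | Medical terminology               |
| Chinese  | 160     | Complete | Medical terminology               |
| Japanese | 55      | Expanded | Medical + common terms            |
| Korean   | 55      | Expanded | Medical + common terms            |
| Polish   | 55      | Expanded | Medical + common terms            |
| Russian  | 55      | Expanded | Medical + common terms            |
| Turkish  | 55      | Expanded | Medical + common terms            |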

Documentation Updates

Getting Started Guide

Added a comprehensive Getting Started section to What's New v4.1, covering:

  • Voice-First Input Bar usage
  • VAD preset selection guide
  • Quality badge and PHI indicator explanations
  • Thinking feedback configuration

Screenshot Requirements

Created docs/voice/screenshots/README.md with:

  • Capture status table for 5 required screenshots
  • Annotation guidelines and tools
  • Resolution and format requirements

VAD Preset Terminology

Aligned terminology across all documentation:

| Old Term | New Term  |
|----------|-----------|
| Quiet    | Sensitive |
| Normal   | Balanced  |
| Noisy    | Relaxed   |
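If scripts or notes outside the documentation still use the old labels, a small mapping can translate them. This is a hypothetical helper, not part of the release: the lowercase identifiers and the function name `normalize_vad_preset` are assumptions.

```python
# Old documentation term -> new term, per the table above.
VAD_PRESET_RENAMES = {
    "quiet": "sensitive",
    "normal": "balanced",
    "noisy": "relaxed",
}


def normalize_vad_preset(name: str) -> str:
    """Return the new preset term; names that are already new pass through."""
    key = name.strip().lower()
    return VAD_PRESET_RENAMES.get(key, key)
```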

Bug Fixes

  • Turkish lexicon typo: Fixed "zatürree" → "zatürre" (pneumonia)
  • ElevenLabs test: Updated model count assertion for new Flash/Turbo v2.5 models

Test Results

======================== 869 passed, 34 skipped ========================

New Tests Added:

  • test_enhanced_g2p_service.py: 34 integration tests
    • Medical cache tests (4)
    • CMUdict tests (2)
    • Multi-language tests (5)
    • Fallback chain tests (7)
    • Caching tests (2)
    • Batch generation tests (2)
    • ARPABET conversion tests (2)
    • Edge case tests (5)
    • Statistics tests (1)
    • Result dataclass tests (4)

Dependencies

New dependencies added to requirements.txt:

cmudict>=1.0.12  # CMU Pronouncing Dictionary for English
gruut>=2.4.0     # Multi-language G2P with pure Python implementation

Installation

cd services/api-gateway
pip install -r requirements.txt

Upgrade Notes

This release is fully backward compatible with v4.1.1. No configuration changes required.

Optional Enhancements:

  • Install espeak-ng system package for broader language fallback support
  • Configure medical pronunciation cache for domain-specific terms
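To check whether the optional espeak-ng fallback is usable on a host, one option is to look for the binary on `PATH`. This is a minimal sketch that assumes the system package installs an executable named `espeak-ng`; the function name is illustrative, and the check says nothing about how the service itself detects it.

```python
import shutil


def espeak_ng_available() -> bool:
    """Return True if an `espeak-ng` executable is found on PATH."""
    return shutil.which("espeak-ng") is not None


if __name__ == "__main__":
    status = "found" if espeak_ng_available() else "not found"
    print(f"espeak-ng: {status}")
```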


Released: December 4, 2025
Commit: 7047d4d
PR: #165
