G2P Service Alternatives Evaluation
This document evaluates alternative G2P (Grapheme-to-Phoneme) systems for the VoiceAssist platform, addressing the current limitations with espeak-ng availability and pronunciation quality.
Current Implementation
Location: services/api-gateway/app/services/lexicon_service.py
Architecture
Term → Lexicon Lookup → Shared Drugs → G2P → Fallback
↓
espeak-ng (most languages)
pypinyin (Chinese)
mishkal (Arabic - placeholder)
Issues
- espeak-ng dependency: The espeak-ng binary is not installed in every environment; when it is missing, the service falls back to returning the raw term
- Quality: espeak-ng pronunciations often sound robotic or non-native
- Medical terminology: Poor handling of medical/pharmaceutical terms
- Arabic support: mishkal integration incomplete
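The missing-binary case can be detected explicitly instead of surfacing as a silent raw-term fallback. The following is a minimal sketch, assuming espeak-ng is called as a subprocess. The helper name `espeak_ipa` is hypothetical and is not taken from `lexicon_service.py`. It uses only the standard library and the espeak-ng CLI options `-q` (no audio), `--ipa`, and `-v` (voice).

```python
import shutil
import subprocess
from typing import Optional


def espeak_ipa(term: str, voice: str = "en-us", binary: str = "espeak-ng") -> Optional[str]:
    """Return espeak-ng's IPA for `term`, or None if espeak-ng is unavailable or fails.

    Hypothetical helper: a sketch, not the service's actual implementation.
    """
    # Resolve the binary on PATH; a missing install is the failure mode described above.
    path = shutil.which(binary)
    if path is None:
        return None
    try:
        # -q: do not play audio; --ipa: print phonemes as IPA; -v: select the voice.
        result = subprocess.run(
            [path, "-q", "--ipa", "-v", voice, term],
            capture_output=True,
            text=True,
            timeout=5,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # Non-zero exit, timeout, or OS-level failure: report "no answer".
        return None
    ipa = result.stdout.strip()
    return ipa or None
```

When this returns `None` instead of the raw term, the caller can move on to the next backend and record why it did so.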
Alternative G2P Libraries
1. phonemizer
Repository: https://github.com/bootphon/phonemizer
License: GPL-3.0
Pros:
- Wraps espeak-ng, Festival, and other backends
- Clean Python API
- Supports batch processing
- Multiple output formats (IPA, SAMPA)
Cons:
- Still requires espeak-ng as backend
- GPL license may cause issues
- No improvement over raw espeak-ng quality
Installation:
pip install phonemizer
apt-get install espeak-ng  # Still required
Code Example:
from phonemizer import phonemize
text = "diabetes mellitus"
phonemes = phonemize(text, language='en-us', backend='espeak')
# Example output (exact IPA varies with the espeak-ng version): "daɪəbiːtiːz mɛlɪtəs"
Verdict: ❌ Not recommended - same espeak dependency
2. g2p_en
Repository: https://github.com/Kyubyong/g2p
License: Apache-2.0
Pros:
- Neural network-based G2P for English
- No espeak dependency
- Pure Python
- Good quality for common words
Cons:
- English only
- No medical vocabulary training
- Slower than rule-based
Installation:
pip install g2p_en
Code Example:
from g2p_en import G2p
g2p = G2p()
phonemes = g2p("metformin")
# Output is ARPAbet with stress digits, e.g.: ['M', 'EH1', 'T', 'F', 'AO0', 'R', 'M', 'IH0', 'N']
Verdict: ⚠️ Possible for English fallback
3. gruut
Repository: https://github.com/rhasspy/gruut
License: MIT
Pros:
- Multi-language support (20+ languages)
- Designed for TTS pipelines
- MIT license
- Pure Python, no system dependencies
- Includes lexicon support
Cons:
- Less accurate than neural models
- No Arabic support
- Requires language-specific data files
Installation:
pip install gruut[en,es,de,fr]
Code Example:
from gruut import sentences
for sent in sentences("Metformin is used for diabetes", lang="en-us"):
    for word in sent:
        if word.phonemes:  # skip punctuation/break tokens that carry no phonemes
            print(word.text, word.phonemes)
Verdict: ✅ Recommended for evaluation
4. cmudict (CMU Pronouncing Dictionary)
Repository: Part of NLTK / standalone
License: BSD
Pros:
- 130,000+ English pronunciations
- Fast dictionary lookup
- No ML overhead
- Widely used standard
Cons:
- English only
- No OOV handling (needs G2P fallback)
- No medical terminology
Installation:
pip install cmudict
# OR via NLTK
python -c "import nltk; nltk.download('cmudict')"
Verdict: ✅ Recommended as primary English lookup
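CMUdict maps each word to one or more ARPAbet pronunciations. Any word that is not in the dictionary is simply a miss, which is why a G2P fallback is still needed. The `cmudict` package's `cmudict.dict()` already returns this word-to-pronunciations shape. The sketch below parses raw CMUdict-style lines only to make the lookup and out-of-vocabulary (OOV) behavior explicit. The function names and the tiny sample are illustrative assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Matches variant markers such as "hello(2)" or "HELLO(1)".
_VARIANT = re.compile(r"\(\d+\)$")


def parse_cmudict(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Parse CMUdict-style lines into word -> list of ARPAbet pronunciations."""
    entries: Dict[str, List[List[str]]] = defaultdict(list)
    for raw in lines:
        # Drop trailing " # ..." comments (newer format) without touching entries like "#HASH-MARK".
        line = raw.split(" #", 1)[0].strip()
        if not line or line.startswith(";;;"):  # skip blanks and 0.7b-style comments
            continue
        word, *phones = line.split()
        entries[_VARIANT.sub("", word).lower()].append(phones)
    return dict(entries)


def lookup(cmu: Dict[str, List[List[str]]], term: str) -> Optional[List[str]]:
    """Return the first pronunciation, or None for OOV terms (caller falls back to G2P)."""
    prons = cmu.get(term.strip().lower())
    return prons[0] if prons else None


# Illustrative sample lines in CMUdict 0.7b style.
SAMPLE = [
    ";;; sample lines",
    "HELLO  HH AH0 L OW1",
    "HELLO(1)  HH EH0 L OW1",
    "CAT  K AE1 T",
]
cmu = parse_cmudict(SAMPLE)
```

Here `lookup(cmu, "Hello")` returns the first variant, and `lookup(cmu, "metformin")` returns `None`. That miss is the point where the chain should hand off to gruut or espeak-ng.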
5. epitran
Repository: https://github.com/dmort27/epitran
License: MIT
Pros:
- Rule-based IPA transcription
- Supports 100+ languages
- No ML overhead
- Consistent output
Cons:
- Rule-based, so less accurate for languages with irregular spelling such as English
- No context awareness
- Limited phonetic detail
Verdict: ⚠️ Useful for placeholder languages
Recommendation
Proposed Architecture
Term → Lexicon Lookup (medical lexicons) → CMUdict (English) → gruut (multi-language) → espeak-ng (last resort)
Implementation Plan
Phase 1: Add CMUdict for English
import logging
logger = logging.getLogger(__name__)
class EnhancedG2PService:
    def __init__(self):
        self.cmudict = None
        self._load_cmudict()
    def _load_cmudict(self):
        try:
            import cmudict
            self.cmudict = cmudict.dict()
        except ImportError:
            logger.warning("cmudict not available")
    async def generate_english(self, term: str) -> str:
        # 1. Try CMUdict (use the first listed pronunciation)
        key = term.lower()
        if self.cmudict and key in self.cmudict:
            return self._arpabet_to_ipa(self.cmudict[key][0])
        # 2. Try gruut
        # 3. Fall back to espeak-ng
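The Phase 1 code calls `_arpabet_to_ipa`, but that method is not shown. Below is a minimal standalone sketch of such a conversion. It assumes broad IPA output and CMUdict's 39-phoneme ARPAbet set with stress digits (0, 1, 2). The table is an illustration, not the mapping used in `enhanced_g2p_service.py`. As a simplification, stress marks are placed directly before the stressed vowel rather than at the syllable boundary.

```python
from typing import List

# CMUdict ARPAbet -> broad IPA. AH and ER get different symbols when unstressed (see below).
ARPABET_TO_IPA = {
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AW": "aʊ", "AY": "aɪ",
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "EH": "ɛ", "ER": "ɝ",
    "EY": "eɪ", "F": "f", "G": "ɡ", "HH": "h", "IH": "ɪ", "IY": "i",
    "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n", "NG": "ŋ",
    "OW": "oʊ", "OY": "ɔɪ", "P": "p", "R": "ɹ", "S": "s", "SH": "ʃ",
    "T": "t", "TH": "θ", "UH": "ʊ", "UW": "u", "V": "v", "W": "w",
    "Y": "j", "Z": "z", "ZH": "ʒ",
}
# Reduced vowels used when AH/ER carry stress digit 0.
UNSTRESSED = {"AH": "ə", "ER": "ɚ"}
# Digit 1 = primary stress, 2 = secondary stress, 0 = no mark.
STRESS_MARKS = {"1": "ˈ", "2": "ˌ"}


def arpabet_to_ipa(phones: List[str]) -> str:
    """Convert one ARPAbet pronunciation (list of phones) to a broad IPA string."""
    out = []
    for phone in phones:
        phone = phone.upper()
        stress = phone[-1] if phone[-1].isdigit() else ""
        base = phone[:-1] if stress else phone
        if base not in ARPABET_TO_IPA:
            raise ValueError(f"unknown ARPAbet phone: {phone}")
        ipa = UNSTRESSED.get(base, ARPABET_TO_IPA[base]) if stress == "0" else ARPABET_TO_IPA[base]
        # Simplification: the stress mark goes right before the vowel, not the syllable onset.
        out.append(STRESS_MARKS.get(stress, "") + ipa)
    return "".join(out)
```

For example, `arpabet_to_ipa(["HH", "AH0", "L", "OW1"])` returns `"həlˈoʊ"`. Producing the conventional `həˈloʊ` would require a syllabifier.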
Phase 2: Add gruut for Multi-language
async def generate(self, term: str, language: str) -> str:
    if language == "en":
        return await self.generate_english(term)
    # Try gruut for supported languages
    if language in self.GRUUT_LANGUAGES:
        return await self._generate_gruut(term, language)
    # Fall back to espeak-ng
    return await self._generate_espeak(term, language)
Phase 3: Pre-compute Common Pronunciations
Build a pronunciation cache for common medical terms:
PRECOMPUTED_PRONUNCIATIONS = {
    "metformin": "mɛtˈfɔrmɪn",
    "lisinopril": "laɪˈsɪnəprɪl",
    # ... 1000+ medical terms
}
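For the cache to be useful, a lookup should hit regardless of the caller's casing or stray whitespace, so "Metformin" and " metformin " should find the same entry. Below is a minimal sketch that assumes keys are normalized with Unicode NFC plus `str.casefold()`. The helpers `normalize_term`, `build_cache`, and `lookup_precomputed` are hypothetical.

```python
import unicodedata
from typing import Dict, Optional


def normalize_term(term: str) -> str:
    """Canonical cache key: NFC-normalized, whitespace-collapsed, casefolded."""
    return " ".join(unicodedata.normalize("NFC", term).split()).casefold()


def build_cache(entries: Dict[str, str]) -> Dict[str, str]:
    """Re-key entries by normalized term, rejecting duplicates that disagree."""
    cache: Dict[str, str] = {}
    for term, ipa in entries.items():
        key = normalize_term(term)
        if cache.get(key, ipa) != ipa:
            raise ValueError(f"conflicting pronunciations for {term!r}")
        cache[key] = ipa
    return cache


def lookup_precomputed(cache: Dict[str, str], term: str) -> Optional[str]:
    """Return the cached IPA for `term`, or None so the caller can try the next backend."""
    return cache.get(normalize_term(term))


cache = build_cache({"metformin": "mɛtˈfɔrmɪn", "Lisinopril": "laɪˈsɪnəprɪl"})
```

Normalizing once, when the cache is built, keeps each lookup to a single dictionary access. It also surfaces conflicting duplicates (for example "Metformin" and "metformin" with different IPA) at load time rather than at runtime.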
Migration Path
- v4.1.2: Add CMUdict for English, optional gruut
- v4.2.0: Full gruut integration, deprecate espeak-ng primary
- v4.3.0: Neural G2P for medical terms
Dependencies
Current
# Already in requirements
pypinyin>=0.48.0 # Chinese pinyin
Proposed Additions
cmudict>=1.0.12 # CMU Pronouncing Dictionary
gruut>=2.3.0 # Multi-language G2P
Optional (system)
# Still useful as fallback
apt-get install espeak-ng
Testing Strategy
Unit Tests
class TestG2PService:
    async def test_english_cmudict_lookup(self):
        g2p = EnhancedG2PService()
        result = await g2p.generate("hello", "en")
        assert "h" in result.lower()
        assert g2p.last_source == "cmudict"
    async def test_fallback_chain(self):
        g2p = EnhancedG2PService()
        # Unknown word should fall through to gruut/espeak
        result = await g2p.generate("xyzabc123", "en")
        assert result  # Should return something, not error
Integration Tests
async def test_medical_term_quality(self):
    """Compare G2P output against reference pronunciations."""
    g2p = EnhancedG2PService()
    # load_reference_pronunciations() and phonetic_similarity() are test helpers (not shown here)
    reference = load_reference_pronunciations()
    for term, expected_ipa in reference.items():
        result = await g2p.generate(term, "en")
        similarity = phonetic_similarity(result, expected_ipa)
        assert similarity > 0.8, f"Poor pronunciation for {term}"
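The integration test relies on a `phonetic_similarity` helper that is not defined in this document. Below is one possible stand-in, which assumes a character-level comparison is good enough for a regression threshold. It first strips stress, length, and syllable marks, because espeak-ng output and hand-written references can disagree on these. It then scores the remaining segments with `difflib.SequenceMatcher.ratio()`. This is a string-similarity ratio, not a true phonetic distance, and the function is hypothetical.

```python
from difflib import SequenceMatcher

# Marks ignored by this sketch: primary/secondary stress, length, syllable break, spaces.
_IGNORED = set("ˈˌː. ")


def _segments(ipa: str) -> str:
    """Keep only the segmental characters of an IPA string."""
    return "".join(ch for ch in ipa if ch not in _IGNORED)


def phonetic_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1]; 1.0 means identical once marks are stripped."""
    sa, sb = _segments(a), _segments(b)
    if not sa and not sb:
        return 1.0
    return SequenceMatcher(None, sa, sb).ratio()
```

Under this sketch, an espeak-style `daɪəbiːtiːz` and a reference `daɪəˈbitiz` score 1.0. A single wrong vowel in a nine-segment word scores about 0.89, which still passes the 0.8 threshold. That threshold may therefore need tightening if vowel errors matter for a given term list.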
Decision
Recommended approach: Implement CMUdict + gruut hybrid
Rationale:
- CMUdict provides high-quality English pronunciations
- gruut adds multi-language support without system dependencies
- espeak-ng remains as fallback for unsupported languages
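- No new GPL dependencies: CMUdict (BSD) and gruut (MIT) avoid phonemizer's GPL-3.0 license; espeak-ng, which is itself GPL-3.0, remains only as an optional system-level fallback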
Next Steps:
- Add cmudict and gruut to requirements
- Implement EnhancedG2PService class ✅ DONE
- Add ARPABET-to-IPA conversion utility ✅ DONE
- Build pronunciation cache for medical terms ✅ DONE (50+ terms)
- Update lexicon service to use new G2P
Implementation Status
Prototype: services/api-gateway/app/services/enhanced_g2p_service.py
Features Implemented:
- ARPABET-to-IPA conversion (100+ phoneme mappings)
- Medical pronunciation cache (50+ common terms)
- CMUdict lookup for English (~130k words)
- gruut integration for 11 languages
- espeak-ng fallback for all other languages
- Runtime caching (up to 10k entries)
- Batch generation support
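The runtime cache needs a size bound (up to 10k entries, per the list). Below is a minimal sketch of an LRU cache built on `collections.OrderedDict` and keyed by `(term, language)`. The `LRUCache` class is hypothetical, not the prototype's implementation.

```python
from collections import OrderedDict
from typing import Generic, Hashable, Optional, TypeVar

V = TypeVar("V")


class LRUCache(Generic[V]):
    """Bounded cache that evicts the least recently used entry once full."""

    def __init__(self, max_entries: int = 10_000) -> None:
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self.max_entries = max_entries
        self._data: "OrderedDict[Hashable, V]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[V]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: V) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry

    def __len__(self) -> int:
        return len(self._data)
```

A hand-rolled cache is used here instead of `functools.lru_cache` because `generate` is `async`. Decorating it with `lru_cache` would cache the coroutine object, and awaiting a cached coroutine a second time raises `RuntimeError`.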
Fallback Chain:
- Medical pronunciation cache (confidence: 0.95)
- CMUdict lookup for English (confidence: 0.90)
- gruut for supported languages (confidence: 0.80)
- espeak-ng for all languages (confidence: 0.70)
- Raw term fallback (confidence: 0.30)
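In effect, the fallback chain tries each backend in order and tags the first hit with its source and confidence. Below is a minimal sketch that assumes each backend is an async callable returning IPA or `None`. `G2PResult`, `generate_with_fallback`, and the two stub backends are hypothetical; only the confidence values come from the list.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional, Tuple

# A backend takes (term, language) and returns IPA, or None when it has no answer.
Backend = Callable[[str, str], Awaitable[Optional[str]]]


@dataclass(frozen=True)
class G2PResult:
    phonemes: str
    source: str
    confidence: float


async def generate_with_fallback(
    term: str, language: str, chain: List[Tuple[str, float, Backend]]
) -> G2PResult:
    """Try each (source, confidence, backend) in order; the first non-empty answer wins."""
    for source, confidence, backend in chain:
        try:
            phonemes = await backend(term, language)
        except Exception:
            # Treat a crashing backend (e.g. missing binary) like a miss; a real service would log it.
            phonemes = None
        if phonemes:
            return G2PResult(phonemes, source, confidence)
    # Raw term fallback: keeps the pipeline moving at the lowest confidence.
    return G2PResult(term, "raw", 0.30)


# Hypothetical stub backends, for illustration only.
async def medical_cache(term: str, language: str) -> Optional[str]:
    return {"metformin": "mɛtˈfɔrmɪn"}.get(term.lower())


async def unavailable_espeak(term: str, language: str) -> Optional[str]:
    raise FileNotFoundError("espeak-ng not installed")


CHAIN: List[Tuple[str, float, Backend]] = [
    ("medical_cache", 0.95, medical_cache),
    ("espeak-ng", 0.70, unavailable_espeak),
]
```

With this chain, `asyncio.run(generate_with_fallback("Metformin", "en", CHAIN))` resolves from the medical cache. An unknown term skips the failing espeak-ng stub and ends at the raw-term result. Returning the source alongside the phonemes is what lets callers (and tests) check which stage answered.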
Created: December 4, 2024
Updated: December 4, 2025
Author: Platform Team
Status: Prototype Complete