# G2P Service Alternatives Evaluation

This document evaluates alternative G2P (Grapheme-to-Phoneme) systems for the VoiceAssist platform, addressing the current limitations with espeak-ng availability and pronunciation quality.

## Current Implementation

**Location:** `services/api-gateway/app/services/lexicon_service.py`

### Architecture

```
Term → Lexicon Lookup → Shared Drugs → G2P → Fallback
                         ↓
        espeak-ng (most languages)
        pypinyin (Chinese)
        mishkal (Arabic - placeholder)
```

### Issues

1. **espeak-ng dependency**: Not always installed; when it is missing, the service falls back to the raw term
2. **Quality**: espeak-ng pronunciations can be robotic/non-native
3. **Medical terminology**: Poor handling of medical/pharmaceutical terms
4. **Arabic support**: mishkal integration incomplete
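
The first issue can be detected up front instead of being discovered per request. The following is a minimal sketch, assuming the service shells out to an `espeak-ng` binary on `PATH`; the function names `espeak_available` and `espeak_ipa` are hypothetical and are not taken from `lexicon_service.py`:

```python
import shutil
import subprocess
from typing import Optional


def espeak_available() -> bool:
    """Return True if an espeak-ng executable is found on PATH."""
    return shutil.which("espeak-ng") is not None


def espeak_ipa(term: str, voice: str = "en-us") -> Optional[str]:
    """Return espeak-ng's IPA output for `term`, or None if unavailable.

    Hypothetical helper: returning None lets the caller decide how to
    fall back (for example, to the raw term).
    """
    if not espeak_available():
        return None
    try:
        proc = subprocess.run(
            # -q: no audio output, --ipa: print IPA phonemes, -v: voice
            ["espeak-ng", "-q", "--ipa", "-v", voice, term],
            capture_output=True,
            text=True,
            timeout=5,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    return proc.stdout.strip() or None
```

Checking availability once at startup and logging the result makes a missing binary visible in deployment logs rather than only in degraded pronunciations.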

---

## Alternative G2P Libraries

### 1. phonemizer

**Repository:** https://github.com/bootphon/phonemizer
**License:** GPL-3.0

**Pros:**

- Wraps espeak-ng, Festival, and other backends
- Clean Python API
- Supports batch processing
- Multiple output formats (IPA, SAMPA)

**Cons:**

- Still requires a system backend to be installed (espeak-ng in practice)
- GPL license may cause issues
- No improvement over raw espeak-ng quality

**Installation:**

```bash
pip install phonemizer
apt-get install espeak-ng  # Still required
```

**Code Example:**

```python
from phonemizer import phonemize
text = "diabetes mellitus"
phonemes = phonemize(text, language='en-us', backend='espeak')
# Example output (exact symbols and stress marks vary by espeak-ng version): "daɪəbiːtiːz mɛlɪtəs"
```

**Verdict:** ❌ Not recommended - same espeak dependency

---

### 2. g2p_en

**Repository:** https://github.com/Kyubyong/g2p
**License:** Apache-2.0

**Pros:**

- Neural network-based G2P for English
- No espeak dependency
- Pure Python
- Good quality for common words

**Cons:**

- English only
- No medical vocabulary training
- Slower than rule-based

**Installation:**

```bash
pip install g2p_en
```

**Code Example:**

```python
from g2p_en import G2p
g2p = G2p()
phonemes = g2p("metformin")
# Example output in ARPAbet (illustrative; out-of-dictionary words are predicted by the model):
# ['M', 'EH1', 'T', 'F', 'AO0', 'R', 'M', 'IH0', 'N']
```

**Verdict:** ⚠️ Possible for English fallback

---

### 3. gruut

**Repository:** https://github.com/rhasspy/gruut
**License:** MIT

**Pros:**

- Multi-language support (more than ten languages; the upstream README lists the current set)
- Designed for TTS pipelines
- MIT license
- Pure Python, no system dependencies
- Includes lexicon support

**Cons:**

- Less accurate than neural models
- No Arabic support
- Requires language-specific data files

**Installation:**

```bash
pip install "gruut[en,es,de,fr]"  # quoted so shells such as zsh do not expand the brackets
```

**Code Example:**

```python
from gruut import sentences
for sent in sentences("Metformin is used for diabetes", lang="en-us"):
    for word in sent:
        print(word.text, word.phonemes)
```

**Verdict:** ✅ Recommended for evaluation

---

### 4. cmudict (CMU Pronouncing Dictionary)

**Repository:** Part of NLTK / standalone
**License:** BSD-style (dictionary data; check the license of the Python wrapper package separately)

**Pros:**

- 130,000+ English pronunciations
- Fast dictionary lookup
- No ML overhead
- Widely used standard

**Cons:**

- English only
- No OOV handling (needs G2P fallback)
- No medical terminology

**Installation:**

```bash
pip install cmudict
# OR via NLTK
python -c "import nltk; nltk.download('cmudict')"
```

**Verdict:** ✅ Recommended as primary English lookup
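
A dictionary lookup with explicit out-of-vocabulary handling might look like the sketch below. It assumes the mapping has the shape returned by `cmudict.dict()` (lowercase word to a list of ARPAbet pronunciations); the helper name `lookup_arpabet` and the small `SAMPLE` stand-in are hypothetical and used only so the example runs without the package installed:

```python
from typing import Dict, List, Optional

# Same shape as the mapping returned by cmudict.dict():
# lowercase word -> list of pronunciations, each a list of ARPAbet symbols.
PronDict = Dict[str, List[List[str]]]

# Tiny illustrative stand-in; in the service this would be cmudict.dict().
SAMPLE: PronDict = {
    "hello": [["HH", "AH0", "L", "OW1"], ["HH", "EH0", "L", "OW1"]],
}


def lookup_arpabet(term: str, pron_dict: PronDict) -> Optional[List[str]]:
    """Return the first listed pronunciation, or None for OOV terms."""
    prons = pron_dict.get(term.strip().lower())
    return prons[0] if prons else None
```

Returning `None` for unknown words, rather than raising, keeps the decision about the next fallback step with the caller.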

---

### 5. epitran

**Repository:** https://github.com/dmort27/epitran
**License:** MIT

**Pros:**

- Rule-based IPA transcription
- Supports 100+ languages
- No ML overhead
- Consistent output

**Cons:**

- Rule-based = less accurate
- No context awareness
- Limited phonetic detail

**Verdict:** ⚠️ Useful for placeholder languages

---

## Recommendation

### Proposed Architecture

```
Term → Lexicon Lookup     → CMUdict   → gruut        → espeak-ng fallback
       (Medical lexicons)   (English)   (Multi-lang)   (Last resort)
```

### Implementation Plan

#### Phase 1: Add CMUdict for English

```python
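import logging

logger = logging.getLogger(__name__)


class EnhancedG2PService:
    def __init__(self):
        self.cmudict = None
        self._load_cmudict()

    def _load_cmudict(self):
        """Load CMUdict if the optional `cmudict` package is installed."""
        try:
            import cmudict
            self.cmudict = cmudict.dict()  # word -> list of ARPAbet pronunciations
        except ImportError:
            logger.warning("cmudict not available")

    async def generate_english(self, term: str) -> str:
        key = term.lower()

        # 1. Try CMUdict (first listed pronunciation); _arpabet_to_ipa is
        #    the conversion utility, defined elsewhere in the service
        if self.cmudict and key in self.cmudict:
            return self._arpabet_to_ipa(self.cmudict[key][0])

        # 2. Try gruut
        # 3. Fall back to espeak-ng
        # Steps 2 and 3 arrive in Phase 2; until then, return the raw term
        return term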
```
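
The `_arpabet_to_ipa` step referenced above could be sketched as follows. This is an illustrative mapping for the 39 CMUdict phonemes, not the prototype's actual table, and it makes two simplifying assumptions: stress marks are placed directly before the stressed vowel (not at the syllable onset), and unstressed `AH` and `ER` are reduced to `ə` and `ɚ`:

```python
from typing import Iterable

# Illustrative ARPAbet -> IPA table (stress digits stripped before lookup).
ARPABET_TO_IPA = {
    # Vowels
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AW": "aʊ",
    "AY": "aɪ", "EH": "ɛ", "ER": "ɝ", "EY": "eɪ", "IH": "ɪ",
    "IY": "i", "OW": "oʊ", "OY": "ɔɪ", "UH": "ʊ", "UW": "u",
    # Consonants
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "F": "f", "G": "ɡ",
    "HH": "h", "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n",
    "NG": "ŋ", "P": "p", "R": "ɹ", "S": "s", "SH": "ʃ", "T": "t",
    "TH": "θ", "V": "v", "W": "w", "Y": "j", "Z": "z", "ZH": "ʒ",
}

# CMUdict stress digits: 1 = primary, 2 = secondary, 0 = unstressed.
STRESS_MARKS = {"1": "ˈ", "2": "ˌ"}

# Unstressed variants of vowels that reduce in English.
REDUCED = {"AH": "ə", "ER": "ɚ"}


def arpabet_to_ipa(phones: Iterable[str]) -> str:
    """Convert a CMUdict-style ARPAbet sequence to an IPA string.

    Raises KeyError on an unknown symbol so bad input is not hidden.
    """
    out = []
    for phone in phones:
        stress = phone[-1] if phone[-1].isdigit() else ""
        base = phone[:-1] if stress else phone
        if stress == "0" and base in REDUCED:
            ipa = REDUCED[base]
        else:
            ipa = ARPABET_TO_IPA[base]
        out.append(STRESS_MARKS.get(stress, "") + ipa)
    return "".join(out)
```

For example, `arpabet_to_ipa(["HH", "AH0", "L", "OW1"])` yields `həlˈoʊ`. A production version would likely move the stress mark to the start of the syllable, which requires syllabification and is omitted here.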

#### Phase 2: Add gruut for Multi-language

```python
async def generate(self, term: str, language: str) -> str:
    if language == "en":
        return await self.generate_english(term)

    # Try gruut for supported languages
    if language in self.GRUUT_LANGUAGES:
        return await self._generate_gruut(term, language)

    # Fall back to espeak-ng
    return await self._generate_espeak(term, language)
```

#### Phase 3: Pre-compute Common Pronunciations

Build a pronunciation cache for common medical terms:

```python
PRECOMPUTED_PRONUNCIATIONS = {
    "metformin": "mɛtˈfɔrmɪn",
    "lisinopril": "laɪˈsɪnəprɪl",
    # ... 1000+ medical terms
}
```

---

## Migration Path

1. **v4.1.2**: Add CMUdict for English, optional gruut
2. **v4.2.0**: Full gruut integration, deprecate espeak-ng primary
3. **v4.3.0**: Neural G2P for medical terms

## Dependencies

### Current

```
# Already in requirements
pypinyin>=0.48.0  # Chinese pinyin
```

### Proposed Additions

```
cmudict>=1.0.12     # CMU Pronouncing Dictionary
gruut>=2.3.0        # Multi-language G2P
```

### Optional (system)

```bash
# Still useful as fallback
apt-get install espeak-ng
```

---

## Testing Strategy

### Unit Tests

```python
import pytest


class TestG2PService:
    # The asyncio marker assumes the pytest-asyncio plugin is installed.
    @pytest.mark.asyncio
    async def test_english_cmudict_lookup(self):
        g2p = EnhancedG2PService()
        result = await g2p.generate("hello", "en")
        assert "h" in result.lower()
        assert g2p.last_source == "cmudict"

    @pytest.mark.asyncio
    async def test_fallback_chain(self):
        g2p = EnhancedG2PService()
        # Unknown word should fall through to gruut/espeak
        result = await g2p.generate("xyzabc123", "en")
        assert result  # Should return something, not error

### Integration Tests

```python
async def test_medical_term_quality(self):
    """Compare G2P output against reference pronunciations."""
    g2p = EnhancedG2PService()
    reference = load_reference_pronunciations()

    for term, expected_ipa in reference.items():
        result = await g2p.generate(term, "en")
        similarity = phonetic_similarity(result, expected_ipa)
        assert similarity > 0.8, f"Poor pronunciation for {term}"
```
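
The test above relies on a `phonetic_similarity` helper that is not defined in this document. One simple way to approximate it, offered as an assumption and not as the project's actual metric, is a character-level sequence ratio over IPA strings with stress and syllable marks removed:

```python
from difflib import SequenceMatcher

# Characters ignored when comparing: primary/secondary stress marks,
# syllable dots, and spaces (an assumption made for this sketch).
_IGNORED = str.maketrans("", "", "ˈˌ. ")


def phonetic_similarity(a: str, b: str) -> float:
    """Return a similarity score in [0.0, 1.0] between two IPA strings."""
    a_norm = a.translate(_IGNORED)
    b_norm = b.translate(_IGNORED)
    if not a_norm and not b_norm:
        return 1.0  # two empty transcriptions are trivially identical
    return SequenceMatcher(None, a_norm, b_norm).ratio()
```

A character ratio treats every symbol mismatch equally, so `ɪ` versus `ə` costs as much as `m` versus `k`. A feature-based phoneme distance would be more faithful if the 0.8 threshold proves too coarse.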

---

## Decision

**Recommended approach:** Implement CMUdict + gruut hybrid

**Rationale:**

- CMUdict provides high-quality English pronunciations
- gruut adds multi-language support without system dependencies
- espeak-ng remains as fallback for unsupported languages
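- Avoids phonemizer's GPL-3.0 license: gruut is MIT and the CMUdict data is BSD-licensed; espeak-ng (GPL-3.0) remains only as an optional system-level fallback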

**Next Steps:**

1. ~~Add `cmudict` and `gruut` to requirements~~
2. ~~Implement `EnhancedG2PService` class~~ ✅ DONE
3. ~~Add ARPABET-to-IPA conversion utility~~ ✅ DONE
4. ~~Build pronunciation cache for medical terms~~ ✅ DONE (50+ terms)
5. Update lexicon service to use new G2P

## Implementation Status

**Prototype:** `services/api-gateway/app/services/enhanced_g2p_service.py`

**Features Implemented:**

- ARPABET-to-IPA conversion (100+ phoneme mappings)
- Medical pronunciation cache (50+ common terms)
- CMUdict lookup for English (~130k words)
- gruut integration for 11 languages
- espeak-ng fallback for all other languages
- Runtime caching (up to 10k entries)
- Batch generation support
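
The bounded runtime cache can be illustrated with the standard library alone. This sketch assumes an LRU eviction policy and uses a hypothetical `expensive_g2p` placeholder in place of a real backend call; the prototype may implement its cache differently:

```python
from functools import lru_cache

# Counts backend invocations so the effect of caching is observable.
calls = {"count": 0}


def expensive_g2p(term: str, language: str) -> str:
    """Placeholder for a real backend call (hypothetical)."""
    calls["count"] += 1
    return f"/{term}/"


@lru_cache(maxsize=10_000)  # matches the 10k-entry limit listed above
def cached_g2p(term: str, language: str) -> str:
    """Return the cached result for (term, language), computing it once."""
    return expensive_g2p(term, language)
```

Calling `cached_g2p("metformin", "en")` twice invokes the backend only once, and `cached_g2p.cache_info()` reports the hit and miss counts. Note that `lru_cache` wraps synchronous functions; an `async` method would need a cache that stores results, not coroutines.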

**Fallback Chain:**

1. Medical pronunciation cache (confidence: 0.95)
2. CMUdict lookup for English (confidence: 0.90)
3. gruut for supported languages (confidence: 0.80)
4. espeak-ng for all languages (confidence: 0.70)
5. Raw term fallback (confidence: 0.30)
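
The chain above can be modelled as an ordered list of providers, each tagged with its confidence. The sketch below is an illustration under stated assumptions: `G2PResult`, `run_chain`, and the two example providers are hypothetical names, and only the confidence values and the `metformin` entry come from this document:

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class G2PResult(NamedTuple):
    phonemes: str
    source: str
    confidence: float


# A provider returns a pronunciation, or None if it cannot handle the term.
Provider = Callable[[str, str], Optional[str]]

MEDICAL_CACHE = {"metformin": "mɛtˈfɔrmɪn"}


def medical_cache(term: str, language: str) -> Optional[str]:
    """Look up English terms in the precomputed medical cache."""
    return MEDICAL_CACHE.get(term.lower()) if language == "en" else None


def unavailable_backend(term: str, language: str) -> Optional[str]:
    """Simulates a backend that is not installed."""
    raise RuntimeError("backend not installed")


PROVIDERS: List[Tuple[str, float, Provider]] = [
    ("medical_cache", 0.95, medical_cache),
    ("espeak-ng", 0.70, unavailable_backend),
]


def run_chain(
    term: str, language: str, providers: List[Tuple[str, float, Provider]]
) -> G2PResult:
    """Try providers in order; fall back to the raw term at confidence 0.30."""
    for source, confidence, provider in providers:
        try:
            phonemes = provider(term, language)
        except Exception:
            phonemes = None  # a failing backend must not break the chain
        if phonemes:
            return G2PResult(phonemes, source, confidence)
    return G2PResult(term, "raw", 0.30)
```

Returning the source and confidence alongside the phonemes lets callers log which stage answered and, for example, flag raw-term results for manual lexicon entry.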

---

**Created:** December 4, 2024
**Updated:** December 4, 2025
**Author:** Platform Team
**Status:** Prototype Complete