Voice Mode Post-v4.1 Roadmap

Sourced from docs/voice/roadmap/voice-mode-post-v41-roadmap.md

Following the successful release of Voice Mode v4.1.0, this document outlines planned improvements and technical debt items for the next iteration.


Priority 1: Security Hardening

Bandit Issue Resolution

Current Status: 0 high-severity issues (resolved in PR #157)

Remaining Items (18 medium-severity):

| Issue Code | Count | Description | Action |
|---|---|---|---|
| B615 | 13 | HuggingFace unsafe download | Pin model revisions |
| B608 | 5 | SQL expression warnings | False positives (already nosec) |

B615 Locations (HuggingFace downloads):

  • app/engines/clinical_engine/enhanced_phi_detector.py
  • Other ML model loading files
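
B615 flags `from_pretrained` calls that fetch whatever revision is current on the Hub at download time. A minimal sketch of one way to pin them, assuming a central registry of revisions: the model name, the commit hash, and the helpers `PINNED_REVISIONS`, `pinned_revision`, and `load_pinned_model` are hypothetical placeholders, not values or functions from this codebase.

```python
"""Sketch: pin HuggingFace downloads to explicit revisions (Bandit B615)."""

# Hypothetical registry. Real entries would mirror MODEL_VERSIONS.md.
# The model name and the commit hash below are placeholders.
PINNED_REVISIONS = {
    "example-org/phi-detector": "0123456789abcdef0123456789abcdef01234567",
}


def pinned_revision(model_name: str) -> str:
    """Return the pinned revision, refusing models that have none registered."""
    try:
        return PINNED_REVISIONS[model_name]
    except KeyError:
        raise ValueError(f"No pinned revision registered for {model_name!r}") from None


def load_pinned_model(model_name: str):
    """Load a model at its pinned revision (needs `transformers` and network access)."""
    # Imported lazily so the registry helper can be used without transformers.
    from transformers import AutoModel

    return AutoModel.from_pretrained(model_name, revision=pinned_revision(model_name))
```

Failing on an unregistered model keeps an unpinned download from slipping in unnoticed. Whether Bandit accepts a revision passed through a variable rather than a string literal should be checked against the Bandit version in use.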

Action Plan:

  1. B615 Resolution (Owner: Backend Team)

    • Pin all HuggingFace model downloads to specific revisions
    • Use format: from_pretrained(model_name, revision="abc123")
    • Document pinned versions in MODEL_VERSIONS.md
  2. B608 Review (Owner: Backend Team)

    • Verify all SQL expressions use parameterized queries
    • Add explicit # nosec B608 comments with justification
    • Consider SQLAlchemy ORM for new database queries
  3. CI Integration (Owner: DevOps)

    • Add Bandit to CI pipeline with --severity-level medium
    • Fail builds on new medium+ severity issues
    • Generate Bandit report as CI artifact
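
For the B608 review, the pattern below illustrates what a justified `# nosec B608` might look like. It is a sketch using the standard-library `sqlite3` module. The table `voice_sessions`, its columns, and the function `fetch_sessions` are hypothetical and only stand in for the project's real queries.

```python
import sqlite3

# Hypothetical allow-list of sortable columns.
ALLOWED_SORT_COLUMNS = {"created_at", "patient_id"}


def fetch_sessions(conn: sqlite3.Connection, patient_id: str, sort_by: str = "created_at"):
    """Bind values as parameters; interpolate only an allow-listed identifier."""
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"Unsupported sort column: {sort_by!r}")
    # Justification: sort_by comes from a fixed allow-list and patient_id is a
    # bound parameter, so no user-controlled text reaches the SQL string.
    query = f"SELECT id, patient_id FROM voice_sessions WHERE patient_id = ? ORDER BY {sort_by}"  # nosec B608
    return conn.execute(query, (patient_id,)).fetchall()
```

Keeping the justification on its own comment line leaves the `# nosec B608` marker unambiguous for Bandit's comment parser.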

Priority 2: Lexicon Expansion

Current Coverage

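| Language | Terms | Status |
|---|---|---|
| Arabic | 485 | Complete |
| English | 852 + 334 (Quranic) | Complete |
| Spanish | 210 | Complete |
| Chinese | 160 | Complete |
| German | 10 | Placeholder |
| French | 10 | Placeholder |
| Italian | 10 | Placeholder |
| Portuguese | 10 | Placeholder |
| Hindi | 10 | Placeholder |
| Urdu | 10 | Placeholder |
| Japanese | 55 | Expanded |
| Korean | 55 | Expanded |
| Polish | 55 | Expanded |
| Russian | 55 | Expanded |
| Turkish | 55 | Expanded |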

Roadmap

Phase 1: Complete Core Languages ✅ COMPLETE

  • Expand Spanish lexicon to 200+ medical terms (210 terms)
  • Add Chinese medical terminology (160 terms)
  • Complete English Quranic transliteration lookups (334 terms)

Phase 2: Expand Additional Languages ✅ COMPLETE (v4.1.2)

  • Japanese medical lexicon (55 terms)
  • Korean medical lexicon (55 terms)
  • Polish medical lexicon (55 terms)
  • Russian medical lexicon (55 terms)
  • Turkish medical lexicon (55 terms)

Phase 3: Add High-Demand Languages

  • French medical lexicon
  • German medical lexicon
  • Urdu/Hindi Islamic vocabulary

Phase 4: Community Contributions

  • Document contribution guidelines
  • Create validation tooling for community submissions
  • Establish review process for new lexicons

Priority 3: Test Suite Improvements

Status: RESOLVED in PR #159

All 8 failing tests fixed. 41/41 tests now pass.

Fixes Applied:

| Test | Fix |
|---|---|
| test_translation_timeout_triggers_degradation | Set budget to 2000ms for translation attempt |
| test_translation_failure_triggers_degradation | Same budget fix |
| test_process_audio_returns_segments | Mock process_audio instead of _run_diarization_pipeline |
| test_speaker_change_callback | Test callback registration, removed invalid method call |
| test_subscribe_to_patient | Mock feature flag service |
| test_get_latest_vitals | Mock method directly to bypass initialization |
| test_downgrade_triggers_on_poor_metrics | Test network condition, not automatic downgrade |
| test_concurrent_session_test | Use startswith() for dynamic test name |

Future Improvements (Owner: QA Team)

  1. Add pytest markers for external service tests (@pytest.mark.requires_qdrant)
  2. Create mock factory utilities for consistent test setup
  3. Add CI configuration to skip tests requiring external services
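
The mock factory idea in item 2 could look roughly like the following, using only `unittest.mock` from the standard library. The factory names, the `is_enabled` method, and the assumption that `get_latest_vitals` is awaitable are all hypothetical. The real service interfaces may differ.

```python
from unittest.mock import AsyncMock, MagicMock


def make_feature_flag_service(enabled=None):
    """Build a mock flag service; `enabled` maps flag name -> bool, default off."""
    flags = dict(enabled or {})
    service = MagicMock(name="FeatureFlagService")
    # Unknown flags read as disabled so tests opt in explicitly.
    service.is_enabled.side_effect = lambda flag: flags.get(flag, False)
    return service


def make_vitals_client(latest):
    """Build a mock vitals client whose async get_latest_vitals returns `latest`."""
    client = MagicMock(name="VitalsClient")
    client.get_latest_vitals = AsyncMock(return_value=latest)
    return client
```

Centralizing these builders means tests such as test_subscribe_to_patient and test_get_latest_vitals would share one definition of a mocked dependency instead of each patching it differently.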

Priority 4: G2P Service Enhancement

Current Issue

English transliterated Quranic terms fall back to raw G2P output when espeak-ng is unavailable.

Solution

  1. Add espeak-ng to deployment requirements
  2. Implement fallback pronunciation cache
  3. Pre-compute common term pronunciations
  4. Add Docker container with espeak-ng for CI
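
The steps above can be sketched as a lookup chain: pre-computed cache first, then espeak-ng if the binary is present, then raw G2P. This is an illustration under stated assumptions, not the service's actual design. `PronunciationResolver`, `espeak_ipa`, and the `raw_g2p` callable are hypothetical names, and the espeak-ng call relies on its `-q`, `--ipa`, and `-v` command-line options.

```python
import shutil
import subprocess
from typing import Callable, Dict, Optional, Tuple


def espeak_ipa(term: str, voice: str = "en-us") -> Optional[str]:
    """Return IPA from the espeak-ng CLI, or None if it is missing or fails."""
    if shutil.which("espeak-ng") is None:
        return None
    try:
        result = subprocess.run(
            ["espeak-ng", "-q", "--ipa", "-v", voice, term],
            capture_output=True, text=True, check=True, timeout=5,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    return result.stdout.strip() or None


class PronunciationResolver:
    """Resolve a term via cache, then espeak-ng, then a raw G2P fallback."""

    def __init__(
        self,
        precomputed: Dict[str, str],
        espeak: Callable[[str], Optional[str]] = espeak_ipa,
        raw_g2p: Callable[[str], str] = lambda term: term,
    ):
        # Keys are lowercased so lookups are case-insensitive.
        self._cache = {key.lower(): value for key, value in precomputed.items()}
        self._espeak = espeak
        self._raw_g2p = raw_g2p

    def resolve(self, term: str) -> Tuple[str, str]:
        """Return (pronunciation, source) where source names the tier that answered."""
        key = term.lower()
        if key in self._cache:
            return self._cache[key], "cache"
        ipa = self._espeak(term)
        if ipa is not None:
            self._cache[key] = ipa  # remember espeak-ng results for later calls
            return ipa, "espeak-ng"
        return self._raw_g2p(term), "raw-g2p"
```

Returning the source alongside the pronunciation makes it easy to log how often the raw fallback is still being hit once espeak-ng is part of the deployment.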

Priority 5: Feature Enhancements

v4.2 Candidates

  1. Barge-in Improvements

    • Faster voice detection during playback
    • Smoother audio crossfade on interruption
  2. Speaker Diarization

    • Increase speaker limit from 4 to 8
    • Add speaker naming/labeling UI
  3. Adaptive Quality

    • Add bandwidth prediction
    • Implement proactive quality adjustment
  4. FHIR Integration

    • Add SMART on FHIR authentication
    • Support additional FHIR resources

Timeline

| Phase | Target | Status | Owner | PR/Issue |
|---|---|---|---|---|
| Test suite fixes | v4.1.1 | ✅ Released | Platform Team | PR #159 |
| Bandit B615 fixes | v4.1.1 | ✅ Released | Backend Team | PR #161 |
| Lexicon Phase 1 | v4.1.1 | ✅ Released | Platform Team | PR #162 |
| Lexicon Phase 2 (5 langs) | v4.1.2 | ✅ Released | Platform Team | PR #163 |
| G2P prototype | v4.1.2 | ✅ Released | Backend Team | PR #165 |
| Getting Started guide | v4.1.2 | ✅ Released | Platform Team | PR #163 |
| G2P full integration | v4.2.0 | Planned | Backend Team | - |
| Feature enhancements | v4.2.0 | Planned | Full Team | - |
| Community contributions | Ongoing | Open | Community | - |

v4.1.1 Scope (Released Dec 4, 2025)

  • Fix 8 failing tests (PR #159)
  • Pin HuggingFace model revisions (PR #161)
  • Review SQL expression warnings (5 occurrences - all false positives with nosec)
  • Documentation updates and G2P evaluation (PR #162)

Release: v4.1.1

v4.1.2 Scope (Released Dec 4, 2025)

Lexicon Expansion:

  • Expand Spanish lexicon (210 terms)
  • Add Chinese medical terminology (160 terms)
  • Complete English Quranic transliteration (334 terms)
  • Expand Japanese, Korean, Polish, Russian, Turkish (55 terms each)

G2P Enhancement:

  • EnhancedG2PService prototype with CMUdict+gruut+espeak fallback
  • ARPABET-to-IPA conversion (100+ mappings)
  • Medical pronunciation cache (50+ terms)
  • Add cmudict and gruut to requirements (PR #165)
  • Integration tests for G2P quality (34 tests, PR #165)
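
To illustrate the ARPABET-to-IPA step, here is a small sketch covering the 39 base CMUdict phonemes. It is not the `EnhancedG2PService` mapping, which the list above describes as having 100+ entries. The function name `arpabet_to_ipa` is hypothetical, and stress marks are dropped rather than rendered.

```python
# Base ARPABET phonemes as used by CMUdict, mapped to common IPA equivalents.
ARPABET_TO_IPA = {
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AW": "aʊ", "AY": "aɪ",
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "EH": "ɛ", "ER": "ɝ",
    "EY": "eɪ", "F": "f", "G": "ɡ", "HH": "h", "IH": "ɪ", "IY": "i",
    "JH": "dʒ", "K": "k", "L": "l", "M": "m", "N": "n", "NG": "ŋ",
    "OW": "oʊ", "OY": "ɔɪ", "P": "p", "R": "ɹ", "S": "s", "SH": "ʃ",
    "T": "t", "TH": "θ", "UH": "ʊ", "UW": "u", "V": "v", "W": "w",
    "Y": "j", "Z": "z", "ZH": "ʒ",
}

# Unstressed variants that conventionally reduce to a different symbol.
UNSTRESSED_OVERRIDES = {"AH": "ə", "ER": "ɚ"}


def arpabet_to_ipa(phones):
    """Convert a list of ARPABET symbols (with optional stress digits) to IPA."""
    out = []
    for phone in phones:
        base = phone.rstrip("012")       # strip the CMUdict stress digit
        stress = phone[len(base):]
        if stress == "0" and base in UNSTRESSED_OVERRIDES:
            out.append(UNSTRESSED_OVERRIDES[base])
        elif base in ARPABET_TO_IPA:
            out.append(ARPABET_TO_IPA[base])
        else:
            raise ValueError(f"Unknown ARPABET symbol: {phone!r}")
    return "".join(out)
```

For example, the CMUdict entry for "hello", `["HH", "AH0", "L", "OW1"]`, converts to `həloʊ`.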

Documentation:

  • Getting Started guide in What's New
  • Screenshot placeholders and guidelines
  • VAD preset terminology alignment (Sensitive/Balanced/Relaxed)

Release: v4.1.2

v4.2.0 Scope (Target: Q1 2026)

G2P Full Integration:

  • Integrate EnhancedG2PService with lexicon lookup pipeline
  • Add G2P caching layer with Redis
  • Performance optimization for batch processing
  • Add espeak-ng to Docker deployment

Feature Enhancements:

  • Barge-in improvements (faster detection, smoother crossfade)
  • Speaker diarization expansion (8 speakers, labeling UI)
  • Adaptive quality with bandwidth prediction
  • SMART on FHIR authentication

Lexicon Phase 3:

  • French medical lexicon (100+ terms)
  • German medical lexicon (100+ terms)
  • Urdu/Hindi Islamic vocabulary (100+ terms)

Technical Debt:

  • Migrate deprecated datetime.utcnow() calls
  • Update Pydantic v2 config deprecations
  • Add pytest markers for external service tests
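
For the `datetime.utcnow()` item, which is deprecated as of Python 3.12, the migration usually amounts to switching to timezone-aware values. A minimal sketch, where the helper names `utc_now` and `to_naive_utc` are hypothetical:

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Timezone-aware replacement for the deprecated datetime.utcnow()."""
    return datetime.now(timezone.utc)


def to_naive_utc(dt: datetime) -> datetime:
    """Convert an aware datetime to naive UTC for code that still expects it."""
    return dt.astimezone(timezone.utc).replace(tzinfo=None)
```

The second helper is only a bridge for call sites (for example, database columns storing naive timestamps) that cannot accept aware values yet. Comparing naive and aware datetimes raises `TypeError`, so mixed call sites need to be migrated together.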


Created: December 4, 2024
Updated: December 4, 2025
Status: Active
Owner: Platform Team
