Voice Mode v4.1.0 Release Announcement

Sourced from docs/releases/v4.1.0-release-announcement.md

Voice Mode v4.1.0 Released

We're excited to announce the release of Voice Mode v4.1.0, a major update that brings voice-first interaction, advanced speech processing, and comprehensive Arabic/Quranic language support.

Highlights

Voice-First Input Bar

Redesigned interface optimized for speech interaction with a prominent tap-to-speak microphone button and real-time audio level visualization.

Streaming Text Display

Responses now stream in real time with smooth character-by-character rendering, Markdown support, and right-to-left (RTL) handling for Arabic text.
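
As a rough illustration of character-by-character streaming with RTL handling, the sketch below picks a display direction from the first strongly directional character, a common heuristic. The function names `base_direction` and `stream_display` are hypothetical and are not taken from the Voice Mode codebase.

```python
import unicodedata


def base_direction(text):
    """Return "rtl" or "ltr" based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-style and Arabic letters
            return "rtl"
        if bidi == "L":          # Latin and other left-to-right letters
            return "ltr"
    return "ltr"  # digits, spaces, and punctuation alone default to LTR


def stream_display(chunks):
    """Yield (text_so_far, direction) after each character arrives."""
    shown = ""
    for chunk in chunks:
        for ch in chunk:
            shown += ch
            yield shown, base_direction(shown)


if __name__ == "__main__":
    for text, direction in stream_display(["Al-", "Fatiha"]):
        print(direction, text)
```

A real renderer would also need to handle mixed-direction text and Markdown tokens that arrive split across chunks, which this sketch ignores.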

Speaker Diarization

Multi-speaker detection supporting up to 4 concurrent speakers with real-time speaker change detection and cross-session re-identification.
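
One way to picture speaker change detection with a four-speaker cap is to compare each segment's voice embedding against stored speaker profiles. The sketch below assumes embeddings are plain lists of floats and uses a cosine-similarity threshold; the class `SpeakerTracker`, the threshold value, and the matching rule are illustrative assumptions, not the shipped algorithm.

```python
import math

MAX_SPEAKERS = 4  # limit stated in the release notes


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SpeakerTracker:
    def __init__(self, threshold=0.8, max_speakers=MAX_SPEAKERS):
        self.threshold = threshold        # assumed similarity cutoff
        self.max_speakers = max_speakers
        self.profiles = []                # one reference embedding per known speaker
        self.current = None               # id of the most recent speaker

    def observe(self, embedding):
        """Return (speaker_id, changed) for one audio-segment embedding."""
        scores = [cosine(embedding, p) for p in self.profiles]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        below = best is not None and scores[best] < self.threshold
        if best is None or (below and len(self.profiles) < self.max_speakers):
            # Unknown voice and room left: enroll a new speaker.
            self.profiles.append(list(embedding))
            best = len(self.profiles) - 1
        changed = self.current is not None and best != self.current
        self.current = best
        return best, changed
```

Under this sketch, saving `profiles` between sessions would be one plausible route to cross-session re-identification, though the announcement does not say how that feature is implemented.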

FHIR R4 Streaming

Healthcare data integration with enterprise-grade resilience including exponential backoff retry, circuit breaker pattern, and connection pooling.
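
The two resilience patterns named above can be sketched in a few lines. This is a generic illustration, not the Voice Mode FHIR client: `CircuitBreaker`, `call_with_retry`, and all default values are hypothetical, and connection pooling is left out.

```python
import random
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are rejected without being tried."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; skipping call")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None
        return result


def backoff_delays(retries, base=0.5, cap=8.0):
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def call_with_retry(func, breaker, retries=3, base=0.5, cap=8.0, sleep=time.sleep):
    """Try once, then retry up to `retries` times with jittered exponential backoff."""
    delays = backoff_delays(retries, base, cap)
    for attempt in range(retries + 1):
        try:
            return breaker.call(func)
        except CircuitOpenError:
            raise  # do not hammer a service the breaker has given up on
        except Exception:
            if attempt == retries:
                raise
            sleep(delays[attempt] * random.uniform(0.5, 1.0))  # jitter spreads retries
```

The breaker sits inside the retry loop so that repeated failures across many requests trip it, while a single request still gets its own bounded retries.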

Adaptive Audio Quality

Dynamic bitrate adjustment (16-128 kbps) based on network conditions, with hysteresis to prevent quality oscillation.
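
To show what hysteresis means here, the sketch below steps down as soon as bandwidth falls short but steps up only after several consecutive good measurements with extra headroom. The bitrate ladder, margins, and the `AdaptiveBitrate` class are assumptions chosen for illustration; only the 16-128 kbps range comes from the release notes.

```python
BITRATE_LADDER_KBPS = [16, 32, 64, 96, 128]  # assumed steps within the stated range


class AdaptiveBitrate:
    def __init__(self, ladder=BITRATE_LADDER_KBPS, up_margin=1.5,
                 down_margin=1.1, up_hold=3):
        self.ladder = list(ladder)
        self.index = 0                  # start at the lowest bitrate
        self.up_margin = up_margin      # headroom required before stepping up
        self.down_margin = down_margin  # headroom required to stay put
        self.up_hold = up_hold          # consecutive good samples needed to step up
        self._good_streak = 0

    @property
    def bitrate_kbps(self):
        return self.ladder[self.index]

    def update(self, bandwidth_kbps):
        """Feed one bandwidth estimate and return the bitrate to use next."""
        can_go_up = self.index < len(self.ladder) - 1
        if self.index > 0 and bandwidth_kbps < self.bitrate_kbps * self.down_margin:
            # Not enough bandwidth for the current rate: step down immediately.
            self.index -= 1
            self._good_streak = 0
        elif can_go_up and bandwidth_kbps >= self.ladder[self.index + 1] * self.up_margin:
            # Enough headroom for the next rung, but wait for a sustained streak.
            self._good_streak += 1
            if self._good_streak >= self.up_hold:
                self.index += 1
                self._good_streak = 0
        else:
            self._good_streak = 0
        return self.bitrate_kbps
```

Because the step-up threshold (1.5x the next rung) is well above the step-down threshold (1.1x the current rung), a bandwidth estimate that hovers near a boundary leaves the bitrate unchanged instead of flipping it back and forth.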

Quranic Lexicon

662 pronunciation entries, including all 114 Surah names, 50+ Tajweed terms, and 200+ Islamic vocabulary items, in both Arabic and English transliteration.

Performance

Operation                       Target     Achieved
Voice input to transcription    < 500 ms   320 ms
Speaker change detection        < 200 ms   180 ms
Text streaming first token      < 300 ms   250 ms

Security

  • Bandit scan: 0 high-severity issues
  • HuggingFace models pinned to specific revisions
  • Subprocess calls hardened with shell=False
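
For readers unfamiliar with the `shell=False` hardening, the sketch below shows the general idea: arguments are passed as a list, so no shell ever parses them and shell metacharacters in user input stay inert. The wrapper `run_command` is a hypothetical helper, not a function from the Voice Mode codebase.

```python
import subprocess
import sys


def run_command(args, timeout=10):
    """Run a command without a shell; `args` must be a list of strings."""
    if isinstance(args, str):
        raise TypeError("pass arguments as a list, not a shell string")
    return subprocess.run(
        list(args),
        shell=False,          # no shell parsing, so ';' and '|' are literal text
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,           # raise CalledProcessError on a non-zero exit code
    )


# The "filename" contains shell syntax, but it reaches the child as one argument.
untrusted = "clip.wav; echo injected"
result = run_command([sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted])
print(result.stdout.strip())
```

With `shell=True` and string concatenation, the same input would have been split at the semicolon and the second command executed.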

Documentation

What's Next

  • v4.1.1: Test suite fixes, resolution of medium-severity Bandit issues
  • v4.1.2: Lexicon expansion, G2P improvements
  • v4.2.0: Feature enhancements (barge-in, diarization limits)

Release Tag: v4.1.0
Release Date: December 4, 2024
PRs Included: #155, #156, #157, #158
