Voice Mode v4.1.0 Released

We're excited to announce the release of Voice Mode v4.1.0, a major update that brings voice-first interaction, advanced speech processing, and comprehensive Arabic/Quranic language support.

Highlights

Voice-First Input Bar

Redesigned interface optimized for speech interaction with a prominent tap-to-speak microphone button and real-time audio level visualization.

Streaming Text Display

Responses now stream in real time with smooth character-by-character rendering, Markdown support, and right-to-left (RTL) handling for Arabic text.
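As a rough illustration of what RTL handling involves, the sketch below picks a base direction for a streamed chunk from its first strongly directional character. This is a common heuristic based on Unicode bidirectional classes and is not taken from the Voice Mode codebase; the function name `is_rtl` is hypothetical.

```python
import unicodedata


def is_rtl(text: str) -> bool:
    """Guess the base direction from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # right-to-left letter (Hebrew, Arabic, ...)
            return True
        if bidi == "L":          # left-to-right letter
            return False
    # No strong character found (digits, punctuation only): default to LTR.
    return False


print(is_rtl("سورة الفاتحة"))      # Arabic text
print(is_rtl("Surah Al-Fatiha"))   # Latin transliteration
```

A renderer could use a check like this to set the text direction of each message before the first characters are drawn, so the layout does not flip mid-stream.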

Speaker Diarization

Multi-speaker detection supporting up to 4 concurrent speakers with real-time speaker change detection and cross-session re-identification.

FHIR R4 Streaming

Healthcare data integration with enterprise-grade resilience, including retries with exponential backoff, a circuit breaker, and connection pooling.
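The two resilience patterns named above can be sketched as follows. This is a minimal, generic illustration and not the Voice Mode implementation: the class and function names (`CircuitBreaker`, `call_with_retry`, `backoff_delays`) and all thresholds and delays are assumptions chosen for the example.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are being rejected."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # hypothetical default
        self.reset_after = reset_after              # cool-down in seconds (assumed)
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cool-down, let a trial call through (half-open state).
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def backoff_delays(base: float = 0.5, factor: float = 2.0,
                   cap: float = 8.0, attempts: int = 4) -> List[float]:
    """Exponentially growing delays, capped at `cap` seconds."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def call_with_retry(fn: Callable[[], T], breaker: CircuitBreaker,
                    delays: List[float],
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on ConnectionError until the delays are exhausted."""
    attempts = len(delays) + 1
    for attempt in range(attempts):
        if not breaker.allow():
            raise CircuitOpenError("circuit open; skipping call")
        try:
            result = fn()
        except ConnectionError:
            breaker.record_failure()
            if attempt == attempts - 1:
                raise
            sleep(delays[attempt])
        else:
            breaker.record_success()
            return result
    raise AssertionError("unreachable")
```

Production code would typically add random jitter to the delays so that many clients do not retry in lockstep; it is left out here to keep the sketch deterministic.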

Adaptive Audio Quality

Dynamic bitrate adjustment (16-128 kbps) based on network conditions, with hysteresis to prevent the quality level from oscillating.
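To make the role of hysteresis concrete, here is a small sketch of a bitrate controller that downgrades immediately but upgrades only after several consecutive good bandwidth samples. Only the 16-128 kbps range comes from this release; the ladder steps, the `headroom` factor, the `stable_samples` count, and the `BitrateController` name are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical bitrate ladder in kbps; only the 16-128 range is from the release notes.
LADDER_KBPS = (16, 32, 64, 128)


@dataclass
class BitrateController:
    headroom: float = 1.5    # bandwidth must exceed the next tier by this factor
    stable_samples: int = 3  # consecutive good samples required before upgrading
    index: int = 0           # current position in LADDER_KBPS
    _good_streak: int = 0

    @property
    def bitrate(self) -> int:
        return LADDER_KBPS[self.index]

    def update(self, bandwidth_kbps: float) -> int:
        """Feed one bandwidth estimate and return the bitrate to use."""
        # Downgrade immediately when the link cannot sustain the current tier.
        if bandwidth_kbps < self.bitrate and self.index > 0:
            self.index -= 1
            self._good_streak = 0
            return self.bitrate

        can_upgrade = self.index < len(LADDER_KBPS) - 1
        if can_upgrade and bandwidth_kbps >= LADDER_KBPS[self.index + 1] * self.headroom:
            # Upgrade only after a run of good samples, not on a single spike.
            self._good_streak += 1
            if self._good_streak >= self.stable_samples:
                self.index += 1
                self._good_streak = 0
        else:
            self._good_streak = 0
        return self.bitrate


controller = BitrateController()
for sample in (60, 60, 60, 100, 90, 100, 90, 20):
    print(sample, "->", controller.update(sample))
```

Because the upgrade threshold sits well above the downgrade threshold, a bandwidth estimate that hovers near a tier boundary leaves the bitrate unchanged instead of flipping it on every sample.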

Quranic Lexicon

662 pronunciation terms, including all 114 Surah names, 50+ Tajweed terms, and 200+ Islamic vocabulary items, in both Arabic and English transliteration.

Performance

Operation	Target	Achieved
Voice input to transcription	<500ms	320ms
Speaker change detection	<200ms	180ms
Text streaming first token	<300ms	250ms

Security

Bandit scan: 0 high-severity issues
HuggingFace models pinned to specific revisions
Subprocess calls hardened with shell=False
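The subprocess hardening can be illustrated with a short sketch. It shows the general pattern of passing arguments as a list with `shell=False`, so that untrusted input is never interpreted by a shell. The `run_tool` helper and the sample input are hypothetical and not taken from the Voice Mode codebase.

```python
import subprocess
import sys
from typing import List


def run_tool(args: List[str], timeout: float = 10.0) -> str:
    """Run a command from an argument list without invoking a shell."""
    completed = subprocess.run(
        args,
        shell=False,          # arguments are passed verbatim, not parsed by a shell
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return completed.stdout


# Shell metacharacters in the input stay inside a single literal argument.
untrusted = "hello; echo injected"
out = run_tool([sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted])
print(out.strip())
```

With `shell=True` and string concatenation, the same input could have run a second command; with an argument list it is simply echoed back as data.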

Documentation

What's New: https://assistdocs.asimo.io/docs/voice/whats-new-v4-1
Post-v4.1 Roadmap: https://assistdocs.asimo.io/docs/voice/roadmap/voice-mode-post-v41-roadmap

What's Next

v4.1.1: Test suite fixes, medium Bandit issues
v4.1.2: Lexicon expansion, G2P improvements
v4.2.0: Feature enhancements (barge-in, diarization limits)

Release Tag: v4.1.0
Release Date: December 4, 2024
PRs Included: #155, #156, #157, #158
