Voice Mode v4.1.0 Released
We're excited to announce the release of Voice Mode v4.1.0, a major update that brings voice-first interaction, advanced speech processing, and comprehensive Arabic/Quranic language support.
Highlights
Voice-First Input Bar
Redesigned interface optimized for speech interaction with a prominent tap-to-speak microphone button and real-time audio level visualization.
Streaming Text Display
Responses now stream in real time with smooth character-by-character rendering, Markdown support, and right-to-left (RTL) handling for Arabic text.
Speaker Diarization
Multi-speaker detection supporting up to 4 concurrent speakers with real-time speaker change detection and cross-session re-identification.
FHIR R4 Streaming
Healthcare data integration with enterprise-grade resilience including exponential backoff retry, circuit breaker pattern, and connection pooling.
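As a rough illustration of how exponential backoff retry and a circuit breaker fit together, here is a minimal sketch. It is not the Voice Mode implementation: the names `CircuitBreaker`, `backoff_delay`, and `call_with_retry`, and every default value (thresholds, delays, attempt counts), are assumptions chosen for the example.

```python
import random
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are rejected."""


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # After the cool-down, allow trial calls again (half-open state).
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def backoff_delay(attempt, base=0.5, cap=8.0):
    """Upper bound on the delay before retry `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


def call_with_retry(fn, breaker, max_attempts=4, sleep=time.sleep, jitter=random.random):
    for attempt in range(max_attempts):
        if not breaker.allow():
            raise CircuitOpenError("circuit open; skipping call")
        try:
            result = fn()
        except ConnectionError:
            breaker.record_failure()
            if attempt == max_attempts - 1:
                raise
            # Full jitter: sleep a random fraction of the exponential delay.
            sleep(backoff_delay(attempt) * jitter())
        else:
            breaker.record_success()
            return result
```

The breaker is checked before every attempt, so a dependency that keeps failing stops receiving traffic even while a retry loop is still running.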
Adaptive Audio Quality
Dynamic bitrate adjustment (16-128 kbps) based on network conditions, with hysteresis to prevent quality oscillation.
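Hysteresis here means the threshold for stepping up is higher than the threshold for stepping down, so a bandwidth estimate hovering near a boundary does not flip the bitrate back and forth. The sketch below shows the idea only. The bitrate ladder, the headroom factors, and the `AdaptiveBitrate` class are assumptions for illustration, not the shipped algorithm.

```python
# Hypothetical steps within the documented 16-128 kbps range.
BITRATE_LADDER_KBPS = [16, 32, 64, 128]


class AdaptiveBitrate:
    def __init__(self, ladder=BITRATE_LADDER_KBPS, up_headroom=1.5,
                 down_headroom=1.1, start_index=0):
        self.ladder = ladder
        self.up_headroom = up_headroom      # margin required to step up
        self.down_headroom = down_headroom  # margin required to stay put
        self.index = start_index

    @property
    def bitrate(self):
        return self.ladder[self.index]

    def update(self, bandwidth_kbps):
        """Feed one bandwidth estimate; return the bitrate to use next."""
        # Step down once bandwidth no longer covers the current rate
        # with a small margin.
        if self.index > 0 and bandwidth_kbps < self.bitrate * self.down_headroom:
            self.index -= 1
        # Step up only when bandwidth clears the next rate with a larger margin.
        elif (self.index + 1 < len(self.ladder)
              and bandwidth_kbps >= self.ladder[self.index + 1] * self.up_headroom):
            self.index += 1
        return self.bitrate
```

With these example values, an estimate of 80 kbps keeps a 64 kbps stream at 64 kbps and keeps a 32 kbps stream at 32 kbps: the gap between the down threshold (about 70 kbps) and the up threshold (96 kbps) is the hysteresis band.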
Quranic Lexicon
662 pronunciation terms, including all 114 Surah names, 50+ Tajweed terms, and 200+ Islamic vocabulary items, in both Arabic and English transliteration.
Performance
| Operation | Target | Achieved |
|---|---|---|
| Voice input to transcription | <500ms | 320ms |
| Speaker change detection | <200ms | 180ms |
| Text streaming first token | <300ms | 250ms |
Security
- Bandit scan: 0 high-severity issues
- HuggingFace models pinned to specific revisions
- Subprocess calls hardened with shell=False
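To illustrate the subprocess hardening item, the following sketch shows the general pattern of passing an argument list with `shell=False`, so that no shell ever parses the arguments. The `run_command` wrapper and its timeout default are hypothetical and are not taken from the Voice Mode codebase.

```python
import subprocess
import sys


def run_command(args, timeout=10.0):
    """Run a command given as an argument list, without invoking a shell."""
    if isinstance(args, (str, bytes)):
        # A single string invites shell-style parsing; require a list instead.
        raise TypeError("pass the command as a list of arguments, not a string")
    completed = subprocess.run(
        list(args),
        shell=False,          # arguments go straight to the program
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,           # raise CalledProcessError on non-zero exit
    )
    return completed.stdout


if __name__ == "__main__":
    # Shell metacharacters in an argument are passed through as literal text.
    echoed = run_command(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", "a; echo b"]
    )
    print(echoed.strip())
```

Because the argument list is handed directly to the program, characters such as `;` or `|` in user-supplied values are never interpreted as shell syntax.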
Documentation
- What's New: https://assistdocs.asimo.io/docs/voice/whats-new-v4-1
- Post-v4.1 Roadmap: https://assistdocs.asimo.io/docs/voice/roadmap/voice-mode-post-v41-roadmap
What's Next
- v4.1.1: Test suite fixes, medium-severity Bandit issues
- v4.1.2: Lexicon expansion, G2P improvements
- v4.2.0: Feature enhancements (barge-in, diarization limits)
Release Tag: v4.1.0
Release Date: December 4, 2024
PRs Included: #155, #156, #157, #158