What's New in Voice Mode v4.1
Voice Mode v4.1 is a major release that introduces a voice-first interface, advanced speech processing, healthcare integrations, and comprehensive Arabic/Quranic language support.
Release Highlights
- Voice-First Input Bar: Redesigned interface optimized for speech interaction
- Streaming Text Display: Real-time response rendering with smooth animations
- Speaker Diarization: Multi-speaker detection and attribution
- FHIR R4 Streaming: Healthcare data integration with retry resilience
- Adaptive Audio Quality: Dynamic bitrate adjustment based on network conditions
- Quranic Lexicon: 662 terms (328 Arabic, 334 English transliteration), including all 114 Surah names and Tajweed terminology
Getting Started
Using the Voice-First Input Bar
- Tap the microphone button to start recording
- Speak naturally - the system detects when you stop speaking
- Wait for transcription - your speech appears as text
- View the response - streamed in real time with Arabic support
Selecting a VAD Preset
Choose the preset that matches your environment:
| Environment | Recommended Preset |
|---|---|
| Quiet room | Sensitive |
| Home/Office | Balanced (default) |
| Public/Noisy | Relaxed |
To change: Settings (⚙️) → Voice Settings → VAD Preset
Understanding the Indicators
Quality Badge (📶):
- 🟢 High: Excellent connection (128kbps)
- 🟡 Medium: Acceptable quality (64kbps)
- 🟠 Low: Degraded quality (32kbps)
- 🔴 Minimal: Very poor connection (16kbps)
PHI Indicator (🛡️):
- 🟢 Green: No sensitive health info detected
- 🟡 Yellow: Potential PHI - review recommended
- 🔴 Red: PHI detected - secure handling active
Adjusting Thinking Feedback
Control what happens while the AI processes your request:
- Tone: Soft chime when processing starts/ends
- Haptic: Gentle vibration feedback (mobile)
- Visual: Pulsing indicator animation
To configure: Settings (⚙️) → Thinking Feedback
New Features
Voice-First Input Bar
The input interface has been completely redesigned to prioritize voice interaction:

    ┌─────────────────────────────────────────────────┐
    │  [🎤 Tap to Speak]  │  [⌨️]  │  [⚙️]            │
    │                                                 │
    │  "Ask me anything about the Quran..."           │
    └─────────────────────────────────────────────────┘
Key Features:
- Large, prominent microphone button for easy tap-to-speak
- Visual audio level indicator during recording
- Keyboard toggle for text input when needed
- Settings quick-access for VAD and voice preferences
Usage:
- Tap the microphone button to start recording
- Speak your question naturally
- Release or tap again to send
- Watch the real-time transcription appear
Streaming Text Display
Responses now stream in real time, with smooth character-by-character rendering:
Features:
- Token-by-token streaming from the AI
- Smooth CSS animations for text appearance
- Markdown rendering with syntax highlighting
- Citation links with hover previews
- RTL support for Arabic text segments
Technical Details:
- Adaptive chunk sizing based on network latency
- Automatic scroll-to-bottom during streaming
- Graceful handling of connection interruptions
Adaptive VAD Presets
Voice Activity Detection now includes three presets for different environments:
| Preset | Sensitivity | Best For |
|---|---|---|
| Sensitive | High | Silent rooms, minimal background noise |
| Balanced | Medium | Typical home/office environments |
| Relaxed | Low | Public spaces, background conversations |
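One way to picture the presets is as different energy thresholds applied to each audio frame. The following is a minimal sketch under that assumption; the threshold values and the names `VAD_THRESHOLDS`, `rms`, and `is_speech` are hypothetical and are not taken from the shipped detector.

```python
import math

# Hypothetical RMS thresholds per preset (assumed values, not from the release).
# A lower threshold means higher sensitivity: quieter sounds count as speech.
VAD_THRESHOLDS = {
    "sensitive": 0.01,
    "balanced": 0.03,
    "relaxed": 0.08,
}


def rms(samples):
    """Root-mean-square level of a frame of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(samples, preset="balanced"):
    """Return True when the frame's RMS level exceeds the preset's threshold."""
    return rms(samples) > VAD_THRESHOLDS[preset]
```

With these assumed values, a quiet frame that passes the Sensitive preset would be ignored by Balanced and Relaxed, which is the trade-off the table describes.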
Settings Panel:
    Voice Settings
    ├── VAD Preset: [Balanced ▼]
    ├── Auto-stop delay: 1.5s
    └── Push-to-talk: [ ] Enable
PHI Indicator & Routing
Healthcare-compliant PHI (Protected Health Information) handling:
Visual Indicator:
- Green shield icon: No PHI detected
- Yellow shield: Potential PHI, review recommended
- Red shield: PHI detected, secure handling active
Routing Features:
- Automatic detection of 18 HIPAA identifiers
- Secure channel routing for PHI content
- Audit logging for compliance
- User notification when PHI is detected
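The three indicator states can be sketched as a simple classifier. This is an illustration only: it assumes pattern matching, covers just two of the 18 HIPAA identifier types, and the names `STRONG_PATTERNS`, `WEAK_HINTS`, and `phi_level` are hypothetical rather than the product's detector.

```python
import re

# Illustrative patterns for two identifier types (assumed and incomplete).
STRONG_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # US Social Security number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),  # email address
]
# Hypothetical hint words that suggest health context without identifying anyone.
WEAK_HINTS = ("diagnosis", "prescription", "medical record")


def phi_level(text):
    """Classify text as 'red', 'yellow', or 'green' for the PHI shield."""
    if any(p.search(text) for p in STRONG_PATTERNS):
        return "red"
    lowered = text.lower()
    if any(hint in lowered for hint in WEAK_HINTS):
        return "yellow"
    return "green"
```

A red result would be the point at which secure routing and audit logging are triggered; how that routing works is not shown here.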
RTL Support for Arabic
Full right-to-left support for Arabic content:
Features:
- Automatic language detection
- Bidirectional text rendering
- RTL-aware UI layout
- Arabic numeral support
- Proper text alignment in mixed content
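Automatic direction detection is often done with a "first strong character" heuristic, similar to what browsers use for `dir="auto"`. The sketch below shows that heuristic using Python's standard `unicodedata` module; it is an assumption about how the Auto setting could work, and `detect_direction` is a hypothetical name.

```python
import unicodedata


def detect_direction(text):
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-style RTL or Arabic letter
            return "rtl"
        if bidi == "L":          # strong left-to-right character
            return "ltr"
    return "ltr"  # no strong characters (digits, punctuation): default to LTR
```

Leading digits and punctuation are skipped, so a numbered Arabic line such as "1. الفاتحة" still resolves to RTL.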
Toggle:
    Display Settings
    └── Text Direction: [Auto ▼] / LTR / RTL
Unified Memory Context
Cross-session context management for personalized interactions:
Capabilities:
- Session history persistence
- User preference memory
- Conversation threading
- Context-aware follow-ups
- Learning style adaptation
Phase 3 Features
Speaker Diarization
Multi-speaker detection and attribution for group conversations:
Capabilities:
- Up to 4 concurrent speakers
- Real-time speaker change detection
- Speaker embedding extraction
- Cross-session speaker re-identification
- Confidence scoring per segment
Use Cases:
- Study circles with multiple participants
- Teacher-student Q&A sessions
- Family Quran recitation sessions
Technical Specs:
- Latency: <200ms for speaker change detection
- Accuracy: >90% speaker attribution
- Models: pyannote.audio segmentation
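To illustrate how per-segment confidence scores and speaker change detection might fit together, here is a small post-processing sketch. It assumes the diarization model has already produced labeled segments; the `Segment` type, the `speaker_changes` function, and the 0.5 cutoff are hypothetical and not part of pyannote.audio or this release.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float       # seconds
    end: float         # seconds
    speaker: str       # label such as "SPEAKER_00"
    confidence: float  # 0.0 to 1.0


def speaker_changes(segments, min_confidence=0.5):
    """Return the start times at which the attributed speaker changes."""
    changes = []
    current = None
    for seg in sorted(segments, key=lambda s: s.start):
        if seg.confidence < min_confidence:
            continue  # skip low-confidence attributions
        if current is not None and seg.speaker != current:
            changes.append(seg.start)
        current = seg.speaker
    return changes
```

Dropping low-confidence segments before comparing labels avoids reporting a speaker change on a single uncertain attribution.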
FHIR R4 Streaming
Healthcare data integration with enterprise-grade resilience:
Supported Resources:
- Patient demographics
- Observations (vitals, lab results)
- Conditions (diagnoses)
- Medications
- Allergies
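As a rough illustration of what consuming one of these resources involves, the sketch below reads a FHIR R4 Patient resource from JSON using only the standard library. The field names (`resourceType`, `name`, `given`, `family`, `birthDate`) follow the FHIR R4 Patient definition; the `summarize_patient` helper and its output shape are hypothetical.

```python
import json


def summarize_patient(raw_json):
    """Extract id, display name, and birth date from a FHIR R4 Patient resource."""
    resource = json.loads(raw_json)
    if resource.get("resourceType") != "Patient":
        raise ValueError("expected a Patient resource")
    names = resource.get("name") or [{}]
    first = names[0]  # use the first recorded name
    display = " ".join(first.get("given", []) + [first.get("family", "")]).strip()
    return {
        "id": resource.get("id"),
        "name": display,
        "birth_date": resource.get("birthDate"),
    }
```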
Resilience Features:
- Exponential backoff retry (1s → 2s → 4s → 8s)
- Circuit breaker pattern
- Partial response handling
- Connection pooling
- Health check monitoring
Configuration:
    fhir:
      base_url: https://fhir.example.com/r4
      timeout_ms: 5000
      max_retries: 3
      circuit_breaker_threshold: 5
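The interaction between `max_retries` and `circuit_breaker_threshold` can be sketched as follows. This is a simplified model, not the FHIR client itself: the `RetryingClient` class and `CircuitOpenError` are hypothetical names, failures are counted per attempt, and a production breaker would also need a reset timeout (half-open state), which is omitted here.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are rejected without trying."""


class RetryingClient:
    def __init__(self, max_retries=3, base_delay=1.0, breaker_threshold=5,
                 sleep=time.sleep):
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.breaker_threshold = breaker_threshold
        self.sleep = sleep  # injectable so tests do not have to wait
        self.consecutive_failures = 0

    def call(self, fetch):
        """Run fetch() with exponential backoff; trip the breaker on repeated failure."""
        if self.consecutive_failures >= self.breaker_threshold:
            raise CircuitOpenError("too many consecutive failures")
        for attempt in range(self.max_retries + 1):
            try:
                result = fetch()
            except ConnectionError:
                self.consecutive_failures += 1
                out_of_retries = attempt == self.max_retries
                tripped = self.consecutive_failures >= self.breaker_threshold
                if out_of_retries or tripped:
                    raise
                self.sleep(self.base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
            else:
                self.consecutive_failures = 0  # any success closes the breaker
                return result
```

With the defaults above, a call waits 1s, 2s, and 4s between its four attempts, and a fifth consecutive failure opens the breaker so later calls fail fast.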
Adaptive Quality Controller
Dynamic audio quality adjustment based on network conditions:
Quality Tiers:
| Tier | Bitrate | Sample Rate | Use Case |
|---|---|---|---|
| High | 128kbps | 48kHz | Excellent network |
| Medium | 64kbps | 24kHz | Standard network |
| Low | 32kbps | 16kHz | Poor network |
| Minimal | 16kbps | 8kHz | Very poor network |
Hysteresis Behavior:
- Upgrade: 10 consecutive good measurements
- Downgrade: 3 consecutive poor measurements
- Prevents quality oscillation
Metrics Tracked:
- RTT (Round Trip Time)
- Packet loss percentage
- Jitter
- Available bandwidth
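The hysteresis rule (10 consecutive good measurements to upgrade, 3 consecutive poor ones to downgrade) can be sketched as a small state machine. The `QualityController` class below is a hypothetical illustration; it assumes the caller has already reduced RTT, packet loss, jitter, and bandwidth to a single good/poor verdict per measurement.

```python
TIERS = ["minimal", "low", "medium", "high"]  # ordered worst to best


class QualityController:
    def __init__(self, tier="high", upgrade_after=10, downgrade_after=3):
        self.index = TIERS.index(tier)
        self.upgrade_after = upgrade_after
        self.downgrade_after = downgrade_after
        self.good_streak = 0
        self.poor_streak = 0

    @property
    def tier(self):
        return TIERS[self.index]

    def record(self, is_good):
        """Feed one network measurement; return the (possibly changed) tier."""
        if is_good:
            self.good_streak += 1
            self.poor_streak = 0
            if self.good_streak >= self.upgrade_after and self.index < len(TIERS) - 1:
                self.index += 1
                self.good_streak = 0
        else:
            self.poor_streak += 1
            self.good_streak = 0
            if self.poor_streak >= self.downgrade_after and self.index > 0:
                self.index -= 1
                self.poor_streak = 0
        return self.tier
```

Because any measurement of the opposite kind resets the streak, a connection that alternates between good and poor readings stays on its current tier instead of oscillating.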
Lexicon Expansion
Quranic Arabic Lexicon (328 terms)
Complete pronunciation coverage for Quranic content:
Surah Names (114): All Surah names with accurate Modern Standard Arabic IPA:
- الفاتحة → /ʔalfaːtiħa/
- البقرة → /ʔalbaqara/
- آل عمران → /ʔaːl ʕimraːn/
- ... (all 114 Surahs)
Tajweed Terms (50+):
- تجويد → /tadʒwiːd/
- إدغام → /ʔidɣaːm/
- إخفاء → /ʔixfaːʔ/
- قلقلة → /qalqala/
- غنة → /ɣunna/
- مد → /madd/
Islamic Vocabulary (200+):
- Common phrases (Bismillah, Alhamdulillah)
- Names of Allah
- Prophet names
- Ritual terminology
- Theological concepts
English Transliteration Lexicon (334 terms)
Transliterated Surah names and Islamic terms for English TTS:
Multiple Spelling Variants:
- Tajweed / tajwid
- Qur'an / Quran
- Insha'Allah / Inshallah
IPA Approximations: Closest English phonemes for Arabic sounds:
- Al-Fatihah → /æl fɑːtiːhɑː/
- Bismillah → /bɪsmɪl lɑː/
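Spelling variants such as those listed above are typically resolved to one lexicon entry before pronunciation lookup. A minimal sketch of that normalization follows; the `ALIASES` table and `canonical_key` function are hypothetical and only cover the three example pairs.

```python
import re

# Variants that differ by more than case or punctuation need an explicit alias.
# This table is illustrative, not the shipped lexicon.
ALIASES = {
    "tajwid": "tajweed",
    "inshallah": "inshaallah",
}


def canonical_key(term):
    """Lowercase the term, drop apostrophes/hyphens/spaces, then apply aliases."""
    key = re.sub(r"[^a-z]", "", term.lower())
    return ALIASES.get(key, key)
```

Both "Qur'an" and "Quran" reduce to the same key by punctuation stripping alone, while "tajwid" and "Inshallah" rely on the alias table.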
Configuration Reference
Voice Settings
    // settings-panel.js configuration
    voiceSettings: {
      vadPreset: 'balanced',   // 'sensitive' | 'balanced' | 'relaxed'
      autoStopDelay: 1500,     // ms
      pushToTalk: false,
      language: 'ar',          // 'ar' | 'en' | 'auto'
      rtlMode: 'auto'          // 'auto' | 'ltr' | 'rtl'
    }
Feature Flags
    # Feature flags for v4.1
    VOICE_V4_INPUT_BAR = True
    VOICE_V4_STREAMING_TEXT = True
    VOICE_V4_SPEAKER_DIARIZATION = True
    VOICE_V4_FHIR_STREAMING = True
    VOICE_V4_ADAPTIVE_QUALITY = True
    VOICE_V4_PHI_ROUTING = True
Migration Guide
From v4.0 to v4.1
Breaking Changes:
- None - v4.1 is fully backward compatible
Recommended Updates:
- Enable new feature flags in configuration
- Update client to use streaming text display
- Configure VAD presets for your environment
- Test lexicon pronunciations for Quranic content
New Dependencies:
pyannote.audio>=3.0.0 # Speaker diarization
fhir.resources>=7.0.0 # FHIR R4 support
Performance Metrics
Latency Targets
| Operation | Target | Measured |
|---|---|---|
| Voice input to transcription | <500ms | 320ms |
| Speaker change detection | <200ms | 180ms |
| Text streaming first token | <300ms | 250ms |
| FHIR resource fetch | <1000ms | 650ms |
| Quality tier switch | <100ms | 80ms |
Resource Usage
| Component | Memory | CPU |
|---|---|---|
| Speaker diarization | +150MB | +5% |
| FHIR client | +20MB | +2% |
| Adaptive quality | +5MB | +1% |
| Lexicon service | +10MB | <1% |
Known Issues
- Speaker diarization accuracy: May decrease with >4 simultaneous speakers
- FHIR timeout: The first request after an idle period may time out while the connection pool warms up
- RTL mixed content: Complex bidirectional text may occasionally misalign
Acknowledgments
Voice Mode v4.1 was developed with contributions from:
- Platform team for core infrastructure
- Healthcare team for FHIR integration
- Localization team for Arabic/RTL support
- Community contributors for lexicon expansion
UI Components Guide
Screenshots: For visual reference, see the annotated screenshots in /docs/voice/screenshots/:
- voice-input-bar-states.png - VoiceFirstInputBar in idle, recording, and processing states
- streaming-text-rtl.png - StreamingTextDisplay with RTL Arabic content
- quality-badge-tiers.png - QualityBadge showing all 4 quality levels
- phi-indicator-states.png - PHI indicator (green/yellow/red states)
- vad-presets-panel.png - VAD preset selection in settings panel
VoiceFirstInputBar Component
Location: /var/www/quran/js/components/VoiceFirstInputBar.js
The primary interface for voice interaction in v4.1:

    ┌─────────────────────────────────────────────────────────────┐
    │                                                             │
    │  ┌───────────────────────────────────────────────────────┐  │
    │  │  "What would you like to learn about today?"          │  │
    │  └───────────────────────────────────────────────────────┘  │
    │                                                             │
    │               ╭─────────────────────────╮                   │
    │               │         [ 🎤 ]          │  ← Tap to Speak   │
    │               │       Recording...      │                   │
    │               ╰─────────────────────────╯                   │
    │                                                             │
    │     [ ⌨️ Text ]      [ ⚙️ Settings ]      [ ❓ Help ]       │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘
States:
| State | Visual | Behavior |
|---|---|---|
| Idle | Grey microphone | Tap to start recording |
| Recording | Pulsing red + waveform | Real-time audio levels |
| Processing | Spinning indicator | Transcription in progress |
| Error | Red outline + message | Retry or switch to keyboard |
Props:
    <VoiceFirstInputBar
      onTranscript={(text) => handleSubmit(text)}
      vadPreset="balanced"
      language="ar"
      placeholder="Ask about any Surah..."
      showKeyboardToggle={true}
    />
StreamingTextDisplay Component
Location: /var/www/quran/js/components/StreamingTextDisplay.js
Renders AI responses with real-time streaming:

    ┌─────────────────────────────────────────────────────────────┐
    │ 🤖 Assistant                                       12:34 PM │
    ├─────────────────────────────────────────────────────────────┤
    │                                                             │
    │  Surah Al-Fatihah (الفاتحة) is the opening chapter of       │
    │  the Quran. It consists of seven verses and is recited      │
    │  in every unit of prayer...                                 │
    │                                                             │
    │  **Key Themes:**                                            │
    │  • Praise of Allah (verses 1-4)                             │
    │  • Request for guidance (verses 5-7)                        │
    │  • The straight path (الصراط المستقيم)▌                     │
    │                                        ↑cursor              │
    │  ─────────────────────────────────────────────────────────  │
    │  Source: Tafsir Ibn Kathir, Vol 1, p.23 [View →]            │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘
Features:
- Token-by-token rendering with cursor animation
- Markdown support (bold, lists, code blocks)
- RTL text detection and rendering
- Citation card expansion on click
- Copy/share buttons on completion
Props:
    <StreamingTextDisplay
      stream={responseStream}
      onComplete={() => setIsStreaming(false)}
      showCitations={true}
      enableRTL="auto"
      animationSpeed={30} // ms per character
    />
QualityBadge Component
Location: /var/www/quran/js/components/QualityBadge.js
Displays current audio quality and network status:
    Normal view:          Expanded on hover/tap:
    ┌──────────┐          ┌─────────────────────────┐
    │ 📶 High  │          │ 📶 High Quality         │
    └──────────┘          │ ─────────────────────── │
                          │ Bitrate: 128 kbps       │
                          │ RTT: 45ms               │
                          │ Packet Loss: 0.1%       │
                          └─────────────────────────┘
Quality Indicators:
| Badge | Color | Meaning |
|---|---|---|
| 📶 High | Green | Excellent connection |
| 📶 Med | Yellow | Acceptable quality |
| 📶 Low | Orange | Degraded quality |
| 📶 Min | Red | Minimal quality mode |
Props:
    <QualityBadge
      quality={networkQuality}
      showDetails={true}
      onQualityChange={(tier) => logQualityEvent(tier)}
    />
Component Integration Example
    // Main voice interface integration
    import { useState } from "react";

    // responseStream, handleSubmit, and settings are assumed to be provided
    // by the surrounding application code.
    function VoiceInterface() {
      const [streaming, setStreaming] = useState(false);
      const [quality, setQuality] = useState("high");

      return (
        <div className="voice-interface">
          <QualityBadge quality={quality} />
          <StreamingTextDisplay
            stream={responseStream}
            onComplete={() => setStreaming(false)}
            showCitations={true}
          />
          <VoiceFirstInputBar
            onTranscript={handleSubmit}
            disabled={streaming}
            vadPreset={settings.vadPreset}
          />
        </div>
      );
    }
Related Documentation
- Voice Mode Architecture
- Speaker Diarization Service
- FHIR Streaming Service
- Adaptive Quality Service
- Lexicon Service Guide
Release Date: December 2024
Version: 4.1.0
Status: Production Ready