What's New in Voice Mode v4.1

Sourced from docs/voice/whats-new-v4-1.md

Voice Mode v4.1 is a major release that introduces a voice-first interface, advanced speech processing, healthcare integrations, and comprehensive Arabic/Quranic language support.

Release Highlights

  • Voice-First Input Bar: Redesigned interface optimized for speech interaction
  • Streaming Text Display: Real-time response rendering with smooth animations
  • Speaker Diarization: Multi-speaker detection and attribution
  • FHIR R4 Streaming: Healthcare data integration with retry resilience
  • Adaptive Audio Quality: Dynamic bitrate adjustment based on network conditions
  • Quranic Lexicon: 662 terms including all 114 Surah names and Tajweed terminology

Getting Started

Using the Voice-First Input Bar

  1. Tap the microphone button to start recording
  2. Speak naturally - the system detects when you stop speaking
  3. Wait for transcription - your speech appears as text
  4. View the response - streamed in real time with Arabic support

Selecting a VAD Preset

Choose the preset that matches your environment:

| Environment | Recommended Preset |
| --- | --- |
| Quiet room | Sensitive |
| Home/Office | Balanced (default) |
| Public/Noisy | Relaxed |

To change: Settings (⚙️) → Voice Settings → VAD Preset

Understanding the Indicators

Quality Badge (📶):

  • 🟢 High: Excellent connection (128kbps)
  • 🟡 Medium: Acceptable quality (64kbps)
  • 🟠 Low: Degraded quality (32kbps)
  • 🔴 Minimal: Very poor connection (16kbps)

PHI Indicator (🛡️):

  • 🟢 Green: No sensitive health info detected
  • 🟡 Yellow: Potential PHI - review recommended
  • 🔴 Red: PHI detected - secure handling active

Adjusting Thinking Feedback

Control what happens while the AI processes your request:

  • Tone: Soft chime when processing starts/ends
  • Haptic: Gentle vibration feedback (mobile)
  • Visual: Pulsing indicator animation

To configure: Settings (⚙️) → Thinking Feedback


New Features

Voice-First Input Bar

The input interface has been completely redesigned to prioritize voice interaction:

┌─────────────────────────────────────────────────┐
│  [🎤 Tap to Speak]  │  [⌨️]  │  [⚙️]           │
│                                                 │
│  "Ask me anything about the Quran..."           │
└─────────────────────────────────────────────────┘

Key Features:

  • Large, prominent microphone button for easy tap-to-speak
  • Visual audio level indicator during recording
  • Keyboard toggle for text input when needed
  • Settings quick-access for VAD and voice preferences

Usage:

  1. Tap the microphone button to start recording
  2. Speak your question naturally
  3. Release or tap again to send
  4. Watch the real-time transcription appear

Streaming Text Display

Responses now stream in real time with smooth character-by-character rendering:

Features:

  • Token-by-token streaming from the AI
  • Smooth CSS animations for text appearance
  • Markdown rendering with syntax highlighting
  • Citation links with hover previews
  • RTL support for Arabic text segments

Technical Details:

  • Adaptive chunk sizing based on network latency
  • Automatic scroll-to-bottom during streaming
  • Graceful handling of connection interruptions
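One way to picture adaptive chunk sizing is a small re-chunking step between the token stream and the display. The sketch below is illustrative only: the latency thresholds, the chunk sizes, and the names `chunk_size_for_latency` and `rechunk` are assumptions, not values or functions taken from Voice Mode.

```python
from typing import Iterable, Iterator, List


def chunk_size_for_latency(rtt_ms: float) -> int:
    """Pick how many tokens to batch per UI update (hypothetical thresholds)."""
    if rtt_ms < 100:
        return 1   # fast link: render token by token
    if rtt_ms < 300:
        return 4   # moderate latency: small batches
    return 8       # slow link: larger batches, fewer updates


def rechunk(tokens: Iterable[str], rtt_ms: float) -> Iterator[str]:
    """Group a token stream into chunks sized for the measured latency."""
    size = chunk_size_for_latency(rtt_ms)
    batch: List[str] = []
    for token in tokens:
        batch.append(token)
        if len(batch) == size:
            yield "".join(batch)
            batch = []
    if batch:  # flush the remainder when the stream ends
        yield "".join(batch)
```

On a fast connection every token is rendered as it arrives; on a slow one the display updates less often but with more text per update.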

Adaptive VAD Presets

Voice Activity Detection now includes three presets for different environments:

| Preset | Sensitivity | Best For |
| --- | --- | --- |
| Sensitive | High | Silent rooms, minimal background noise |
| Balanced | Medium | Typical home/office environments |
| Relaxed | Low | Public spaces, background conversations |

Settings Panel:

Voice Settings
├── VAD Preset: [Balanced ▼]
├── Auto-stop delay: 1.5s
└── Push-to-talk: [ ] Enable
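How a preset and the auto-stop delay interact can be sketched with a simple energy-based detector. This is a minimal illustration, not the detector Voice Mode ships: the per-preset thresholds, the 20 ms frame length, and the function name `find_auto_stop` are all assumptions. Only the preset names and the 1.5 s delay come from the settings above.

```python
from typing import Iterable, Optional

# Hypothetical energy thresholds per preset (normalized 0..1 frame energy).
# A sensitive preset treats quieter frames as speech; a relaxed one ignores more noise.
PRESET_THRESHOLDS = {"sensitive": 0.02, "balanced": 0.05, "relaxed": 0.10}


def find_auto_stop(
    frame_energies: Iterable[float],
    preset: str = "balanced",
    auto_stop_delay_ms: int = 1500,
    frame_ms: int = 20,
) -> Optional[int]:
    """Return the index of the frame at which recording would stop, or None.

    Recording stops once speech has been heard and is then followed by
    `auto_stop_delay_ms` of continuous silence.
    """
    threshold = PRESET_THRESHOLDS[preset]
    frames_needed = auto_stop_delay_ms // frame_ms
    heard_speech = False
    silent_run = 0
    for index, energy in enumerate(frame_energies):
        if energy >= threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= frames_needed:
                return index
    return None
```

Under these assumed thresholds, a frame with energy 0.07 counts as speech with the Balanced preset but as background noise with Relaxed, which is why the noisier the environment, the less sensitive the recommended preset.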

PHI Indicator & Routing

Healthcare-compliant PHI (Protected Health Information) handling:

Visual Indicator:

  • Green shield icon: No PHI detected
  • Yellow shield: Potential PHI, review recommended
  • Red shield: PHI detected, secure handling active

Routing Features:

  • Automatic detection of 18 HIPAA identifiers
  • Secure channel routing for PHI content
  • Audit logging for compliance
  • User notification when PHI is detected
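The relationship between detection, the shield color, and routing can be sketched as follows. This is a deliberately simplified illustration: the regular expressions cover only three of the 18 identifier categories, the "weak hint" keywords are invented for the example, and sending the yellow state over the secure channel is an assumption. The function names are hypothetical.

```python
import re
from typing import List, Tuple

# Illustrative patterns for 3 of the 18 HIPAA identifier categories.
# A production detector would cover all 18 and use more robust methods.
STRONG_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}
# Hypothetical weak hints that trigger the "review recommended" state.
WEAK_HINTS = re.compile(r"\b(date of birth|dob|medical record|mrn)\b", re.IGNORECASE)


def classify_phi(text: str) -> Tuple[str, List[str]]:
    """Return (indicator color, matched identifier categories)."""
    matched = [name for name, pattern in STRONG_PATTERNS.items() if pattern.search(text)]
    if matched:
        return "red", matched
    if WEAK_HINTS.search(text):
        return "yellow", []
    return "green", []


def route_for(indicator: str) -> str:
    """Assumption: anything that is not clearly PHI-free takes the secure channel."""
    return "standard" if indicator == "green" else "secure"
```

A transcript such as "My SSN is 123-45-6789" would be classified red and routed securely, while a question about a Surah stays green on the standard channel.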

RTL Support for Arabic

Full right-to-left support for Arabic content:

Features:

  • Automatic language detection
  • Bidirectional text rendering
  • RTL-aware UI layout
  • Arabic numeral support
  • Proper text alignment in mixed content

Toggle:

Display Settings
└── Text Direction: [Auto ▼] / LTR / RTL
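The Auto setting can be approximated with a first-strong-character heuristic, the same idea browsers use for `dir="auto"`. The sketch below relies only on Python's standard `unicodedata` module; the function name `detect_direction` is hypothetical, and the real detection logic in Voice Mode may differ.

```python
import unicodedata


def detect_direction(text: str) -> str:
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for char in text:
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class in ("R", "AL"):   # right-to-left and Arabic-letter classes
            return "rtl"
        if bidi_class == "L":           # Latin and other left-to-right letters
            return "ltr"
    return "ltr"  # no strong characters (digits, punctuation): default to LTR
```

Digits and punctuation are skipped, so a numbered Arabic line such as "1. الفاتحة" still resolves to RTL, while "Surah الفاتحة" resolves to LTR because its first strong character is Latin.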

Unified Memory Context

Cross-session context management for personalized interactions:

Capabilities:

  • Session history persistence
  • User preference memory
  • Conversation threading
  • Context-aware follow-ups
  • Learning style adaptation

Phase 3 Features

Speaker Diarization

Multi-speaker detection and attribution for group conversations:

Capabilities:

  • Up to 4 concurrent speakers
  • Real-time speaker change detection
  • Speaker embedding extraction
  • Cross-session speaker re-identification
  • Confidence scoring per segment

Use Cases:

  • Study circles with multiple participants
  • Teacher-student Q&A sessions
  • Family Quran recitation sessions

Technical Specs:

  • Latency: <200ms for speaker change detection
  • Accuracy: >90% speaker attribution
  • Models: pyannote.audio segmentation
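Downstream code usually wants speaker turns rather than raw segments. The sketch below shows one possible post-processing step, assuming the diarizer emits segments with a speaker label, start and end times, and a confidence score. The `Segment` structure, the `UNKNOWN` label, and the 0.5 confidence cutoff are illustrative assumptions, not part of pyannote.audio or of Voice Mode.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    speaker: str       # label such as "SPEAKER_00" (hypothetical format)
    start: float       # seconds
    end: float         # seconds
    confidence: float  # 0..1 attribution confidence


def merge_turns(segments: List[Segment], min_confidence: float = 0.5) -> List[Segment]:
    """Merge consecutive segments from the same speaker into turns.

    Segments below `min_confidence` are relabeled "UNKNOWN" rather than
    attributed to a speaker. A merged turn keeps the lowest confidence of its parts.
    """
    turns: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        speaker = seg.speaker if seg.confidence >= min_confidence else "UNKNOWN"
        if turns and turns[-1].speaker == speaker:
            last = turns[-1]
            last.end = max(last.end, seg.end)
            last.confidence = min(last.confidence, seg.confidence)
        else:
            turns.append(Segment(speaker, seg.start, seg.end, seg.confidence))
    return turns
```

Keeping the minimum confidence per turn is a conservative choice: a turn is only as trustworthy as its weakest segment.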

FHIR R4 Streaming

Healthcare data integration with enterprise-grade resilience:

Supported Resources:

  • Patient demographics
  • Observations (vitals, lab results)
  • Conditions (diagnoses)
  • Medications
  • Allergies

Resilience Features:

  • Exponential backoff retry (1s → 2s → 4s → 8s)
  • Circuit breaker pattern
  • Partial response handling
  • Connection pooling
  • Health check monitoring

Configuration:

fhir:
  base_url: https://fhir.example.com/r4
  timeout_ms: 5000
  max_retries: 3
  circuit_breaker_threshold: 5
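The retry and circuit-breaker settings can be read as the following control flow. This is a minimal sketch under stated assumptions, not the FHIR client's actual code: `CircuitBreaker`, `CircuitOpenError`, and `fetch_with_retry` are hypothetical names, the breaker counts consecutive failures only, and recovery (a half-open state or reset timer) is left out for brevity.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and calls are rejected without being tried."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; a success resets the count."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def record(self, success: bool) -> None:
        self.failures = 0 if success else self.failures + 1


def fetch_with_retry(
    fetch: Callable[[], T],
    breaker: CircuitBreaker,
    max_retries: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, retrying failures with delays of 1s, 2s, 4s, ..."""
    last_error: Optional[Exception] = None
    for attempt in range(max_retries + 1):  # first try plus max_retries retries
        if breaker.is_open:
            raise CircuitOpenError("too many consecutive failures")
        try:
            result = fetch()
        except Exception as exc:  # a real client would catch narrower errors
            breaker.record(success=False)
            last_error = exc
            if attempt < max_retries:
                sleep(base_delay_s * 2 ** attempt)
        else:
            breaker.record(success=True)
            return result
    assert last_error is not None
    raise last_error
```

In this sketch, `max_retries: 3` produces waits of 1s, 2s, and 4s; an 8s wait would only occur with a fourth retry. The `sleep` parameter is injectable so the backoff schedule can be checked in tests without real delays.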

Adaptive Quality Controller

Dynamic audio quality adjustment based on network conditions:

Quality Tiers:

| Tier | Bitrate | Sample Rate | Use Case |
| --- | --- | --- | --- |
| High | 128kbps | 48kHz | Excellent network |
| Medium | 64kbps | 24kHz | Standard network |
| Low | 32kbps | 16kHz | Poor network |
| Minimal | 16kbps | 8kHz | Very poor network |

Hysteresis Behavior:

  • Upgrade: 10 consecutive good measurements
  • Downgrade: 3 consecutive poor measurements
  • Prevents quality oscillation
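The hysteresis rule above can be expressed with two streak counters. The sketch below is one possible reading: it assumes each network measurement has already been reduced to a good/poor boolean and that the controller moves one tier at a time. The class name `QualityController` and its method names are hypothetical.

```python
TIERS = ["minimal", "low", "medium", "high"]  # ordered worst to best
BITRATE_KBPS = {"minimal": 16, "low": 32, "medium": 64, "high": 128}


class QualityController:
    """Moves one tier at a time, using streak counters to avoid oscillation."""

    def __init__(self, tier: str = "high", upgrade_after: int = 10, downgrade_after: int = 3) -> None:
        self.index = TIERS.index(tier)
        self.upgrade_after = upgrade_after
        self.downgrade_after = downgrade_after
        self.good_streak = 0
        self.poor_streak = 0

    @property
    def tier(self) -> str:
        return TIERS[self.index]

    def observe(self, is_good: bool) -> str:
        """Record one network measurement and return the (possibly new) tier."""
        if is_good:
            self.good_streak += 1
            self.poor_streak = 0  # any good sample breaks a poor streak
            if self.good_streak >= self.upgrade_after and self.index < len(TIERS) - 1:
                self.index += 1
                self.good_streak = 0
        else:
            self.poor_streak += 1
            self.good_streak = 0  # any poor sample breaks a good streak
            if self.poor_streak >= self.downgrade_after and self.index > 0:
                self.index -= 1
                self.poor_streak = 0
        return self.tier
```

The asymmetry (3 poor samples to step down, 10 good samples to step up) makes the controller quick to protect the call and slow to trust a recovering network, which is what prevents rapid flapping between tiers.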

Metrics Tracked:

  • RTT (Round Trip Time)
  • Packet loss percentage
  • Jitter
  • Available bandwidth

Lexicon Expansion

Quranic Arabic Lexicon (328 terms)

Complete pronunciation coverage for Quranic content:

Surah Names (114): All Surah names with accurate Modern Standard Arabic IPA:

  • الفاتحة → /ʔalfaːtiħa/
  • البقرة → /ʔalbaqara/
  • آل عمران → /ʔaːl ʕimraːn/
  • ... (all 114 Surahs)

Tajweed Terms (50+):

  • تجويد → /tadʒwiːd/
  • إدغام → /ʔidɣaːm/
  • إخفاء → /ʔixfaːʔ/
  • قلقلة → /qalqala/
  • غنة → /ɣunna/
  • مد → /madd/

Islamic Vocabulary (200+):

  • Common phrases (Bismillah, Alhamdulillah)
  • Names of Allah
  • Prophet names
  • Ritual terminology
  • Theological concepts

English Transliteration Lexicon (334 terms)

Transliterated Surah names and Islamic terms for English TTS:

Multiple Spelling Variants:

  • Tajweed / tajwid
  • Qur'an / Quran
  • Insha'Allah / Inshallah

IPA Approximations: Closest English phonemes for Arabic sounds:

  • Al-Fatihah → /æl fɑːtiːhɑː/
  • Bismillah → /bɪsmɪl lɑː/
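Handling multiple spelling variants typically comes down to normalizing the input and then resolving aliases before the lexicon lookup. The sketch below illustrates that idea with the two IPA entries quoted above. The `ALIASES` table, the normalization rule, and the function names are assumptions for illustration and do not describe the lexicon service's real storage format.

```python
from typing import Optional

# Tiny illustrative lexicon using the two entries quoted in this section.
LEXICON = {
    "alfatihah": "/æl fɑːtiːhɑː/",
    "bismillah": "/bɪsmɪl lɑː/",
}
# Hypothetical alias table for spellings that normalization alone cannot unify.
ALIASES = {
    "tajwid": "tajweed",
    "inshaallah": "inshallah",
}


def canonical_key(term: str) -> str:
    """Lowercase, drop apostrophes, hyphens, and spaces, then resolve aliases."""
    key = "".join(ch for ch in term.lower() if ch.isalnum())
    return ALIASES.get(key, key)


def lookup_ipa(term: str) -> Optional[str]:
    """Return the IPA string for a term, or None if it is not in the lexicon."""
    return LEXICON.get(canonical_key(term))
```

Normalization alone unifies "Qur'an" and "Quran"; pairs such as "Tajweed" / "tajwid" differ in their letters, so they need an explicit alias entry.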

Configuration Reference

Voice Settings

// settings-panel.js configuration
voiceSettings: {
  vadPreset: 'balanced',   // 'sensitive' | 'balanced' | 'relaxed'
  autoStopDelay: 1500,     // ms
  pushToTalk: false,
  language: 'ar',          // 'ar' | 'en' | 'auto'
  rtlMode: 'auto'          // 'auto' | 'ltr' | 'rtl'
}

Feature Flags

# Feature flags for v4.1
VOICE_V4_INPUT_BAR = True
VOICE_V4_STREAMING_TEXT = True
VOICE_V4_SPEAKER_DIARIZATION = True
VOICE_V4_FHIR_STREAMING = True
VOICE_V4_ADAPTIVE_QUALITY = True
VOICE_V4_PHI_ROUTING = True

Migration Guide

From v4.0 to v4.1

Breaking Changes:

  • None - v4.1 is fully backward compatible

Recommended Updates:

  1. Enable new feature flags in configuration
  2. Update client to use streaming text display
  3. Configure VAD presets for your environment
  4. Test lexicon pronunciations for Quranic content

New Dependencies:

pyannote.audio>=3.0.0  # Speaker diarization
fhir.resources>=7.0.0  # FHIR R4 support

Performance Metrics

Latency Targets

| Operation | Target | Measured |
| --- | --- | --- |
| Voice input to transcription | <500ms | 320ms |
| Speaker change detection | <200ms | 180ms |
| Text streaming first token | <300ms | 250ms |
| FHIR resource fetch | <1000ms | 650ms |
| Quality tier switch | <100ms | 80ms |

Resource Usage

| Component | Memory | CPU |
| --- | --- | --- |
| Speaker diarization | +150MB | +5% |
| FHIR client | +20MB | +2% |
| Adaptive quality | +5MB | +1% |
| Lexicon service | +10MB | <1% |

Known Issues

  1. Speaker diarization accuracy: May decrease with >4 simultaneous speakers
  2. FHIR timeout: The first request after an idle period may time out while the connection pool warms up
  3. RTL mixed content: Complex bidirectional text may occasionally misalign

Acknowledgments

Voice Mode v4.1 was developed with contributions from:

  • Platform team for core infrastructure
  • Healthcare team for FHIR integration
  • Localization team for Arabic/RTL support
  • Community contributors for lexicon expansion

UI Components Guide

Screenshots: For visual reference, see the annotated screenshots in /docs/voice/screenshots/:

  • voice-input-bar-states.png - VoiceFirstInputBar in idle, recording, processing states
  • streaming-text-rtl.png - StreamingTextDisplay with RTL Arabic content
  • quality-badge-tiers.png - QualityBadge showing all 4 quality levels
  • phi-indicator-states.png - PHI indicator (green/yellow/red states)
  • vad-presets-panel.png - VAD preset selection in settings panel

VoiceFirstInputBar Component

Location: /var/www/quran/js/components/VoiceFirstInputBar.js

The primary interface for voice interaction in v4.1:

┌─────────────────────────────────────────────────────────────┐
│                                                             │
│    ┌───────────────────────────────────────────────────┐    │
│    │  "What would you like to learn about today?"     │    │
│    └───────────────────────────────────────────────────┘    │
│                                                             │
│              ╭─────────────────────────╮                    │
│              │     [  🎤  ]            │  ← Tap to Speak    │
│              │   Recording...          │                    │
│              ╰─────────────────────────╯                    │
│                                                             │
│         [ ⌨️ Text ]    [ ⚙️ Settings ]    [ ❓ Help ]      │
│                                                             │
└─────────────────────────────────────────────────────────────┘

States:

| State | Visual | Behavior |
| --- | --- | --- |
| Idle | Grey microphone | Tap to start recording |
| Recording | Pulsing red + waveform | Real-time audio levels |
| Processing | Spinning indicator | Transcription in progress |
| Error | Red outline + message | Retry or switch to keyboard |

Props:

<VoiceFirstInputBar
  onTranscript={(text) => handleSubmit(text)}
  vadPreset="balanced"
  language="ar"
  placeholder="Ask about any Surah..."
  showKeyboardToggle={true}
/>

StreamingTextDisplay Component

Location: /var/www/quran/js/components/StreamingTextDisplay.js

Renders AI responses with real-time streaming:

┌─────────────────────────────────────────────────────────────┐
│  🤖 Assistant                                    12:34 PM   │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Surah Al-Fatihah (الفاتحة) is the opening chapter of       │
│  the Quran. It consists of seven verses and is recited      │
│  in every unit of prayer...                                 │
│                                                             │
│  **Key Themes:**                                            │
│  • Praise of Allah (verses 1-4)                             │
│  • Request for guidance (verses 5-7)                        │
│  • The straight path (الصراط المستقيم)█                     │
│                                        ↑cursor              │
│  ─────────────────────────────────────────────────────────  │
│  📖 Source: Tafsir Ibn Kathir, Vol 1, p.23  [View →]        │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Features:

  • Token-by-token rendering with cursor animation
  • Markdown support (bold, lists, code blocks)
  • RTL text detection and rendering
  • Citation card expansion on click
  • Copy/share buttons on completion

Props:

<StreamingTextDisplay
  stream={responseStream}
  onComplete={() => setIsStreaming(false)}
  showCitations={true}
  enableRTL="auto"
  animationSpeed={30} // ms per character
/>

QualityBadge Component

Location: /var/www/quran/js/components/QualityBadge.js

Displays current audio quality and network status:

Normal view:          Expanded on hover/tap:
┌─────────┐          ┌─────────────────────────┐
│ 📶 High │          │ 📶 High Quality         │
└─────────┘          │ ─────────────────────── │
                     │ Bitrate: 128 kbps       │
                     │ RTT: 45ms               │
                     │ Packet Loss: 0.1%       │
                     └─────────────────────────┘

Quality Indicators:

| Badge | Color | Meaning |
| --- | --- | --- |
| 📶 High | Green | Excellent connection |
| 📶 Med | Yellow | Acceptable quality |
| 📶 Low | Orange | Degraded quality |
| 📶 Min | Red | Minimal quality mode |

Props:

<QualityBadge
  quality={networkQuality}
  showDetails={true}
  onQualityChange={(tier) => logQualityEvent(tier)}
/>

Component Integration Example

// Main voice interface integration.
// responseStream, handleSubmit, and settings are assumed to be provided by the surrounding app.
import { useState } from "react";

function VoiceInterface() {
  const [streaming, setStreaming] = useState(false);
  const [quality, setQuality] = useState("high");

  return (
    <div className="voice-interface">
      <QualityBadge quality={quality} />
      <StreamingTextDisplay
        stream={responseStream}
        onComplete={() => setStreaming(false)}
        showCitations={true}
      />
      {/* Voice input is disabled while a response is still streaming */}
      <VoiceFirstInputBar
        onTranscript={handleSubmit}
        disabled={streaming}
        vadPreset={settings.vadPreset}
      />
    </div>
  );
}


Release Date: December 2024
Version: 4.1.0
Status: Production Ready
