VoiceAssist Voice State - November 28, 2025

Summary

Voice mode has been significantly improved with barge-in support, audio overlap prevention, and graceful error handling. The system now properly handles user interruptions during AI responses.

Voice Mode Overhaul (2025-11-29): Added per-user voice preferences persistence, context-aware voice style detection, advanced TTS controls, and aggressive latency optimizations (200ms VAD, 256-sample chunks, 300ms reconnect).

Changes Since Last Update (2025-11-25)

New Features

Feature	Status	Description
Barge-in support	Live	User can interrupt AI while speaking
Audio overlap prevention	Live	Prevents multiple responses playing simultaneously
Benign error handling	Live	Gracefully handles cancellation failures
Audio playback tracking	Live	Tracks current audio element for cleanup

Voice Mode Overhaul (2025-11-29)

Feature	Status	Description
User voice preferences (backend)	Live	Per-user TTS settings stored in database
Context-aware voice styles	Live	Auto-detects CALM/URGENT/EMPATHETIC/INSTRUCTIONAL tones
Advanced TTS controls	Live	Stability, clarity, expressiveness sliders in UI
Aggressive VAD tuning	Live	200ms silence, 150ms prefix, 256-sample chunks
Faster reconnection	Live	300ms base delay (was 1000ms)
Backend preferences sync	Live	Cross-device settings via `/api/voice/preferences`

Technical Implementation

1. Barge-in Flow

When the user starts speaking while the AI is responding:

input_audio_buffer.speech_started
         ↓
Check activeResponseIdRef
         ↓
Send response.cancel to OpenAI
         ↓
Call onSpeechStarted() callback
         ↓
VoiceModePanel.stopCurrentAudio()
         ↓
Audio stops, queue cleared, response ID incremented

2. Audio Playback Management

New refs added to VoiceModePanel.tsx:

// Track currently playing Audio element for stopping on barge-in
const currentAudioRef = useRef<HTMLAudioElement | null>(null);

// Prevent overlapping response processing
const isProcessingResponseRef = useRef(false);

// Response ID to invalidate stale responses
const currentResponseIdRef = useRef<number>(0);

3. Response Tracking

New refs added to useRealtimeVoiceSession.ts:

// Track active response ID for cancellation
const activeResponseIdRef = useRef<string | null>(null);

Handled message types:

response.created - Track new response ID
response.done - Clear response ID
response.cancelled - Clear response ID

4. Benign Error Handling

Errors like "Cancellation failed: no active response found" are now handled gracefully:

case "error": {
  const errorMessage = message.error?.message || "Realtime API error";

  if (
    errorMessage.includes("Cancellation failed") ||
    errorMessage.includes("no active response") ||
    errorCode === "cancellation_failed"
  ) {
    voiceLog.debug(`Ignoring benign error: ${errorMessage}`);
    break;
  }

  handleError(new Error(errorMessage));
  break;
}

Files Modified

File	Changes
`apps/web-app/src/hooks/useRealtimeVoiceSession.ts`	Added `activeResponseIdRef`, `onSpeechStarted` callback, response tracking, barge-in logic, benign error handling
`apps/web-app/src/components/voice/VoiceModePanel.tsx`	Added audio tracking refs, `stopCurrentAudio()`, overlap prevention in `onRelayResult`

Current Test Status

Test Suite	Tests	Status
Backend: test_openai_config.py	17 passed, 3 skipped (live)	✅
Backend: test_voice_metrics.py	11 passed	✅
Frontend: useRealtimeVoiceSession	22 passed	✅
Frontend: voiceSettingsStore	17 passed	✅
Frontend: VoiceModeSettings	25 passed	✅
Frontend: useChatSession-voice-integration	8 passed	✅

Known Issues

First audio chunk silent: First audio chunk may show -Infinity dB - this is expected before mic produces audio
WebSocket errors on page navigation: Expected when switching conversations - handled gracefully

Architecture Overview

Frontend (dev.asimo.io)
├── VoiceModePanel (UI component)
│   ├── stopCurrentAudio() - stops playback on barge-in
│   ├── currentAudioRef - tracks playing audio
│   ├── isProcessingResponseRef - prevents overlaps
│   └── currentResponseIdRef - invalidates stale responses
│
├── useRealtimeVoiceSession (hook)
│   ├── activeResponseIdRef - tracks OpenAI response
│   ├── onSpeechStarted callback - notifies panel
│   ├── response.cancel - sends to OpenAI
│   └── Benign error handling
│
└── voiceSettingsStore (Zustand)
    └── Persists: voice, language, vadSensitivity

Quick Commands

# Run backend voice tests
cd /home/asimo/VoiceAssist/services/api-gateway
source venv/bin/activate && export PYTHONPATH=.
python -m pytest tests/integration/test_openai_config.py tests/integration/test_voice_metrics.py -v

# Run frontend voice tests
cd /home/asimo/VoiceAssist/apps/web-app
export NODE_OPTIONS="--max-old-space-size=768"
npx vitest run src/hooks/__tests__/useRealtimeVoiceSession.test.ts \
  src/stores/__tests__/voiceSettingsStore.test.ts \
  src/components/voice/__tests__/VoiceModeSettings.test.tsx

# Build web app
cd /home/asimo/VoiceAssist/apps/web-app
pnpm build

TODOs for Future Work

Voice UX Features

Audio level visualization during recording
Per-user voice preferences persistence (backend) ✅ Implemented 2025-11-29
Voice activity visualization improvements
Multi-language auto-detection
Session resumption on reconnect

Testing

E2E tests for barge-in functionality
Test voice→chat transcript content in chat timeline
Performance baseline (connection <2s, STT <500ms)

Infrastructure

Configure Prometheus scrapes for voice metrics
Set up Grafana dashboards for voice SLOs
Configure Sentry alerts for voice SLO violations

VOICE_MODE_PIPELINE.md - Full pipeline architecture
VOICE_MODE_SETTINGS_GUIDE.md - User settings
VOICE_READY_STATE_2025-11-25.md - Previous state

Last updated: 2025-11-28 by Claude

Voice State 2025-11-29