VoiceAssist Real-time Architecture

Last Updated: 2025-11-27 | Status: Production Ready

Overview

VoiceAssist uses WebSocket connections for real-time bidirectional communication, enabling:

  • Streaming chat responses - Token-by-token LLM output
  • Voice interactions - Speech-to-text and text-to-speech
  • Live updates - Typing indicators, connection status

Architecture Diagram

┌─────────────────────────────────────────────────────────────────────────┐
│                              Client                                      │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────────────┐ │
│  │   Chat UI       │  │   Voice Input   │  │   Connection Manager    │ │
│  │                 │  │   (Web Audio)   │  │   - Reconnection        │ │
│  │   - Messages    │  │   - Mic capture │  │   - Heartbeat           │ │
│  │   - Streaming   │  │   - STT stream  │  │   - Token refresh       │ │
│  └────────┬────────┘  └────────┬────────┘  └────────────┬────────────┘ │
│           │                    │                        │               │
│           └────────────────────┼────────────────────────┘               │
│                                │                                        │
│                         ┌──────▼──────┐                                │
│                         │  WebSocket  │                                │
│                         │   Client    │                                │
│                         └──────┬──────┘                                │
└────────────────────────────────┼────────────────────────────────────────┘
                                 │
                          WSS/WS │
                                 │
┌────────────────────────────────┼────────────────────────────────────────┐
│                                │                                        │
│                         ┌──────▼──────┐                                │
│                         │  WebSocket  │                                │
│                         │   Handler   │                                │
│                         │  (FastAPI)  │                                │
│                         └──────┬──────┘                                │
│                                │                                        │
│           ┌────────────────────┼────────────────────┐                  │
│           │                    │                    │                   │
│    ┌──────▼──────┐      ┌──────▼──────┐     ┌──────▼──────┐           │
│    │   Chat      │      │   Voice     │     │ Connection  │           │
│    │   Service   │      │   Service   │     │   Manager   │           │
│    │             │      │             │     │             │           │
│    │ - RAG Query │      │ - STT       │     │ - Sessions  │           │
│    │ - LLM Call  │      │ - TTS       │     │ - Heartbeat │           │
│    │ - Streaming │      │ - VAD       │     │ - Auth      │           │
│    └──────┬──────┘      └──────┬──────┘     └─────────────┘           │
│           │                    │                                        │
│           └────────────────────┼────────────────────────────────────────┤
│                                │                                        │
│                         ┌──────▼──────┐                                │
│                         │   OpenAI    │                                │
│                         │   API       │                                │
│                         │             │                                │
│                         │ - GPT-4     │                                │
│                         │ - Whisper   │                                │
│                         │ - TTS       │                                │
│                         └─────────────┘                                │
│                                                                         │
│                              Backend                                    │
└─────────────────────────────────────────────────────────────────────────┘

Connection Lifecycle

1. Connection Establishment

Client                                    Server
  │                                         │
  ├──── WebSocket Connect ─────────────────►│
  │     (with token & conversationId)       │
  │                                         │
  │◄──── connection_established ────────────┤
  │      { connectionId, serverTime }       │
  │                                         │
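
For illustration, a minimal client-side sketch of this handshake, assuming the browser WebSocket API. The connection URL and query parameters are described under WebSocket Endpoints below; reconnection and token refresh are omitted here.

```typescript
// Sketch: open the socket and resolve once the server confirms the connection.
function connect(url: string): Promise<WebSocket> {
  return new Promise((resolve, reject) => {
    const ws = new WebSocket(url); // url includes conversationId and token (see below)

    ws.onmessage = (event) => {
      const msg = JSON.parse(event.data);
      if (msg.type === "connection_established") {
        // Server replies with { connectionId, serverTime }
        resolve(ws);
      }
    };
    ws.onerror = () => reject(new Error("WebSocket connection failed"));
  });
}
```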

2. Message Exchange

Client                                    Server
  │                                         │
  ├──── message ───────────────────────────►│
  │     { content: "Hello" }                │
  │                                         │
  │◄──── thinking ──────────────────────────┤
  │                                         │
  │◄──── assistant_chunk ───────────────────┤
  │      { content: "Hi" }                  │
  │◄──── assistant_chunk ───────────────────┤
  │      { content: " there" }              │
  │◄──── assistant_chunk ───────────────────┤
  │      { content: "!" }                   │
  │                                         │
  │◄──── message_complete ──────────────────┤
  │      { messageId, totalTokens }         │
  │                                         │
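
The same exchange from the client's perspective, as a sketch only; onUpdate is a hypothetical callback that re-renders the partial reply as chunks arrive.

```typescript
// Sketch: send a user message and assemble the streamed assistant reply.
function sendMessage(ws: WebSocket, content: string, onUpdate: (text: string) => void) {
  let reply = "";

  ws.onmessage = (event) => {
    const msg = JSON.parse(event.data);
    switch (msg.type) {
      case "thinking":
        break; // show a typing indicator while the server works
      case "assistant_chunk":
        reply += msg.content; // e.g. "Hi" + " there" + "!"
        onUpdate(reply);
        break;
      case "message_complete":
        console.log("Reply finished:", msg.messageId, msg.totalTokens);
        break;
    }
  };

  ws.send(JSON.stringify({ type: "message", content }));
}
```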

3. Heartbeat

Client                                    Server
  │                                         │
  ├──── ping ──────────────────────────────►│
  │                                         │
  │◄──── pong ──────────────────────────────┤
  │                                         │
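
A sketch of the client side of the heartbeat. The 30-second interval matches the recommendation under Performance Considerations; treating a missing pong as a dead connection is an assumption.

```typescript
// Sketch: ping every 30 seconds; if no pong arrived since the last ping, close the socket.
function startHeartbeat(ws: WebSocket, intervalMs = 30_000) {
  let awaitingPong = false;

  ws.addEventListener("message", (event) => {
    if (JSON.parse(event.data).type === "pong") {
      awaitingPong = false;
    }
  });

  return setInterval(() => {
    if (awaitingPong) {
      // No pong since the last ping: assume the connection is dead.
      ws.close();
      return;
    }
    awaitingPong = true;
    ws.send(JSON.stringify({ type: "ping" }));
  }, intervalMs);
}
```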

WebSocket Endpoints

| Endpoint | Purpose |
|---|---|
| /api/realtime/ws | Main chat WebSocket |
| /api/voice/ws | Voice-specific WebSocket (future) |

Query Parameters

| Parameter | Required | Description |
|---|---|---|
| conversationId | Yes | UUID of the conversation session |
| token | Yes | JWT access token |

Connection URL Example

```
// Development
ws://localhost:8000/api/realtime/ws?conversationId=uuid&token=jwt

// Production
wss://assist.asimo.io/api/realtime/ws?conversationId=uuid&token=jwt
```
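
When building this URL in client code, URLSearchParams keeps the JWT properly encoded. This helper is illustrative only, not part of the codebase.

```typescript
// Sketch: build the connection URL so the token and conversation ID are URL-encoded.
function buildWsUrl(baseUrl: string, conversationId: string, token: string): string {
  const params = new URLSearchParams({ conversationId, token });
  return `${baseUrl}/api/realtime/ws?${params.toString()}`;
}

// e.g. buildWsUrl("wss://assist.asimo.io", conversationId, accessToken)
```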

Message Types

Client → Server

| Type | Description |
|---|---|
| message | Send user message |
| ping | Heartbeat ping |
| stop | Cancel current response |
| voice_start | Begin voice input (future) |
| voice_chunk | Audio data chunk (future) |
| voice_end | End voice input (future) |

Server → Client

| Type | Description |
|---|---|
| connection_established | Connection successful |
| thinking | AI is processing |
| assistant_chunk | Streaming response chunk |
| message_complete | Response finished |
| error | Error occurred |
| pong | Heartbeat response |
| voice_transcript | Speech-to-text result (future) |
| voice_audio | TTS audio chunk (future) |
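
One way to keep client code in sync with this protocol is a discriminated union over the type field. The sketch below reflects the payload fields shown in this document; any fields beyond those are assumptions.

```typescript
// Sketch: typed protocol messages derived from the tables above.
type ClientMessage =
  | { type: "message"; content: string }
  | { type: "ping" }
  | { type: "stop" }
  | { type: "voice_start" }                 // future
  | { type: "voice_chunk"; audio: string }  // future (encoding is an assumption)
  | { type: "voice_end" };                  // future

type ServerMessage =
  | { type: "connection_established"; connectionId: string; serverTime: string }
  | { type: "thinking" }
  | { type: "assistant_chunk"; content: string }
  | { type: "message_complete"; messageId: string; totalTokens: number }
  | { type: "error"; code: string; message: string }
  | { type: "pong" }
  | { type: "voice_transcript"; text: string }   // future
  | { type: "voice_audio"; audio: string };      // future
```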

Streaming Response Flow

RAG + LLM Pipeline

User Message → WebSocket Handler
                    │
                    ▼
            ┌───────────────┐
            │  RAG Service  │ ← Retrieves relevant context
            │               │   from Qdrant vector store
            └───────┬───────┘
                    │
                    ▼
            ┌───────────────┐
            │  LLM Client   │ ← Calls OpenAI with streaming
            │               │
            └───────┬───────┘
                    │
          ┌─────────┼─────────┐
          │         │         │
          ▼         ▼         ▼
       chunk_1   chunk_2   chunk_n
          │         │         │
          └─────────┼─────────┘
                    │
                    ▼
            WebSocket Send
            (per chunk)

Streaming Implementation

```python
# Backend (FastAPI WebSocket handler)
import uuid


async def handle_message(websocket, message):
    # Send thinking indicator
    await websocket.send_json({"type": "thinking"})

    # Get RAG context
    context = await rag_service.retrieve(message.content)

    # Stream LLM response
    total_tokens = 0
    async for chunk in llm_client.stream_chat(message.content, context):
        if chunk.usage:
            # Usage is typically only populated on the final chunk.
            total_tokens = chunk.usage.total_tokens
        await websocket.send_json({
            "type": "assistant_chunk",
            "content": chunk.content,
        })

    # Send completion
    await websocket.send_json({
        "type": "message_complete",
        "messageId": str(uuid.uuid4()),
        "totalTokens": total_tokens,
    })
```

Voice Architecture (Future Enhancement)

Voice Input Flow

Microphone → Web Audio API → VAD (Voice Activity Detection)
                                      │
                                      ▼
                              Audio Chunks (PCM)
                                      │
                                      ▼
                              WebSocket Send
                                      │
                                      ▼
                              Server VAD + STT
                                      │
                                      ▼
                              Transcript Event
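
A speculative sketch of the planned capture path: microphone audio is down-converted to 16-bit PCM and streamed as voice_chunk messages. Client-side VAD is omitted, and the base64 encoding of the audio field is an assumption.

```typescript
// Speculative sketch: stream microphone audio as 16-bit PCM voice_chunk messages.
async function startVoiceInput(ws: WebSocket) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamSource(stream);
  // ScriptProcessorNode is deprecated; an AudioWorklet would replace it in production.
  const processor = ctx.createScriptProcessor(4096, 1, 1);

  ws.send(JSON.stringify({ type: "voice_start" }));

  processor.onaudioprocess = (event) => {
    const samples = event.inputBuffer.getChannelData(0); // Float32 in [-1, 1]
    const pcm = new Int16Array(samples.length);
    for (let i = 0; i < samples.length; i++) {
      pcm[i] = Math.max(-1, Math.min(1, samples[i])) * 0x7fff;
    }

    // Base64-encode the raw PCM bytes for transport over JSON.
    const bytes = new Uint8Array(pcm.buffer);
    let binary = "";
    for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
    ws.send(JSON.stringify({ type: "voice_chunk", audio: btoa(binary) }));
  };

  source.connect(processor);
  processor.connect(ctx.destination); // keeps the processor running in some browsers
}
```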

Voice Output Flow

LLM Response Text → TTS Service (OpenAI/ElevenLabs)
                           │
                           ▼
                    Audio Stream (MP3/PCM)
                           │
                           ▼
                    WebSocket Send (chunks)
                           │
                           ▼
                    Web Audio API Playback
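
A speculative playback sketch that buffers the full reply before decoding; a production client would stream playback instead. The base64 MP3 encoding of voice_audio is an assumption.

```typescript
// Sketch: collect voice_audio chunks (assumed base64-encoded MP3) and play the result.
async function playVoiceReply(base64Chunks: string[]) {
  // Decode base64 chunks into one contiguous byte buffer.
  const parts = base64Chunks.map((c) => Uint8Array.from(atob(c), (ch) => ch.charCodeAt(0)));
  const audio = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
  let offset = 0;
  for (const p of parts) {
    audio.set(p, offset);
    offset += p.length;
  }

  // Decode and play via the Web Audio API.
  const ctx = new AudioContext();
  const buffer = await ctx.decodeAudioData(audio.buffer);
  const sourceNode = ctx.createBufferSource();
  sourceNode.buffer = buffer;
  sourceNode.connect(ctx.destination);
  sourceNode.start();
}
```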

Error Handling

Reconnection Strategy

```typescript
class WebSocketClient {
  private reconnectAttempts = 0;
  private maxReconnectAttempts = 5;
  private baseDelay = 1000; // 1 second

  async reconnect() {
    const delay = Math.min(
      this.baseDelay * Math.pow(2, this.reconnectAttempts),
      30000, // max 30 seconds
    );

    await sleep(delay);
    this.reconnectAttempts++;

    if (this.reconnectAttempts < this.maxReconnectAttempts) {
      await this.connect();
    } else {
      this.emit("connection_failed");
    }
  }
}
```
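
One possible way to wire this strategy to socket lifecycle events inside WebSocketClient; the method name and the close-code check are assumptions, not part of the existing client.

```typescript
// Inside WebSocketClient: reset the backoff on success, retry on abnormal closure.
attachSocketHandlers(ws: WebSocket) {
  ws.addEventListener("open", () => {
    this.reconnectAttempts = 0; // a healthy connection restarts the backoff schedule
  });

  ws.addEventListener("close", (event) => {
    if (event.code !== 1000) {
      void this.reconnect(); // not a clean close: reconnect with backoff
    }
  });
}
```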

Error Types

| Error Code | Description | Client Action |
|---|---|---|
| auth_failed | Invalid/expired token | Refresh token and reconnect |
| session_not_found | Invalid conversation ID | Create new session |
| rate_limited | Too many requests | Back off and retry |
| server_error | Internal server error | Retry with backoff |
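
A sketch mapping these codes to the client actions in the table; refreshToken, createSession, and scheduleRetry are hypothetical callbacks supplied by the application.

```typescript
interface ErrorHandlers {
  refreshToken: () => Promise<void>;
  createSession: () => Promise<void>;
  scheduleRetry: () => void;
}

// Sketch: dispatch on the server's error code and take the documented client action.
async function handleServerError(
  error: { code: string; message: string },
  handlers: ErrorHandlers,
) {
  switch (error.code) {
    case "auth_failed":
      await handlers.refreshToken(); // then reconnect with the new JWT
      break;
    case "session_not_found":
      await handlers.createSession(); // invalid conversation ID: start fresh
      break;
    case "rate_limited":
    case "server_error":
      handlers.scheduleRetry(); // back off before retrying
      break;
    default:
      console.error("Unhandled WebSocket error:", error);
  }
}
```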

Performance Considerations

Client-side

  1. Buffer chunks - Don't update the DOM on every chunk (see the sketch after this list)
  2. Throttle renders - Use requestAnimationFrame
  3. Heartbeat interval - 30 seconds recommended
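
A sketch of points 1 and 2: buffer incoming chunks and flush them to the DOM at most once per animation frame. The renderer shape is an assumption for illustration.

```typescript
// Sketch: buffer streamed chunks and flush to the DOM at most once per frame.
function createChunkRenderer(output: HTMLElement) {
  let buffer = "";
  let frameRequested = false;

  return (chunk: string) => {
    buffer += chunk;
    if (frameRequested) return; // a flush is already scheduled for this frame
    frameRequested = true;

    requestAnimationFrame(() => {
      output.textContent += buffer; // single DOM write per frame
      buffer = "";
      frameRequested = false;
    });
  };
}
```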

Server-side

  1. Connection pooling - Reuse OpenAI connections
  2. Chunk size - Optimize for network vs. latency
  3. Memory management - Clean up closed connections

Security

  1. Authentication - JWT token required in query params
  2. Rate limiting - Per-user connection limits
  3. Message validation - Schema validation on all messages
  4. TLS - WSS required in production


Version History

| Version | Date | Changes |
|---|---|---|
| 1.0.0 | 2025-11-27 | Initial architecture document |