Real-time Proxy Specification
Overview
The VoiceAssist platform uses WebSocket connections for real-time bidirectional communication between the client and the OpenAI Realtime API. This document specifies the protocol, message formats, error handling, and implementation details.
WebSocket Endpoint
Connection URL
wss://assist.asimo.io/api/realtime
Query Parameters
| Parameter | Type | Required | Description |
|---|
conversationId | string | Yes | Unique conversation identifier |
token | string | Yes | JWT authentication token |
Example Connection
const conversationId = "conv-123";
const token = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...";
const ws = new WebSocket(`wss://assist.asimo.io/api/realtime?conversationId=${conversationId}&token=${token}`);
Conversation Scoping
Overview
Each WebSocket connection is scoped to a single conversation. The conversationId query parameter determines which conversation the WebSocket session belongs to. This scoping ensures proper message isolation and history management.
Conversation-WebSocket Relationship
One-to-One Mapping:
- Each WebSocket connection is associated with exactly one conversation
- Each conversation can have at most one active WebSocket connection per client
- Messages sent over the WebSocket are automatically associated with the conversation
Connection Lifecycle:
Conversation Created → Load History → Connect WebSocket → Send/Receive Messages
↓
Switch Conversation: Disconnect WebSocket → Connect to New Conversation
↓
Delete Conversation: Disconnect WebSocket → Conversation Removed
Switching Conversations
Process:
- Client disconnects existing WebSocket connection
- Client clears message state for old conversation
- Client fetches new conversation history via REST API:
GET /api/conversations/{newConversationId}/messages
- Client connects new WebSocket with new
conversationId:
const ws = new WebSocket(`wss://assist.asimo.io/api/realtime?conversationId=${newConversationId}&token=${token}`);
Critical Requirements:
- Old WebSocket must be disconnected before connecting to new conversation
- Message state must be cleared to prevent cross-contamination
- Connection to new conversation must use correct
conversationId parameter
Error Prevention:
// WRONG: Switching conversationId without disconnecting
ws.send(JSON.stringify({ conversationId: "new-id" })); // ❌ NOT SUPPORTED
// CORRECT: Disconnect old, connect new
oldWs.close();
const newWs = new WebSocket(`wss://...?conversationId=new-id&token=${token}`); // ✅
Message Persistence
REST API (Persistent):
- Messages are stored in the database associated with their conversation
- History retrieved via:
GET /api/conversations/{conversationId}/messages
- Persists across WebSocket disconnections
WebSocket (Real-time):
- New messages sent via WebSocket are saved to database
- Streaming responses are saved when complete (
message.done event)
- Messages persist even if WebSocket disconnects during streaming
Initial Load:
// 1. Load conversation history from REST API
const history = await apiClient.getMessages(conversationId, 1, 50);
// 2. Initialize messages with history
const [messages, setMessages] = useState(history.items);
// 3. Connect WebSocket for new real-time messages
const ws = useChatSession({ conversationId, initialMessages: history.items });
Authorization
Conversation Access Control:
- Server validates JWT token
- Server extracts user ID from token
- Server checks if user owns conversation with given
conversationId
- If unauthorized, connection is rejected with
AUTH_FAILED error
Security Flow:
Client connects with conversationId + token
↓
Server validates token signature
↓
Server extracts userId from token
↓
Server queries: SELECT * FROM conversations WHERE id = conversationId AND userId = userId
↓
If found: Allow connection
If not found: Reject with AUTH_FAILED
See detailed conversation management: CONVERSATIONS_AND_ROUTING.md
Connection Lifecycle
1. Connection Handshake
Client Server
│ │
├──── WebSocket CONNECT ───────>│
│ (with query params) │
│ │
│<──────── OPEN ────────────────┤
│ (readyState = 1) │
│ │
2. Heartbeat Mechanism
Purpose: Detect dead connections and keep connection alive
Interval: 30 seconds
Protocol:
Client Server
│ │
├────── ping ──────────────────>│ (every 30s)
│ │
│<─────── pong ──────────────────┤
│ │
Ping Message:
{
"type": "ping"
}
Pong Response:
{
"type": "pong"
}
3. Connection Close
Normal Closure:
Client Server
│ │
├──── WebSocket CLOSE ─────────>│
│ (code: 1000) │
│ │
│<──────── CLOSE ───────────────┤
│ │
Abnormal Closure (triggers reconnection):
- Code 1006: Connection dropped
- Server crashes or network failure
- Authentication failure
Message Protocol
Event Types
| Event Type | Direction | Description |
|---|
delta | Server → Client | Incremental text update during streaming |
chunk | Server → Client | Complete text chunk |
message.done | Server → Client | Final message with full content and metadata |
message.send | Client → Server | User sends a new message |
error | Server → Client | Error occurred during processing |
ping | Client → Server | Heartbeat from client |
pong | Server → Client | Heartbeat response |
Message Schemas
1. Client → Server: Send Message
Event Type: message.send
Purpose: User sends a new message to the assistant
Schema:
interface MessageSendEvent {
type: "message.send";
message: {
id: string; // Client-generated unique ID
role: "user";
content: string; // Message text
attachments?: string[]; // Optional attachment IDs
timestamp: number; // Unix timestamp in milliseconds
};
}
Example:
{
"type": "message.send",
"message": {
"id": "msg-1732212345678",
"role": "user",
"content": "What is the treatment for hypertension?",
"attachments": ["attachment-1732212340000-medical-report.pdf"],
"timestamp": 1732212345678
}
}
2. Server → Client: Delta Update
Event Type: delta
Purpose: Incremental text updates during streaming response
Schema:
interface DeltaEvent {
type: "delta";
eventId?: string; // Optional unique event ID
messageId: string; // Assistant message ID
delta: string; // Incremental text to append
metadata?: any; // Optional metadata
}
Example Sequence:
// Delta 1
{
"type": "delta",
"messageId": "msg-assistant-1",
"delta": "Treatment for "
}
// Delta 2
{
"type": "delta",
"messageId": "msg-assistant-1",
"delta": "hypertension typically "
}
// Delta 3
{
"type": "delta",
"messageId": "msg-assistant-1",
"delta": "includes lifestyle modifications and medication."
}
Client Behavior:
- Append
delta to existing message content
- If no message exists with
messageId, create new message
- Update UI in real-time as deltas arrive
- Show streaming indicator while receiving deltas
3. Server → Client: Chunk Update
Event Type: chunk
Purpose: Complete text chunks (alternative to delta)
Schema:
interface ChunkEvent {
type: "chunk";
eventId?: string;
messageId: string;
content: string; // Complete text chunk
metadata?: any;
}
Example:
{
"type": "chunk",
"messageId": "msg-assistant-1",
"content": "Treatment for hypertension includes lifestyle modifications and medication."
}
Client Behavior:
- Append
content to existing message
- Similar to delta but with larger chunks
4. Server → Client: Message Done
Event Type: message.done
Purpose: Signal end of streaming and provide final message
Schema:
interface MessageDoneEvent {
type: "message.done";
message: {
id: string;
role: "assistant";
content: string; // Final complete message text
citations?: Citation[]; // Optional citations/sources
attachments?: string[]; // Optional attachment IDs
timestamp: number; // Unix timestamp
metadata?: any;
};
}
Citation Schema:
interface Citation {
id: string; // Unique citation ID
source: "kb" | "url"; // Knowledge base or external URL
reference: string; // Document ID or URL
snippet?: string; // Relevant excerpt
page?: number; // Page number (for PDFs)
metadata?: Record<string, any>;
}
Example:
{
"type": "message.done",
"message": {
"id": "msg-assistant-1",
"role": "assistant",
"content": "Treatment for hypertension includes lifestyle modifications such as diet and exercise, and medications like ACE inhibitors or diuretics.",
"citations": [
{
"id": "cite-1",
"source": "kb",
"reference": "doc-clinical-guidelines-2024",
"snippet": "Lifestyle modifications are first-line treatment for hypertension.",
"page": 42,
"metadata": {
"author": "American Heart Association",
"year": "2024"
}
}
],
"timestamp": 1732212350000
}
}
Client Behavior:
- Replace streaming message with final message
- Display citations if present
- Hide streaming indicator
- Scroll to show complete message
- Call
onMessage callback if provided
5. Server → Client: Error
Event Type: error
Purpose: Communicate errors during processing
Schema:
interface ErrorEvent {
type: "error";
error: {
code: WebSocketErrorCode;
message: string;
details?: any;
};
}
type WebSocketErrorCode =
| "AUTH_FAILED" // Authentication failed
| "RATE_LIMITED" // Too many requests
| "QUOTA_EXCEEDED" // Usage quota exceeded
| "INVALID_EVENT" // Malformed event
| "BACKEND_ERROR" // Server error
| "CONNECTION_DROPPED"; // Connection lost
Examples:
Rate Limited:
{
"type": "error",
"error": {
"code": "RATE_LIMITED",
"message": "Too many requests. Please slow down.",
"details": {
"retryAfter": 30
}
}
}
Authentication Failed:
{
"type": "error",
"error": {
"code": "AUTH_FAILED",
"message": "Invalid or expired authentication token."
}
}
Backend Error:
{
"type": "error",
"error": {
"code": "BACKEND_ERROR",
"message": "An unexpected error occurred. Please try again."
}
}
Client Behavior:
- Display error toast/notification
- For
AUTH_FAILED, QUOTA_EXCEEDED: Close connection (fatal)
- For
RATE_LIMITED, BACKEND_ERROR: Show transient error
- Auto-dismiss transient errors after 5 seconds
- Call
onError callback if provided
Error Handling
Error Categories
1. Fatal Errors (Close Connection)
| Error Code | Description | Client Action |
|---|
AUTH_FAILED | Invalid or expired token | Close connection, redirect to login |
QUOTA_EXCEEDED | Usage limit reached | Close connection, show quota error |
2. Transient Errors (Show Toast)
| Error Code | Description | Client Action |
|---|
RATE_LIMITED | Too many requests | Show error toast for 5s |
BACKEND_ERROR | Server error | Show error toast for 5s |
INVALID_EVENT | Malformed message | Show error toast for 5s |
3. Connection Errors (Reconnect)
| Error Code | Description | Client Action |
|---|
CONNECTION_DROPPED | Lost connection | Attempt reconnection with backoff |
Reconnection Logic
Strategy: Exponential backoff with maximum attempts
Parameters:
- Initial delay: 1 second
- Backoff multiplier: 2x
- Maximum attempts: 5
- Maximum delay: 16 seconds
Delay Sequence:
- 1 second
- 2 seconds
- 4 seconds
- 8 seconds
- 16 seconds
Implementation:
const BASE_RECONNECT_DELAY = 1000; // 1 second
const MAX_RECONNECT_ATTEMPTS = 5;
let reconnectAttempts = 0;
function attemptReconnect() {
if (reconnectAttempts >= MAX_RECONNECT_ATTEMPTS) {
showError("CONNECTION_DROPPED", "Maximum reconnection attempts reached");
return;
}
const delay = BASE_RECONNECT_DELAY * Math.pow(2, reconnectAttempts);
reconnectAttempts++;
setTimeout(() => {
connect();
}, delay);
}
Connection States
State Machine
┌──────────────┐
│ disconnected │──┐
└──────────────┘ │
▲ │ connect()
│ │
│ ▼
│ ┌────────────┐
│ │ connecting │
│ └────────────┘
│ │
│ │ onopen
│ ▼
│ ┌───────────┐
└───┤ connected │
│ └───────────┘
│ │
onclose│ │ onerror / onclose
│ ▼
│ ┌──────────────┐
└───┤ reconnecting │
└──────────────┘
State Descriptions
| State | Description | UI Indicator |
|---|
connecting | Initial connection in progress | Yellow pulsing dot |
connected | WebSocket open and ready | Green solid dot |
reconnecting | Attempting to reconnect after disconnect | Orange pinging dot |
disconnected | Connection closed, not reconnecting | Red solid dot + Retry button |
Rate Limiting
Client-Side Throttling
Message Sending:
- Maximum: 10 messages per minute
- Burst: 3 messages per 5 seconds
Heartbeat:
- Fixed interval: 30 seconds
- No user-triggered pings
Server-Side Limits
Per User:
- 100 messages per hour
- 1000 messages per day
Per Conversation:
- 50 messages per 10 minutes
Response:
{
"type": "error",
"error": {
"code": "RATE_LIMITED",
"message": "Too many requests. Please slow down.",
"details": {
"limit": "100 messages per hour",
"retryAfter": 3600
}
}
}
Security
Authentication
Token Validation:
- Extract
token from WebSocket query parameter
- Verify JWT signature and expiration
- Extract user ID from token
- Authorize conversation access
Token Refresh:
- Client refreshes token before expiration
- Reconnects with new token automatically
Server-Side:
- Validate all event types
- Sanitize message content
- Check message length limits (max 10,000 characters)
- Validate attachment IDs
Client-Side:
- Sanitize user input before display
- Validate file types and sizes for attachments
- Use
react-markdown for safe markdown rendering
Connection Security
- HTTPS/WSS only
- TLS 1.2 or higher
- Certificate pinning (optional)
Message Batching
Delta Events:
- Server may batch small deltas to reduce event frequency
- Target: 10-20 deltas per second maximum
- Client handles rapid delta updates efficiently
Streaming Latency
Target Metrics:
- Time to first token: <200ms
- Average delta interval: 50-100ms
- Total response time: <2s for typical responses
Message Size Limits
| Type | Maximum Size |
|---|
| User message content | 10,000 characters |
| Delta content | 1,000 characters |
| Chunk content | 5,000 characters |
| Citation snippet | 500 characters |
| Attachments | 10 MB per file |
Testing
WebSocket Test Suite
Connection Tests:
- Successful connection with valid token
- Rejected connection with invalid token
- Reconnection after disconnect
- Heartbeat mechanism
Message Flow Tests:
- Send user message
- Receive delta events
- Receive message.done event
- Handle error events
Error Handling Tests:
- Fatal errors close connection
- Transient errors show toast
- Reconnection with exponential backoff
See test implementation: apps/web-app/src/hooks/__tests__/useChatSession.test.ts
Monitoring
Client-Side Metrics
Connection Quality:
- Connection success rate
- Reconnection frequency
- Average connection duration
- Heartbeat response time
Message Performance:
- Message send latency
- Time to first token
- Average streaming duration
- Delta reception rate
Error Tracking:
- Error frequency by type
- Fatal vs transient errors
- Reconnection success rate
Server-Side Metrics
WebSocket Connections:
- Active connections
- Connection duration
- Disconnection reasons
Message Processing:
- Messages processed per second
- Average response time
- Error rate
Client Implementation Reference
useChatSession Hook
Location: apps/web-app/src/hooks/useChatSession.ts
Key Features:
- Automatic connection management
- Message state synchronization
- Streaming support with delta/chunk handling
- Reconnection with exponential backoff
- Error handling and callbacks
Usage Example:
import { useChatSession } from '../hooks/useChatSession';
function ChatPage() {
const {
messages,
connectionStatus,
isTyping,
sendMessage,
reconnect,
} = useChatSession({
conversationId: 'conv-123',
onError: (code, message) => {
console.error(`WebSocket error: ${code} - ${message}`);
},
onConnectionChange: (status) => {
console.log(`Connection status: ${status}`);
},
});
return (
<div>
<ConnectionStatus status={connectionStatus} onReconnect={reconnect} />
<MessageList messages={messages} isTyping={isTyping} />
<MessageInput onSend={sendMessage} disabled={connectionStatus !== 'connected'} />
</div>
);
}