# WebSocket Protocol Specification

**Version:** 1.0
**Last Updated:** 2025-11-27
**Status:** Production

**Related Documentation:**

- [Realtime Architecture](REALTIME_ARCHITECTURE.md) - System architecture
- [Voice Mode Pipeline](VOICE_MODE_PIPELINE.md) - Voice implementation
- [Thinker-Talker Pipeline](THINKER_TALKER_PIPELINE.md) - T/T voice architecture
- [Voice Pipeline WebSocket API](api-reference/voice-pipeline-ws.md) - T/T WebSocket protocol
- [Implementation Status](overview/IMPLEMENTATION_STATUS.md) - Component status

---

## WebSocket Endpoints

VoiceAssist provides multiple WebSocket endpoints:

| Endpoint                      | Purpose                        | Protocol Doc                                               |
| ----------------------------- | ------------------------------ | ---------------------------------------------------------- |
| `/api/realtime/ws`            | Chat streaming                 | This document                                              |
| `/api/voice/pipeline-ws`      | Thinker-Talker voice (Primary) | [voice-pipeline-ws.md](api-reference/voice-pipeline-ws.md) |
| `/api/voice/realtime-session` | OpenAI Realtime (Legacy)       | [VOICE_MODE_PIPELINE.md](VOICE_MODE_PIPELINE.md)           |

---

## Chat Streaming Protocol

### Overview

The VoiceAssist WebSocket protocol enables real-time bidirectional communication between the web client and backend server for chat streaming. This document specifies the message format, event types, and connection lifecycle.

### Connection Endpoint

**Development:**

```
ws://localhost:8000/api/realtime/ws
```

**Production:**

```
wss://assist.asimo.io/api/realtime/ws
```

### Query Parameters

| Parameter        | Type   | Required | Description                         |
| ---------------- | ------ | -------- | ----------------------------------- |
| `conversationId` | string | Yes      | UUID of the conversation            |
| `token`          | string | Yes      | JWT access token for authentication |

### Example Connection

```typescript
const url = new URL("ws://localhost:8000/api/realtime/ws");
url.searchParams.append("conversationId", "conversation-uuid");
url.searchParams.append("token", "jwt-access-token");

const ws = new WebSocket(url.toString());
```
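
The endpoint and its two required query parameters can be wrapped in a small helper. This is a minimal sketch: `buildRealtimeUrl`, `Environment`, and `ENDPOINTS` are hypothetical names, and rejecting empty values is an assumption rather than documented server behavior.

```typescript
// Hypothetical helper; the endpoint URLs are the ones listed under "Connection Endpoint".
type Environment = "development" | "production";

const ENDPOINTS: Record<Environment, string> = {
  development: "ws://localhost:8000/api/realtime/ws",
  production: "wss://assist.asimo.io/api/realtime/ws",
};

function buildRealtimeUrl(env: Environment, conversationId: string, token: string): string {
  // Both parameters are marked "Required" in the table above.
  if (!conversationId || !token) {
    throw new Error("conversationId and token are both required");
  }
  const url = new URL(ENDPOINTS[env]);
  url.searchParams.set("conversationId", conversationId);
  url.searchParams.set("token", token);
  return url.toString();
}
```

The returned string can be passed directly to `new WebSocket(...)`.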

---

## Message Protocol

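All messages are JSON-encoded. Most field names use **camelCase** (for example `messageId`), but a few fields shown in the examples below are snake_case: `session_id`, `client_id`, and `protocol_version`.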

### Client → Server Messages

#### 1. Send Message

Send a chat message to the assistant.

```json
{
  "type": "message",
  "content": "What are the treatment protocols for hypertension?",
  "session_id": "conversation-uuid",
  "attachments": []
}
```

**Fields:**

- `type`: Always `"message"`
- `content`: User's message text (string, required)
- `session_id`: Conversation/session identifier (string, optional)
- `attachments`: Array of attachment IDs (string[], optional)

#### 2. Ping (Heartbeat)

Keep the connection alive.

```json
{
  "type": "ping"
}
```

**Response:** The server responds with a `pong` message.
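
The two client messages above can be modeled as a discriminated union, which lets the compiler catch misspelled fields before they reach the socket. This is a sketch only: the type names and `encodeClientMessage` are hypothetical, and rejecting blank `content` is an assumption based on the field being marked required.

```typescript
// Shapes taken from the "Send Message" and "Ping" examples above.
interface SendMessage {
  type: "message";
  content: string;
  session_id?: string;
  attachments?: string[];
}

interface Ping {
  type: "ping";
}

type ClientMessage = SendMessage | Ping;

// Hypothetical helper: serialize a client message for ws.send().
function encodeClientMessage(msg: ClientMessage): string {
  if (msg.type === "message" && msg.content.trim() === "") {
    throw new Error("content is required for a message");
  }
  return JSON.stringify(msg);
}

const payload = encodeClientMessage({
  type: "message",
  content: "What are the treatment protocols for hypertension?",
  session_id: "conversation-uuid",
});
```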

---

### Server → Client Messages

#### 1. Connection Established

Sent immediately after connection is established.

```json
{
  "type": "connected",
  "client_id": "uuid-v4",
  "timestamp": "2025-11-22T10:00:00.000Z",
  "protocol_version": "1.0",
  "capabilities": ["text_streaming"]
}
```

**Fields:**

- `type`: Always `"connected"`
- `client_id`: Unique identifier for this client connection
- `timestamp`: ISO 8601 timestamp
- `protocol_version`: Protocol version string
- `capabilities`: Array of supported features
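
A client may want to inspect the `connected` event before sending anything. The sketch below assumes the client understands only protocol major version 1 and needs the `text_streaming` capability; `isCompatible` is a hypothetical name, and this document does not define a version negotiation policy.

```typescript
// Shape of the "connected" event documented above.
interface ConnectedEvent {
  type: "connected";
  client_id: string;
  timestamp: string;
  protocol_version: string;
  capabilities: string[];
}

// Assumption: same major version plus text streaming means compatible.
function isCompatible(event: ConnectedEvent): boolean {
  const major = event.protocol_version.split(".")[0];
  return major === "1" && event.capabilities.indexOf("text_streaming") !== -1;
}
```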

#### 2. Message Chunk (Streaming)

Sent while assistant response is being generated.

```json
{
  "type": "chunk",
  "messageId": "msg-uuid",
  "content": "Treatment protocols for hypertension include..."
}
```

**Fields:**

- `type`: Always `"chunk"`
- `messageId`: UUID of the message being streamed
- `content`: Partial response text

**Notes:**

- Multiple chunks are sent for a single message
- Chunks should be appended in order
- There is no `chunkIndex` field; WebSocket delivers messages in order on a single connection

#### 3. Message Complete

Sent when assistant response is fully generated.

```json
{
  "type": "message.done",
  "messageId": "msg-uuid",
  "message": {
    "id": "msg-uuid",
    "role": "assistant",
    "content": "Complete response text...",
    "citations": [
      {
        "id": "cite-1",
        "source": "kb",
        "reference": "Hypertension Guidelines 2024",
        "snippet": "First-line therapy includes..."
      }
    ],
    "timestamp": 1700000000000
  },
  "timestamp": "2025-11-22T10:00:05.000Z"
}
```

**Fields:**

- `type`: Always `"message.done"`
- `messageId`: UUID of the completed message
- `message`: Complete message object
  - `id`: Message UUID
  - `role`: Always `"assistant"` for bot responses
  - `content`: Complete response text
  - `citations`: Array of citation objects (optional)
  - `timestamp`: Unix timestamp in milliseconds
- `timestamp`: ISO 8601 timestamp of completion

**Citation Object:**

```typescript
{
  id: string;           // Unique identifier
  source: 'kb' | 'url'; // Source type
  reference: string;    // Document title or URL
  snippet?: string;     // Relevant excerpt
  page?: number;        // Page number (for documents)
}
```

#### 4. Error

Sent when an error occurs.

```json
{
  "type": "error",
  "messageId": "msg-uuid",
  "timestamp": "2025-11-22T10:00:00.000Z",
  "error": {
    "code": "BACKEND_ERROR",
    "message": "Failed to process query: Network timeout"
  }
}
```

**Fields:**

- `type`: Always `"error"`
- `messageId`: UUID of the message that caused the error (optional)
- `timestamp`: ISO 8601 timestamp
- `error`: Error object
  - `code`: Error code (see Error Codes below)
  - `message`: Human-readable error description

**Error Codes:**

| Code                 | Description           | Action                               |
| -------------------- | --------------------- | ------------------------------------ |
| `AUTH_FAILED`        | Authentication failed | Disconnect, redirect to login        |
| `RATE_LIMITED`       | Too many requests     | Show notification, wait before retry |
| `QUOTA_EXCEEDED`     | Usage quota exceeded  | Disconnect, show upgrade prompt      |
| `INVALID_EVENT`      | Malformed message     | Log error, continue                  |
| `BACKEND_ERROR`      | Server error          | Show notification, allow retry       |
| `CONNECTION_DROPPED` | Connection lost       | Auto-reconnect                       |

#### 5. Pong (Heartbeat Response)

Response to client ping.

```json
{
  "type": "pong",
  "timestamp": "2025-11-22T10:00:00.000Z"
}
```

**Fields:**

- `type`: Always `"pong"`
- `timestamp`: ISO 8601 timestamp
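
The five server events can be expressed as one discriminated union keyed on `type`. The following is a sketch under stated assumptions: `ServerEvent`, `AssistantMessage`, and `parseServerEvent` are hypothetical names, and the parser checks only the `type` discriminator rather than validating every field.

```typescript
interface Citation {
  id: string;
  source: "kb" | "url";
  reference: string;
  snippet?: string;
  page?: number;
}

interface AssistantMessage {
  id: string;
  role: "assistant";
  content: string;
  citations?: Citation[];
  timestamp: number; // Unix milliseconds
}

type ServerEvent =
  | { type: "connected"; client_id: string; timestamp: string; protocol_version: string; capabilities: string[] }
  | { type: "chunk"; messageId: string; content: string }
  | { type: "message.done"; messageId: string; message: AssistantMessage; timestamp: string }
  | { type: "error"; messageId?: string; timestamp: string; error: { code: string; message: string } }
  | { type: "pong"; timestamp: string };

const KNOWN_TYPES = ["connected", "chunk", "message.done", "error", "pong"];

// Hypothetical parser: returns null for non-JSON input or unknown event types.
function parseServerEvent(raw: string): ServerEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const type = (data as { type?: unknown }).type;
  if (typeof type !== "string" || KNOWN_TYPES.indexOf(type) === -1) return null;
  return data as ServerEvent;
}
```

Returning `null` instead of throwing lets the caller log a malformed frame and keep the connection open.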

---

## Connection Lifecycle

### 1. Connection Establishment

```mermaid
sequenceDiagram
    participant Client
    participant Server

    Client->>Server: WebSocket handshake
    Server->>Client: Connection accepted
    Server->>Client: { type: "connected", ... }
    Client->>Server: { type: "ping" }
    Server->>Client: { type: "pong", ... }
```

### 2. Message Exchange

```mermaid
sequenceDiagram
    participant Client
    participant Server

    Client->>Server: { type: "message", content: "..." }
    Server->>Client: { type: "chunk", content: "..." }
    Server->>Client: { type: "chunk", content: "..." }
    Server->>Client: { type: "chunk", content: "..." }
    Server->>Client: { type: "message.done", message: {...} }
```

### 3. Error Handling

```mermaid
sequenceDiagram
    participant Client
    participant Server

    Client->>Server: { type: "message", ... }
    Server->>Client: { type: "error", error: {...} }
    Client->>Client: Display error to user
    Client->>Server: { type: "message", ... } (retry if transient)
```

### 4. Heartbeat

```mermaid
sequenceDiagram
    participant Client
    participant Server

    loop Every 30 seconds
        Client->>Server: { type: "ping" }
        Server->>Client: { type: "pong", ... }
    end
```
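
The heartbeat loop can be tracked with a small state object. This sketch uses the 30-second ping interval shown above and the 5-second pong timeout from the Client Implementation Guide; `HeartbeatTracker` is a hypothetical name, and the caller passes in the current time so the logic does not depend on real timers.

```typescript
const PING_INTERVAL_MS = 30000;
const PONG_TIMEOUT_MS = 5000;

// Hypothetical tracker: the caller decides when to poll it and what to do on timeout.
class HeartbeatTracker {
  private lastPingAt: number | null = null;
  private awaitingPong = false;

  // True when no ping is outstanding and the last one is at least 30 s old (or none was sent).
  shouldPing(now: number): boolean {
    if (this.awaitingPong) return false;
    return this.lastPingAt === null || now - this.lastPingAt >= PING_INTERVAL_MS;
  }

  recordPing(now: number): void {
    this.lastPingAt = now;
    this.awaitingPong = true;
  }

  recordPong(): void {
    this.awaitingPong = false;
  }

  // True when a ping has gone unanswered for more than 5 s.
  isTimedOut(now: number): boolean {
    return this.awaitingPong && this.lastPingAt !== null && now - this.lastPingAt > PONG_TIMEOUT_MS;
  }
}
```

A client would call `recordPing(Date.now())` when it sends `{ type: "ping" }`, call `recordPong()` on each `pong`, and trigger reconnection when `isTimedOut(Date.now())` returns true.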

### 5. Disconnection

```mermaid
sequenceDiagram
    participant Client
    participant Server

    Note over Client: User closes tab
    Client->>Server: WebSocket close
    Server->>Server: Cleanup connection
```

---

## Client Implementation Guide

### Connection Management

**Best Practices:**

1. **Automatic Reconnection**
   - Reconnect on unexpected disconnections
   - Use exponential backoff (1s, 2s, 4s, 8s, 16s)
   - Max 5 reconnection attempts
   - Reset counter on successful connection

2. **Heartbeat**
   - Send ping every 30 seconds
   - Timeout if no pong received in 5 seconds
   - Trigger reconnection on timeout

3. **Connection States**
   ```typescript
   type ConnectionStatus = "connecting" | "connected" | "reconnecting" | "disconnected";
   ```
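
The backoff schedule in item 1 can be computed rather than hard-coded. A minimal sketch, assuming attempts are counted from zero; `reconnectDelayMs` is a hypothetical name.

```typescript
const MAX_RECONNECT_ATTEMPTS = 5;
const BASE_DELAY_MS = 1000;

// Returns the delay before the given attempt (0-based), or null once the
// five-attempt budget is exhausted.
function reconnectDelayMs(attempt: number): number | null {
  if (attempt < 0 || attempt >= MAX_RECONNECT_ATTEMPTS) return null;
  return BASE_DELAY_MS * Math.pow(2, attempt);
}
```

Attempts 0 through 4 yield 1000, 2000, 4000, 8000, and 16000 ms. The caller resets its attempt counter to 0 after a successful connection.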

### Message Handling

**Streaming Response Pattern:**

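```typescript
// `WebSocketEvent`, `updateUI`, `addMessage`, and `handleError` are assumed to be
// provided by the application. `Message` is an illustrative type for this example.
interface Message {
  id: string;
  role: "assistant";
  content: string;
  timestamp: number;
}

// The message currently being streamed, or null when idle.
let streamingMessage: Message | null = null;

function handleMessage(data: WebSocketEvent) {
  switch (data.type) {
    case "chunk":
      if (!streamingMessage || streamingMessage.id !== data.messageId) {
        // First chunk of a message: start a new in-progress message.
        streamingMessage = {
          id: data.messageId,
          role: "assistant",
          content: data.content,
          timestamp: Date.now(),
        };
      } else {
        // Later chunks: append in arrival order.
        streamingMessage.content += data.content;
      }
      updateUI(streamingMessage);
      break;

    case "message.done":
      // The final message replaces the accumulated chunks.
      streamingMessage = null;
      addMessage(data.message);
      break;

    case "error":
      handleError(data.error);
      streamingMessage = null;
      break;
  }
}
```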

### Error Handling

**Fatal Errors** (disconnect immediately):

- `AUTH_FAILED`
- `QUOTA_EXCEEDED`

**Transient Errors** (show notification, allow retry):

- `RATE_LIMITED`
- `BACKEND_ERROR`

**Recoverable Errors** (auto-reconnect):

- `CONNECTION_DROPPED`

**Non-fatal Errors** (log and continue, per the Error Codes table):

- `INVALID_EVENT`
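
These categories can be encoded as a lookup from error code to client action. The sketch below follows the Error Codes table, including `INVALID_EVENT` ("Log error, continue"); `actionFor` and `ErrorAction` are hypothetical names, and treating unknown codes like `BACKEND_ERROR` is an assumption.

```typescript
type ErrorAction = "disconnect" | "notify" | "reconnect" | "log";

// Mapping derived from the Error Codes table in this document.
const ERROR_ACTIONS: Record<string, ErrorAction> = {
  AUTH_FAILED: "disconnect",
  QUOTA_EXCEEDED: "disconnect",
  RATE_LIMITED: "notify",
  BACKEND_ERROR: "notify",
  CONNECTION_DROPPED: "reconnect",
  INVALID_EVENT: "log",
};

// Assumption: codes not listed in the table fall back to "notify".
function actionFor(code: string): ErrorAction {
  return Object.prototype.hasOwnProperty.call(ERROR_ACTIONS, code) ? ERROR_ACTIONS[code] : "notify";
}
```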

### Example Implementation

See reference implementation: `/apps/web-app/src/hooks/useChatSession.ts`

---

## Server Implementation Guide

### Message Validation

1. Validate all incoming messages against schema
2. Check authentication token on connection
3. Verify conversation/session access
4. Sanitize user input before processing

### Response Streaming

1. Send chunks as they are generated
2. Keep chunks reasonably sized (50-100 characters)
3. Send `message.done` with complete message
4. Include citations in final message only
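
The event sequence described above can be illustrated by splitting a finished response into chunks. This is only a sketch of the wire format: a real server would forward text as the model generates it, the reference implementation is in Python, and `buildStreamEvents` and the 80-character chunk size are assumptions chosen to fall within the suggested 50-100 range.

```typescript
const CHUNK_SIZE = 80; // assumption: any value in the suggested 50-100 range would do

interface Citation {
  id: string;
  source: "kb" | "url";
  reference: string;
  snippet?: string;
  page?: number;
}

type StreamEvent =
  | { type: "chunk"; messageId: string; content: string }
  | {
      type: "message.done";
      messageId: string;
      message: { id: string; role: "assistant"; content: string; citations?: Citation[]; timestamp: number };
      timestamp: string;
    };

// Hypothetical helper: chunk events first, then one message.done carrying the
// full text and the citations. Slices by UTF-16 code units.
function buildStreamEvents(messageId: string, text: string, citations: Citation[], now: Date): StreamEvent[] {
  const events: StreamEvent[] = [];
  for (let i = 0; i < text.length; i += CHUNK_SIZE) {
    events.push({ type: "chunk", messageId, content: text.slice(i, i + CHUNK_SIZE) });
  }
  events.push({
    type: "message.done",
    messageId,
    message: { id: messageId, role: "assistant", content: text, citations, timestamp: now.getTime() },
    timestamp: now.toISOString(),
  });
  return events;
}
```

Citations appear only on the final `message.done` event, matching item 4.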

### Error Reporting

1. Always include error code
2. Provide actionable error messages
3. Log full error details server-side
4. Don't expose sensitive information

### Example Implementation

See reference implementation: `/services/api-gateway/app/api/realtime.py`

---

## Security Considerations

### Authentication

- JWT token required in query parameter
- Validate token on connection
- Reject unauthorized connections immediately
- Token should be short-lived (15 minutes)

### Input Validation

- Validate all incoming JSON
- Sanitize user content
- Limit message length (10,000 characters)
- Rate limit messages (10/minute per user)
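
The length and rate limits above could be enforced along these lines. This is a sketch only: the document states the limits but not the algorithm, so the sliding one-minute window, the in-memory store, and the names `MessageRateLimiter` and `isWithinLengthLimit` are all assumptions.

```typescript
const MAX_MESSAGE_LENGTH = 10000;
const MAX_MESSAGES_PER_MINUTE = 10;
const WINDOW_MS = 60000;

function isWithinLengthLimit(content: string): boolean {
  return content.length <= MAX_MESSAGE_LENGTH;
}

// Hypothetical in-memory limiter keyed by user ID. The caller passes the
// current time in milliseconds.
class MessageRateLimiter {
  private history: Record<string, number[]> = {};

  // Returns true and records the message if the user is under the limit.
  allow(userId: string, now: number): boolean {
    const previous = this.history[userId] || [];
    // Keep only timestamps inside the last minute.
    const recent = previous.filter((t) => now - t < WINDOW_MS);
    const allowed = recent.length < MAX_MESSAGES_PER_MINUTE;
    if (allowed) recent.push(now);
    this.history[userId] = recent;
    return allowed;
  }
}
```

A rejected message would map to the `RATE_LIMITED` error code.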

### Data Protection

- Use TLS/SSL in production (`wss://`)
- Don't log sensitive user content
- Implement PHI redaction in logs
- Follow HIPAA compliance guidelines

---

## Testing

### Manual Testing

**Connection Test:**

```javascript
const ws = new WebSocket("ws://localhost:8000/api/realtime/ws?conversationId=test&token=test-token");

ws.onmessage = (event) => console.log("Received:", JSON.parse(event.data));
ws.onerror = (event) => console.error("WebSocket error:", event);

// send() throws if called before the connection is open, so send from onopen.
ws.onopen = () => {
  console.log("Connected");
  ws.send(
    JSON.stringify({
      type: "message",
      content: "Test message",
      session_id: "test-session",
    }),
  );
};
```

**Ping Test:**

```javascript
ws.send(JSON.stringify({ type: "ping" }));
// Expect: { type: 'pong', timestamp: '...' }
```

### Automated Testing

See test file: `/apps/web-app/src/hooks/__tests__/useChatSession.test.ts`

---

## Changelog

### Version 1.0 (2025-11-22)

**Breaking Changes:**

- Changed `message_chunk` → `chunk`
- Changed `message_complete` → `message.done`
- Changed `message_id` → `messageId` (camelCase)
- Changed citation field names to match frontend types

**Additions:**

- Added `connected` event on connection
- Added `session_id` support in client messages
- Added heartbeat protocol (`ping`/`pong`)

**Fixes:**

- Fixed field name inconsistencies
- Fixed missing `messageId` in error events
- Fixed timestamp format (added milliseconds to message object)

---

## Future Enhancements

### Phase 2: Voice Streaming

**New Event Types:**

- `voice.start` - Start voice streaming
- `voice.chunk` - Audio chunk (base64-encoded)
- `voice.end` - End voice streaming

### Phase 3: Voice Activity Detection

**New Event Types:**

- `vad.speech_start` - User started speaking
- `vad.speech_end` - User stopped speaking

### Phase 4: Turn-Taking

**New Event Types:**

- `interrupt` - User interrupted assistant
- `resume` - Resume previous response

---

## Support

**Questions or Issues:**

- GitHub Issues: https://github.com/mohammednazmy/VoiceAssist/issues
- Documentation: `/docs/`
- API Reference: `/docs/API_REFERENCE.md`

---

**Document Version:** 1.0
**Last Updated:** 2025-11-22
**Maintainer:** VoiceAssist Development Team