2:I[7012,["4765","static/chunks/4765-f5afdf8061f456f3.js","9856","static/chunks/9856-3b185291364d9bef.js","6687","static/chunks/app/docs/%5B...slug%5D/page-e07536548216bee4.js"],"MarkdownRenderer"]
4:I[9856,["4765","static/chunks/4765-f5afdf8061f456f3.js","9856","static/chunks/9856-3b185291364d9bef.js","6687","static/chunks/app/docs/%5B...slug%5D/page-e07536548216bee4.js"],""]
5:I[4126,[],""]
7:I[9630,[],""]
8:I[4278,["9856","static/chunks/9856-3b185291364d9bef.js","8172","static/chunks/8172-b3a2d6fe4ae10d40.js","3185","static/chunks/app/layout-2814fa5d15b84fe4.js"],"HeadingProvider"]
9:I[1476,["9856","static/chunks/9856-3b185291364d9bef.js","8172","static/chunks/8172-b3a2d6fe4ae10d40.js","3185","static/chunks/app/layout-2814fa5d15b84fe4.js"],"Header"]
a:I[3167,["9856","static/chunks/9856-3b185291364d9bef.js","8172","static/chunks/8172-b3a2d6fe4ae10d40.js","3185","static/chunks/app/layout-2814fa5d15b84fe4.js"],"Sidebar"]
b:I[7409,["9856","static/chunks/9856-3b185291364d9bef.js","8172","static/chunks/8172-b3a2d6fe4ae10d40.js","3185","static/chunks/app/layout-2814fa5d15b84fe4.js"],"PageFrame"]
3:T32f6,
# Thinker Service

> **Location:** `services/api-gateway/app/services/thinker_service.py`
> **Status:** Production Ready
> **Last Updated:** 2025-12-01

## Overview

The ThinkerService is the reasoning engine of the Thinker-Talker voice pipeline. It manages conversation context, orchestrates LLM interactions, and handles tool calling with result injection.

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                      ThinkerService                              │
│                                                                  │
│  ┌──────────────────┐    ┌──────────────────┐                   │
│  │ ConversationContext │◄──│   ThinkerSession  │                 │
│  │ (shared memory)     │    │   (per-request)   │                │
│  └──────────────────┘    └──────────────────┘                   │
│           │                        │                             │
│           │                        ▼                             │
│           │              ┌──────────────────┐                   │
│           │              │    LLMClient     │                   │
│           │              │   (GPT-4o)       │                   │
│           │              └──────────────────┘                   │
│           │                        │                             │
│           │                        ▼                             │
│           │              ┌──────────────────┐                   │
│           │              │   ToolRegistry   │                   │
│           │              │ (calendar, search,│                  │
│           │              │  medical, KB)     │                   │
│           └──────────────┴──────────────────┘                   │
└─────────────────────────────────────────────────────────────────┘
```

## Classes

### ThinkerService

Main service class (singleton pattern).

```python
from app.services.thinker_service import thinker_service

# Create a session for a conversation
session = thinker_service.create_session(
    conversation_id="conv-123",
    on_token=handle_token,         # Called for each LLM token
    on_tool_call=handle_tool_call, # Called when tool is invoked
    on_tool_result=handle_result,  # Called when tool returns
    user_id="user-456",            # Required for authenticated tools
)

# Process user input
response = await session.think("What's on my calendar today?")
```

#### Methods

| Method             | Description               | Parameters                                                                                  | Returns          |
| ------------------ | ------------------------- | ------------------------------------------------------------------------------------------- | ---------------- |
| `create_session()` | Create a thinking session | `conversation_id`, `on_token`, `on_tool_call`, `on_tool_result`, `system_prompt`, `user_id` | `ThinkerSession` |
| `register_tool()`  | Register a new tool       | `name`, `description`, `parameters`, `handler`                                              | `None`           |

### ThinkerSession

Session class for processing individual requests.

```python
class ThinkerSession:
    """
    A single thinking session with streaming support.

    Manages the flow:
    1. Receive user input
    2. Add to conversation context
    3. Call LLM with streaming
    4. Handle tool calls if needed
    5. Stream response tokens to callback
    """
```

#### Methods

| Method          | Description              | Parameters                            | Returns               |
| --------------- | ------------------------ | ------------------------------------- | --------------------- |
| `think()`       | Process user input       | `user_input: str`, `source_mode: str` | `ThinkerResponse`     |
| `cancel()`      | Cancel processing        | None                                  | `None`                |
| `get_context()` | Get conversation context | None                                  | `ConversationContext` |
| `get_metrics()` | Get session metrics      | None                                  | `ThinkerMetrics`      |

#### Properties

| Property | Type            | Description              |
| -------- | --------------- | ------------------------ |
| `state`  | `ThinkingState` | Current processing state |

### ConversationContext

Manages conversation history with smart trimming.

```python
class ConversationContext:
    MAX_HISTORY_MESSAGES = 20    # Maximum messages to retain
    MAX_CONTEXT_TOKENS = 8000    # Token budget for context

    def __init__(self, conversation_id: str, system_prompt: str = None):
        self.conversation_id = conversation_id
        self.messages: List[ConversationMessage] = []
        self.system_prompt = system_prompt or self._default_system_prompt()
```

#### Smart Trimming

When message count exceeds `MAX_HISTORY_MESSAGES`, the context performs smart trimming:

```python
def _smart_trim(self) -> None:
    """
    Trim messages while preserving tool call chains.

    OpenAI requires: assistant (with tool_calls) -> tool (with tool_call_id)
    We can't break this chain or the API will reject the request.
    """
```

**Rules:**

- Never trim an assistant message if the next message is a tool result
- Never trim a tool message (it needs its preceding assistant message)
- Find the first safe trim point that doesn't break chains

#### Methods

| Method                   | Description                    |
| ------------------------ | ------------------------------ |
| `add_message()`          | Add a message to history       |
| `get_messages_for_llm()` | Format messages for OpenAI API |
| `clear()`                | Clear all history              |

### ToolRegistry

Registry for available tools.

```python
class ToolRegistry:
    def register(
        self,
        name: str,
        description: str,
        parameters: Dict,
        handler: Callable[[Dict], Awaitable[Any]],
    ) -> None:
        """Register a tool with its schema and handler."""

    def get_tools_schema(self) -> List[Dict]:
        """Get all tool schemas for LLM API."""

    async def execute(self, tool_name: str, arguments: Dict, user_id: str) -> Any:
        """Execute a tool and return its result."""
```

## Data Classes

### ThinkingState

```python
class ThinkingState(str, Enum):
    IDLE = "idle"           # Waiting for input
    PROCESSING = "processing"  # Building request
    TOOL_CALLING = "tool_calling"  # Executing tool
    GENERATING = "generating"  # Streaming response
    COMPLETE = "complete"    # Finished successfully
    CANCELLED = "cancelled"  # User interrupted
    ERROR = "error"          # Error occurred
```

### ConversationMessage

```python
@dataclass
class ConversationMessage:
    role: str              # "user", "assistant", "system", "tool"
    content: str
    message_id: str        # Auto-generated UUID
    timestamp: float       # Unix timestamp
    source_mode: str       # "chat" or "voice"
    tool_call_id: str      # For tool results
    tool_calls: List[Dict] # For assistant messages with tool calls
    citations: List[Dict]  # Source citations
```

### ThinkerResponse

```python
@dataclass
class ThinkerResponse:
    text: str                    # Complete response text
    message_id: str              # Unique ID
    citations: List[Dict]        # Source citations
    tool_calls_made: List[str]   # Names of tools called
    latency_ms: int              # Total processing time
    tokens_used: int             # Token count
    state: ThinkingState         # Final state
```

### ThinkerMetrics

```python
@dataclass
class ThinkerMetrics:
    total_tokens: int = 0
    tool_calls_count: int = 0
    first_token_latency_ms: int = 0
    total_latency_ms: int = 0
    cancelled: bool = False
```

## Available Tools

The ThinkerService automatically registers tools from the unified ToolService:

| Tool                    | Description               | Requires Auth |
| ----------------------- | ------------------------- | ------------- |
| `calendar_create_event` | Create calendar events    | Yes           |
| `calendar_list_events`  | List upcoming events      | Yes           |
| `calendar_update_event` | Modify existing events    | Yes           |
| `calendar_delete_event` | Remove events             | Yes           |
| `web_search`            | Search the web            | No            |
| `pubmed_search`         | Search medical literature | No            |
| `medical_calculator`    | Calculate medical scores  | No            |
| `kb_search`             | Search knowledge base     | No            |

## System Prompt

The default system prompt includes:

1. **Current Time Context**: Dynamic date/time with relative calculations
2. **Conversation Memory**: Instructions to use conversation history
3. **Tool Usage Guidelines**: When and how to use each tool
4. **Response Style**: Concise, natural, voice-optimized

```python
def _default_system_prompt(self) -> str:
    tz = pytz.timezone("America/New_York")
    now = datetime.now(tz)

    return f"""You are VoiceAssist, a helpful AI voice assistant.

CURRENT TIME CONTEXT:
- Current date: {now.strftime("%A, %B %d, %Y")}
- Current time: {now.strftime("%I:%M %p %Z")}

CONVERSATION MEMORY:
You have access to the full conversation history...

AVAILABLE TOOLS:
- calendar_create_event: Create events...
- web_search: Search the web...
...

KEY BEHAVIORS:
- Keep responses concise and natural for voice
- Use short sentences (max 15-20 words)
- Avoid abbreviations - say "blood pressure" not "BP"
"""
```

## Usage Examples

### Basic Query Processing

```python
from app.services.thinker_service import thinker_service

async def handle_voice_query(conversation_id: str, transcript: str, user_id: str):
    # Token streaming callback
    async def on_token(token: str):
        await send_to_tts(token)

    # Create session with callbacks
    session = thinker_service.create_session(
        conversation_id=conversation_id,
        on_token=on_token,
        user_id=user_id,
    )

    # Process the transcript
    response = await session.think(transcript, source_mode="voice")

    print(f"Response: {response.text}")
    print(f"Tools used: {response.tool_calls_made}")
    print(f"Latency: {response.latency_ms}ms")
```

### With Tool Call Handling

```python
async def handle_tool_call(event: ToolCallEvent):
    """Called when LLM decides to call a tool."""
    await send_to_client({
        "type": "tool.call",
        "tool_name": event.tool_name,
        "arguments": event.arguments,
    })

async def handle_tool_result(event: ToolResultEvent):
    """Called when tool execution completes."""
    await send_to_client({
        "type": "tool.result",
        "tool_name": event.tool_name,
        "result": event.result,
    })

session = thinker_service.create_session(
    conversation_id="conv-123",
    on_token=on_token,
    on_tool_call=handle_tool_call,
    on_tool_result=handle_tool_result,
    user_id="user-456",
)
```

### Cancellation (Barge-in)

```python
# Store session reference
active_session = thinker_service.create_session(...)

# When user barges in:
async def handle_barge_in():
    await active_session.cancel()
    print(f"Cancelled: {active_session.is_cancelled()}")
```

## Context Persistence

Conversation contexts are persisted across turns:

```python
# Class-level storage
_conversation_contexts: Dict[str, ConversationContext] = {}
_context_last_access: Dict[str, float] = {}
CONTEXT_TTL_SECONDS = 3600  # 1 hour TTL
```

- Contexts are automatically cleaned up after 1 hour of inactivity
- Same conversation_id reuses existing context
- Context persists across voice and chat modes

## Error Handling

```python
try:
    response = await session.think(transcript)
except Exception as e:
    # Errors are caught and returned in response
    response = ThinkerResponse(
        text=f"I apologize, but I encountered an error: {str(e)}",
        message_id=message_id,
        state=ThinkingState.ERROR,
    )
```

## Related Documentation

- [Thinker-Talker Pipeline Overview](../THINKER_TALKER_PIPELINE.md)
- [Talker Service](talker-service.md)
- [Tool Service](../../services/api-gateway/app/services/tools/tool_service.py)
6:["slug","services/thinker-service","c"]
0:["X7oMT3VrOffzp0qvbeOas",[[["",{"children":["docs",{"children":[["slug","services/thinker-service","c"],{"children":["__PAGE__?{\"slug\":[\"services\",\"thinker-service\"]}",{}]}]}]},"$undefined","$undefined",true],["",{"children":["docs",{"children":[["slug","services/thinker-service","c"],{"children":["__PAGE__",{},[["$L1",["$","div",null,{"children":[["$","div",null,{"className":"mb-6 flex items-center justify-between gap-4","children":[["$","div",null,{"children":[["$","p",null,{"className":"text-sm text-gray-500 dark:text-gray-400","children":"Docs / Raw"}],["$","h1",null,{"className":"text-3xl font-bold text-gray-900 dark:text-white","children":"Thinker Service"}],["$","p",null,{"className":"text-sm text-gray-600 dark:text-gray-400","children":["Sourced from"," ",["$","code",null,{"className":"font-mono text-xs","children":["docs/","services/thinker-service.md"]}]]}]]}],["$","a",null,{"href":"https://github.com/mohammednazmy/VoiceAssist/edit/main/docs/services/thinker-service.md","target":"_blank","rel":"noreferrer","className":"inline-flex items-center gap-2 rounded-md border border-gray-200 dark:border-gray-700 px-3 py-1.5 text-sm text-gray-700 dark:text-gray-200 hover:border-primary-500 dark:hover:border-primary-400 hover:text-primary-700 dark:hover:text-primary-300","children":"Edit on GitHub"}]]}],["$","div",null,{"className":"rounded-lg border border-gray-200 dark:border-gray-800 bg-white dark:bg-gray-900 p-6","children":["$","$L2",null,{"content":"$3"}]}],["$","div",null,{"className":"mt-6 flex flex-wrap gap-2 text-sm","children":[["$","$L4",null,{"href":"/reference/all-docs","className":"inline-flex items-center gap-1 rounded-md bg-gray-100 px-3 py-1 text-gray-700 hover:bg-gray-200 dark:bg-gray-800 dark:text-gray-200 dark:hover:bg-gray-700","children":"← All documentation"}],["$","$L4",null,{"href":"/","className":"inline-flex items-center gap-1 rounded-md bg-gray-100 px-3 py-1 text-gray-700 hover:bg-gray-200 dark:bg-gray-800 dark:text-gray-200 dark:hover:bg-gray-700","children":"Home"}]]}]]}],null],null],null]},[null,["$","$L5",null,{"parallelRouterKey":"children","segmentPath":["children","docs","children","$6","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L7",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[null,["$","$L5",null,{"parallelRouterKey":"children","segmentPath":["children","docs","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L7",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/_next/static/css/7f586cdbbaa33ff7.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","className":"h-full","children":["$","body",null,{"className":"__className_f367f3 h-full bg-white dark:bg-gray-900","children":[["$","a",null,{"href":"#main-content","className":"skip-to-content","children":"Skip to main content"}],["$","$L8",null,{"children":[["$","$L9",null,{}],["$","$La",null,{}],["$","main",null,{"id":"main-content","className":"lg:pl-64","role":"main","aria-label":"Documentation content","children":["$","$Lb",null,{"children":["$","$L5",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L7",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]]}]]}]}]],null],null],["$Lc",null]]]]
c:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"Thinker Service | Docs | VoiceAssist Docs"}],["$","meta","3",{"name":"description","content":"Reasoning engine managing conversation context, LLM orchestration, and tool calling in the voice pipeline."}],["$","meta","4",{"name":"keywords","content":"VoiceAssist,documentation,medical AI,voice assistant,healthcare,HIPAA,API"}],["$","meta","5",{"name":"robots","content":"index, follow"}],["$","meta","6",{"name":"googlebot","content":"index, follow"}],["$","link","7",{"rel":"canonical","href":"https://assistdocs.asimo.io"}],["$","meta","8",{"property":"og:title","content":"VoiceAssist Documentation"}],["$","meta","9",{"property":"og:description","content":"Comprehensive documentation for VoiceAssist - Enterprise Medical AI Assistant"}],["$","meta","10",{"property":"og:url","content":"https://assistdocs.asimo.io"}],["$","meta","11",{"property":"og:site_name","content":"VoiceAssist Docs"}],["$","meta","12",{"property":"og:type","content":"website"}],["$","meta","13",{"name":"twitter:card","content":"summary"}],["$","meta","14",{"name":"twitter:title","content":"VoiceAssist Documentation"}],["$","meta","15",{"name":"twitter:description","content":"Comprehensive documentation for VoiceAssist - Enterprise Medical AI Assistant"}],["$","meta","16",{"name":"next-size-adjust"}]]
1:null