2:I[7012,["4765","static/chunks/4765-f5afdf8061f456f3.js","9856","static/chunks/9856-3b185291364d9bef.js","6687","static/chunks/app/docs/%5B...slug%5D/page-e07536548216bee4.js"],"MarkdownRenderer"] 4:I[9856,["4765","static/chunks/4765-f5afdf8061f456f3.js","9856","static/chunks/9856-3b185291364d9bef.js","6687","static/chunks/app/docs/%5B...slug%5D/page-e07536548216bee4.js"],""] 5:I[4126,[],""] 7:I[9630,[],""] 8:I[4278,["9856","static/chunks/9856-3b185291364d9bef.js","8172","static/chunks/8172-b3a2d6fe4ae10d40.js","3185","static/chunks/app/layout-2814fa5d15b84fe4.js"],"HeadingProvider"] 9:I[1476,["9856","static/chunks/9856-3b185291364d9bef.js","8172","static/chunks/8172-b3a2d6fe4ae10d40.js","3185","static/chunks/app/layout-2814fa5d15b84fe4.js"],"Header"] a:I[3167,["9856","static/chunks/9856-3b185291364d9bef.js","8172","static/chunks/8172-b3a2d6fe4ae10d40.js","3185","static/chunks/app/layout-2814fa5d15b84fe4.js"],"Sidebar"] b:I[7409,["9856","static/chunks/9856-3b185291364d9bef.js","8172","static/chunks/8172-b3a2d6fe4ae10d40.js","3185","static/chunks/app/layout-2814fa5d15b84fe4.js"],"PageFrame"] 3:T32f6, # Thinker Service > **Location:** `services/api-gateway/app/services/thinker_service.py` > **Status:** Production Ready > **Last Updated:** 2025-12-01 ## Overview The ThinkerService is the reasoning engine of the Thinker-Talker voice pipeline. It manages conversation context, orchestrates LLM interactions, and handles tool calling with result injection. ## Architecture ``` ┌─────────────────────────────────────────────────────────────────┐ │ ThinkerService │ │ │ │ ┌──────────────────┐ ┌──────────────────┐ │ │ │ ConversationContext │◄──│ ThinkerSession │ │ │ │ (shared memory) │ │ (per-request) │ │ │ └──────────────────┘ └──────────────────┘ │ │ │ │ │ │ │ ▼ │ │ │ ┌──────────────────┐ │ │ │ │ LLMClient │ │ │ │ │ (GPT-4o) │ │ │ │ └──────────────────┘ │ │ │ │ │ │ │ ▼ │ │ │ ┌──────────────────┐ │ │ │ │ ToolRegistry │ │ │ │ │ (calendar, search,│ │ │ │ │ medical, KB) │ │ │ └──────────────┴──────────────────┘ │ └─────────────────────────────────────────────────────────────────┘ ``` ## Classes ### ThinkerService Main service class (singleton pattern). ```python from app.services.thinker_service import thinker_service # Create a session for a conversation session = thinker_service.create_session( conversation_id="conv-123", on_token=handle_token, # Called for each LLM token on_tool_call=handle_tool_call, # Called when tool is invoked on_tool_result=handle_result, # Called when tool returns user_id="user-456", # Required for authenticated tools ) # Process user input response = await session.think("What's on my calendar today?") ``` #### Methods | Method | Description | Parameters | Returns | | ------------------ | ------------------------- | ------------------------------------------------------------------------------------------- | ---------------- | | `create_session()` | Create a thinking session | `conversation_id`, `on_token`, `on_tool_call`, `on_tool_result`, `system_prompt`, `user_id` | `ThinkerSession` | | `register_tool()` | Register a new tool | `name`, `description`, `parameters`, `handler` | `None` | ### ThinkerSession Session class for processing individual requests. ```python class ThinkerSession: """ A single thinking session with streaming support. Manages the flow: 1. Receive user input 2. Add to conversation context 3. Call LLM with streaming 4. Handle tool calls if needed 5. Stream response tokens to callback """ ``` #### Methods | Method | Description | Parameters | Returns | | --------------- | ------------------------ | ------------------------------------- | --------------------- | | `think()` | Process user input | `user_input: str`, `source_mode: str` | `ThinkerResponse` | | `cancel()` | Cancel processing | None | `None` | | `get_context()` | Get conversation context | None | `ConversationContext` | | `get_metrics()` | Get session metrics | None | `ThinkerMetrics` | #### Properties | Property | Type | Description | | -------- | --------------- | ------------------------ | | `state` | `ThinkingState` | Current processing state | ### ConversationContext Manages conversation history with smart trimming. ```python class ConversationContext: MAX_HISTORY_MESSAGES = 20 # Maximum messages to retain MAX_CONTEXT_TOKENS = 8000 # Token budget for context def __init__(self, conversation_id: str, system_prompt: str = None): self.conversation_id = conversation_id self.messages: List[ConversationMessage] = [] self.system_prompt = system_prompt or self._default_system_prompt() ``` #### Smart Trimming When message count exceeds `MAX_HISTORY_MESSAGES`, the context performs smart trimming: ```python def _smart_trim(self) -> None: """ Trim messages while preserving tool call chains. OpenAI requires: assistant (with tool_calls) -> tool (with tool_call_id) We can't break this chain or the API will reject the request. """ ``` **Rules:** - Never trim an assistant message if the next message is a tool result - Never trim a tool message (it needs its preceding assistant message) - Find the first safe trim point that doesn't break chains #### Methods | Method | Description | | ------------------------ | ------------------------------ | | `add_message()` | Add a message to history | | `get_messages_for_llm()` | Format messages for OpenAI API | | `clear()` | Clear all history | ### ToolRegistry Registry for available tools. ```python class ToolRegistry: def register( self, name: str, description: str, parameters: Dict, handler: Callable[[Dict], Awaitable[Any]], ) -> None: """Register a tool with its schema and handler.""" def get_tools_schema(self) -> List[Dict]: """Get all tool schemas for LLM API.""" async def execute(self, tool_name: str, arguments: Dict, user_id: str) -> Any: """Execute a tool and return its result.""" ``` ## Data Classes ### ThinkingState ```python class ThinkingState(str, Enum): IDLE = "idle" # Waiting for input PROCESSING = "processing" # Building request TOOL_CALLING = "tool_calling" # Executing tool GENERATING = "generating" # Streaming response COMPLETE = "complete" # Finished successfully CANCELLED = "cancelled" # User interrupted ERROR = "error" # Error occurred ``` ### ConversationMessage ```python @dataclass class ConversationMessage: role: str # "user", "assistant", "system", "tool" content: str message_id: str # Auto-generated UUID timestamp: float # Unix timestamp source_mode: str # "chat" or "voice" tool_call_id: str # For tool results tool_calls: List[Dict] # For assistant messages with tool calls citations: List[Dict] # Source citations ``` ### ThinkerResponse ```python @dataclass class ThinkerResponse: text: str # Complete response text message_id: str # Unique ID citations: List[Dict] # Source citations tool_calls_made: List[str] # Names of tools called latency_ms: int # Total processing time tokens_used: int # Token count state: ThinkingState # Final state ``` ### ThinkerMetrics ```python @dataclass class ThinkerMetrics: total_tokens: int = 0 tool_calls_count: int = 0 first_token_latency_ms: int = 0 total_latency_ms: int = 0 cancelled: bool = False ``` ## Available Tools The ThinkerService automatically registers tools from the unified ToolService: | Tool | Description | Requires Auth | | ----------------------- | ------------------------- | ------------- | | `calendar_create_event` | Create calendar events | Yes | | `calendar_list_events` | List upcoming events | Yes | | `calendar_update_event` | Modify existing events | Yes | | `calendar_delete_event` | Remove events | Yes | | `web_search` | Search the web | No | | `pubmed_search` | Search medical literature | No | | `medical_calculator` | Calculate medical scores | No | | `kb_search` | Search knowledge base | No | ## System Prompt The default system prompt includes: 1. **Current Time Context**: Dynamic date/time with relative calculations 2. **Conversation Memory**: Instructions to use conversation history 3. **Tool Usage Guidelines**: When and how to use each tool 4. **Response Style**: Concise, natural, voice-optimized ```python def _default_system_prompt(self) -> str: tz = pytz.timezone("America/New_York") now = datetime.now(tz) return f"""You are VoiceAssist, a helpful AI voice assistant. CURRENT TIME CONTEXT: - Current date: {now.strftime("%A, %B %d, %Y")} - Current time: {now.strftime("%I:%M %p %Z")} CONVERSATION MEMORY: You have access to the full conversation history... AVAILABLE TOOLS: - calendar_create_event: Create events... - web_search: Search the web... ... KEY BEHAVIORS: - Keep responses concise and natural for voice - Use short sentences (max 15-20 words) - Avoid abbreviations - say "blood pressure" not "BP" """ ``` ## Usage Examples ### Basic Query Processing ```python from app.services.thinker_service import thinker_service async def handle_voice_query(conversation_id: str, transcript: str, user_id: str): # Token streaming callback async def on_token(token: str): await send_to_tts(token) # Create session with callbacks session = thinker_service.create_session( conversation_id=conversation_id, on_token=on_token, user_id=user_id, ) # Process the transcript response = await session.think(transcript, source_mode="voice") print(f"Response: {response.text}") print(f"Tools used: {response.tool_calls_made}") print(f"Latency: {response.latency_ms}ms") ``` ### With Tool Call Handling ```python async def handle_tool_call(event: ToolCallEvent): """Called when LLM decides to call a tool.""" await send_to_client({ "type": "tool.call", "tool_name": event.tool_name, "arguments": event.arguments, }) async def handle_tool_result(event: ToolResultEvent): """Called when tool execution completes.""" await send_to_client({ "type": "tool.result", "tool_name": event.tool_name, "result": event.result, }) session = thinker_service.create_session( conversation_id="conv-123", on_token=on_token, on_tool_call=handle_tool_call, on_tool_result=handle_tool_result, user_id="user-456", ) ``` ### Cancellation (Barge-in) ```python # Store session reference active_session = thinker_service.create_session(...) # When user barges in: async def handle_barge_in(): await active_session.cancel() print(f"Cancelled: {active_session.is_cancelled()}") ``` ## Context Persistence Conversation contexts are persisted across turns: ```python # Class-level storage _conversation_contexts: Dict[str, ConversationContext] = {} _context_last_access: Dict[str, float] = {} CONTEXT_TTL_SECONDS = 3600 # 1 hour TTL ``` - Contexts are automatically cleaned up after 1 hour of inactivity - Same conversation_id reuses existing context - Context persists across voice and chat modes ## Error Handling ```python try: response = await session.think(transcript) except Exception as e: # Errors are caught and returned in response response = ThinkerResponse( text=f"I apologize, but I encountered an error: {str(e)}", message_id=message_id, state=ThinkingState.ERROR, ) ``` ## Related Documentation - [Thinker-Talker Pipeline Overview](../THINKER_TALKER_PIPELINE.md) - [Talker Service](talker-service.md) - [Tool Service](../../services/api-gateway/app/services/tools/tool_service.py) 6:["slug","services/thinker-service","c"] 0:["X7oMT3VrOffzp0qvbeOas",[[["",{"children":["docs",{"children":[["slug","services/thinker-service","c"],{"children":["__PAGE__?{\"slug\":[\"services\",\"thinker-service\"]}",{}]}]}]},"$undefined","$undefined",true],["",{"children":["docs",{"children":[["slug","services/thinker-service","c"],{"children":["__PAGE__",{},[["$L1",["$","div",null,{"children":[["$","div",null,{"className":"mb-6 flex items-center justify-between gap-4","children":[["$","div",null,{"children":[["$","p",null,{"className":"text-sm text-gray-500 dark:text-gray-400","children":"Docs / Raw"}],["$","h1",null,{"className":"text-3xl font-bold text-gray-900 dark:text-white","children":"Thinker Service"}],["$","p",null,{"className":"text-sm text-gray-600 dark:text-gray-400","children":["Sourced from"," ",["$","code",null,{"className":"font-mono text-xs","children":["docs/","services/thinker-service.md"]}]]}]]}],["$","a",null,{"href":"https://github.com/mohammednazmy/VoiceAssist/edit/main/docs/services/thinker-service.md","target":"_blank","rel":"noreferrer","className":"inline-flex items-center gap-2 rounded-md border border-gray-200 dark:border-gray-700 px-3 py-1.5 text-sm text-gray-700 dark:text-gray-200 hover:border-primary-500 dark:hover:border-primary-400 hover:text-primary-700 dark:hover:text-primary-300","children":"Edit on GitHub"}]]}],["$","div",null,{"className":"rounded-lg border border-gray-200 dark:border-gray-800 bg-white dark:bg-gray-900 p-6","children":["$","$L2",null,{"content":"$3"}]}],["$","div",null,{"className":"mt-6 flex flex-wrap gap-2 text-sm","children":[["$","$L4",null,{"href":"/reference/all-docs","className":"inline-flex items-center gap-1 rounded-md bg-gray-100 px-3 py-1 text-gray-700 hover:bg-gray-200 dark:bg-gray-800 dark:text-gray-200 dark:hover:bg-gray-700","children":"← All documentation"}],["$","$L4",null,{"href":"/","className":"inline-flex items-center gap-1 rounded-md bg-gray-100 px-3 py-1 text-gray-700 hover:bg-gray-200 dark:bg-gray-800 dark:text-gray-200 dark:hover:bg-gray-700","children":"Home"}]]}]]}],null],null],null]},[null,["$","$L5",null,{"parallelRouterKey":"children","segmentPath":["children","docs","children","$6","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L7",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[null,["$","$L5",null,{"parallelRouterKey":"children","segmentPath":["children","docs","children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L7",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":"$undefined","notFoundStyles":"$undefined"}]],null]},[[[["$","link","0",{"rel":"stylesheet","href":"/_next/static/css/7f586cdbbaa33ff7.css","precedence":"next","crossOrigin":"$undefined"}]],["$","html",null,{"lang":"en","className":"h-full","children":["$","body",null,{"className":"__className_f367f3 h-full bg-white dark:bg-gray-900","children":[["$","a",null,{"href":"#main-content","className":"skip-to-content","children":"Skip to main content"}],["$","$L8",null,{"children":[["$","$L9",null,{}],["$","$La",null,{}],["$","main",null,{"id":"main-content","className":"lg:pl-64","role":"main","aria-label":"Documentation content","children":["$","$Lb",null,{"children":["$","$L5",null,{"parallelRouterKey":"children","segmentPath":["children"],"error":"$undefined","errorStyles":"$undefined","errorScripts":"$undefined","template":["$","$L7",null,{}],"templateStyles":"$undefined","templateScripts":"$undefined","notFound":[["$","title",null,{"children":"404: This page could not be found."}],["$","div",null,{"style":{"fontFamily":"system-ui,\"Segoe UI\",Roboto,Helvetica,Arial,sans-serif,\"Apple Color Emoji\",\"Segoe UI Emoji\"","height":"100vh","textAlign":"center","display":"flex","flexDirection":"column","alignItems":"center","justifyContent":"center"},"children":["$","div",null,{"children":[["$","style",null,{"dangerouslySetInnerHTML":{"__html":"body{color:#000;background:#fff;margin:0}.next-error-h1{border-right:1px solid rgba(0,0,0,.3)}@media (prefers-color-scheme:dark){body{color:#fff;background:#000}.next-error-h1{border-right:1px solid rgba(255,255,255,.3)}}"}}],["$","h1",null,{"className":"next-error-h1","style":{"display":"inline-block","margin":"0 20px 0 0","padding":"0 23px 0 0","fontSize":24,"fontWeight":500,"verticalAlign":"top","lineHeight":"49px"},"children":"404"}],["$","div",null,{"style":{"display":"inline-block"},"children":["$","h2",null,{"style":{"fontSize":14,"fontWeight":400,"lineHeight":"49px","margin":0},"children":"This page could not be found."}]}]]}]}]],"notFoundStyles":[]}]}]}]]}]]}]}]],null],null],["$Lc",null]]]] c:[["$","meta","0",{"name":"viewport","content":"width=device-width, initial-scale=1"}],["$","meta","1",{"charSet":"utf-8"}],["$","title","2",{"children":"Thinker Service | Docs | VoiceAssist Docs"}],["$","meta","3",{"name":"description","content":"Reasoning engine managing conversation context, LLM orchestration, and tool calling in the voice pipeline."}],["$","meta","4",{"name":"keywords","content":"VoiceAssist,documentation,medical AI,voice assistant,healthcare,HIPAA,API"}],["$","meta","5",{"name":"robots","content":"index, follow"}],["$","meta","6",{"name":"googlebot","content":"index, follow"}],["$","link","7",{"rel":"canonical","href":"https://assistdocs.asimo.io"}],["$","meta","8",{"property":"og:title","content":"VoiceAssist Documentation"}],["$","meta","9",{"property":"og:description","content":"Comprehensive documentation for VoiceAssist - Enterprise Medical AI Assistant"}],["$","meta","10",{"property":"og:url","content":"https://assistdocs.asimo.io"}],["$","meta","11",{"property":"og:site_name","content":"VoiceAssist Docs"}],["$","meta","12",{"property":"og:type","content":"website"}],["$","meta","13",{"name":"twitter:card","content":"summary"}],["$","meta","14",{"name":"twitter:title","content":"VoiceAssist Documentation"}],["$","meta","15",{"name":"twitter:description","content":"Comprehensive documentation for VoiceAssist - Enterprise Medical AI Assistant"}],["$","meta","16",{"name":"next-size-adjust"}]] 1:null