Multilingual RAG Architecture

The multilingual RAG service enables voice interactions in multiple languages by implementing a translate-then-retrieve pattern with graceful degradation.

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                     Multilingual RAG Pipeline                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────┐   ┌─────────────┐   ┌───────────────┐             │
│  │  User    │──▶│  Language   │──▶│  Translation  │             │
│  │  Query   │   │  Detection  │   │  (if needed)  │             │
│  └──────────┘   └─────────────┘   └───────────────┘             │
│                        │                   │                     │
│                        ▼                   ▼                     │
│              ┌─────────────────────────────────┐                │
│              │    English Query for RAG        │                │
│              └─────────────────────────────────┘                │
│                              │                                   │
│                              ▼                                   │
│              ┌─────────────────────────────────┐                │
│              │      RAG Knowledge Base         │                │
│              │   (English embeddings only)     │                │
│              └─────────────────────────────────┘                │
│                              │                                   │
│                              ▼                                   │
│              ┌─────────────────────────────────┐                │
│              │      LLM Response Generation    │                │
│              │   (with language instruction)   │                │
│              └─────────────────────────────────┘                │
│                              │                                   │
│                              ▼                                   │
│              ┌─────────────────────────────────┐                │
│              │     Response in User Language   │                │
│              └─────────────────────────────────┘                │
└─────────────────────────────────────────────────────────────────┘
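
Put in code form, the end-to-end flow looks roughly like the sketch below. This is a simplified illustration of the translate-then-retrieve pattern, not the production implementation; the function, its parameters, and the `detector`/`translator`/`rag`/`llm` objects are assumed stand-ins for the real services described in the rest of this document.

```python
async def answer_multilingual(detector, translator, rag, llm, query: str, user_language: str) -> str:
    """Illustrative translate-then-retrieve flow (names are stand-ins, not the real API)."""
    detection = await detector.detect(query)
    language = detection.primary_language or user_language

    # Translate to English so the query matches the English-only embeddings
    rag_query = query
    if language != "en":
        translation = await translator.translate_with_fallback(
            text=query, source=language, target="en"
        )
        if not translation.failed:
            rag_query = translation.text  # otherwise degrade to the original query

    context = await rag.retrieve(rag_query)

    # The LLM is asked to answer in the user's language
    # (see "LLM Prompting for Multilingual Response" below)
    return await llm.generate(query=query, context=context, respond_in=language)
```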

Translation Service

Multi-Provider Fallback

```python
from app.services.translation_service import TranslationService

# Initialize with providers
service = TranslationService(
    primary_provider="google",
    fallback_provider="deepl",
)

# Translate with automatic fallback
result = await service.translate_with_fallback(
    text="¿Cuáles son los síntomas de la diabetes?",
    source="es",
    target="en",
)

if result.failed:
    # Graceful degradation - use original query
    print(f"Translation failed: {result.error_message}")
else:
    print(f"Translated: {result.text}")
    if result.used_fallback:
        print("Used fallback provider")
```

Caching Strategy

Translations are cached in Redis with a 7-day TTL:

```python
# Cache key format
cache_key = f"trans:{source}:{target}:{hash(text)}"

# TTL
TTL_DAYS = 7

# Cache hit rate typically >80% for common queries
```
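
A minimal sketch of how a cache lookup might wrap the translation call, assuming a `redis.asyncio` client and the `TranslationService` from the previous example. The `cached_translate` helper is illustrative rather than the service's actual code, and it swaps the built-in `hash()` for `sha256` so keys stay stable across processes.

```python
import hashlib

import redis.asyncio as redis

from app.services.translation_service import TranslationService

TTL_SECONDS = 7 * 24 * 60 * 60  # 7-day TTL

async def cached_translate(
    cache: redis.Redis,
    service: TranslationService,
    text: str,
    source: str,
    target: str,
) -> str:
    # sha256 keeps keys stable across processes (the built-in hash() is salted per process)
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    cache_key = f"trans:{source}:{target}:{digest}"

    cached = await cache.get(cache_key)
    if cached is not None:
        return cached.decode("utf-8")

    result = await service.translate_with_fallback(text=text, source=source, target=target)
    if result.failed:
        return text  # graceful degradation: keep the original text, and don't cache it

    await cache.set(cache_key, result.text, ex=TTL_SECONDS)
    return result.text
```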

Supported Languages

| Code | Language   | Status      |
|------|------------|-------------|
| `en` | English    | Native      |
| `es` | Spanish    | Full        |
| `fr` | French     | Full        |
| `de` | German     | Full        |
| `it` | Italian    | Full        |
| `pt` | Portuguese | Full        |
| `ar` | Arabic     | Full        |
| `zh` | Chinese    | Full        |
| `hi` | Hindi      | Full        |
| `ur` | Urdu       | Full        |
| `ja` | Japanese   | Placeholder |
| `ko` | Korean     | Placeholder |
| `ru` | Russian    | Placeholder |
| `pl` | Polish     | Placeholder |
| `tr` | Turkish    | Placeholder |

Language Detection

Code-Switching Detection

The language detection service identifies when users mix languages:

```python
from app.services.multilingual_rag_service import LanguageDetectionService

detector = LanguageDetectionService()

# Detect primary language
result = await detector.detect("Tell me about مرض السكري please")
# result.primary_language = "en"
# result.secondary_languages = ["ar"]
# result.is_code_switched = True
```

Detection Algorithm

  1. Fast detection: Use langdetect for an initial guess (see the sketch after this list)
  2. Confidence check: Verify confidence > 0.7
  3. Code-switching scan: Check for embedded phrases in other languages
  4. Fallback: Default to the user's preferred language
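
Steps 1, 2, and 4 might combine along these lines. The sketch uses the real `langdetect` API (`detect_langs` returns candidates with `lang` and `prob` attributes), but the `detect_primary_language` helper and its threshold handling are illustrative, not the service's actual code; the code-switching scan (step 3) is omitted.

```python
from langdetect import detect_langs

CONFIDENCE_THRESHOLD = 0.7

def detect_primary_language(text: str, preferred_language: str = "en") -> str:
    """Fast detection with a confidence check, falling back to the user's preference."""
    try:
        candidates = detect_langs(text)  # e.g. [es:0.999997], ordered by probability
    except Exception:
        return preferred_language  # detection failed outright: use the user's preference

    best = candidates[0]
    if best.prob >= CONFIDENCE_THRESHOLD:
        return best.lang
    return preferred_language  # low confidence: fall back to the user's preference
```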

RAG Integration

Query Flow

```python
from app.services.multilingual_rag_service import MultilingualRAGService

service = MultilingualRAGService()

response = await service.query_multilingual(
    query="¿Qué medicamentos se usan para la diabetes?",
    user_language="es",
)

# Response structure
{
    "answer": "Los medicamentos más comunes para...",
    "language": "es",
    "sources": [...],
    "original_query": "¿Qué medicamentos se usan para...",
    "translated_query": "What medications are used for...",
    "translation_warning": None,  # or "Translation used fallback"
    "latency_ms": 523.4,
    "degradation_applied": []
}
```

LLM Prompting for Multilingual Response

The LLM is instructed to respond in the user's language:

```python
system_prompt = f"""You are a helpful medical assistant.
Respond to the user's question using the provided context.

IMPORTANT: Respond entirely in {language_name}. Do not mix languages unless
the user's query contains specific terms that should remain in their original
language (e.g., medication names).

Be accurate, helpful, and cite your sources when providing information."""
```
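
Here `language_name` is the human-readable name of the target language (for example, "Spanish" when the detected code is `es`); a simple code-to-name lookup is one way to supply it, though the exact mechanism is an implementation detail of the service.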

Graceful Degradation

When translation or retrieval fails, or translation runs over budget, the system degrades gracefully instead of returning an error:

Degradation Levels

| Scenario | Action | Degradation Type |
|----------|--------|------------------|
| Primary translation provider fails | Use fallback provider | `translation_used_fallback` |
| All translation providers fail | Use original query + LLM | `translation_failed` |
| Translation too slow | Skip translation | `translation_budget_exceeded` |
| RAG retrieval fails | Return empty results | `rag_retrieval_failed` |
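
Put together, the translation part of that degradation path might look roughly like this. The `translate_for_rag` helper, the `degradations` list handling, and the use of `asyncio.wait_for` for the 200 ms budget (see the latency table below) are a sketch, not the actual service code.

```python
import asyncio

TRANSLATION_BUDGET_S = 0.2  # 200 ms translation budget (see the latency table below)

async def translate_for_rag(service, query: str, source: str, degradations: list[str]) -> str:
    """Translate the query to English, recording whichever degradation was applied."""
    try:
        result = await asyncio.wait_for(
            service.translate_with_fallback(text=query, source=source, target="en"),
            timeout=TRANSLATION_BUDGET_S,
        )
    except asyncio.TimeoutError:
        degradations.append("translation_budget_exceeded")
        return query  # too slow: skip translation, keep the original query
    except Exception:
        degradations.append("translation_failed")
        return query  # all translation failed: send the original query to the LLM

    if result.failed:
        degradations.append("translation_failed")
        return query
    if result.used_fallback:
        degradations.append("translation_used_fallback")
    return result.text
```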

Error Messages by Language

```python
FALLBACK_MESSAGES = {
    "en": "I apologize, but I'm unable to process your request. Please try again.",
    "es": "Lo siento, no puedo procesar su solicitud. Por favor, inténtelo de nuevo.",
    "fr": "Je m'excuse, je ne peux pas traiter votre demande. Veuillez réessayer.",
    "de": "Es tut mir leid, ich kann Ihre Anfrage nicht bearbeiten. Bitte versuchen Sie es erneut.",
    "ar": "عذراً، لا أستطيع معالجة طلبك. يرجى المحاولة مرة أخرى.",
    "zh": "抱歉,我目前无法处理您的请求。请重试。",
    # ... more languages
}
```
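
When every degradation level has been exhausted, the reply typically reduces to a lookup with an English default, e.g. `FALLBACK_MESSAGES.get(user_language, FALLBACK_MESSAGES["en"])` (illustrative usage, not necessarily the exact call in the service).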

Performance Considerations

Latency Impact

| Stage | Typical Latency | Budget |
|-------|-----------------|--------|
| Language detection | 10-30 ms | 50 ms |
| Translation | 100-180 ms | 200 ms |
| RAG retrieval | 150-250 ms | 300 ms |
| Total impact | ~300 ms | 550 ms |

Optimization Strategies

  1. Translation caching: 7-day Redis cache
  2. Async detection: Run language detection in parallel with audio processing (sketched after this list)
  3. Skip translation for English: Detect English early and bypass translation
  4. Budget-aware skipping: Skip translation when the latency budget is tight
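
A minimal sketch of strategy 2, assuming the rest of the turn has a `process_audio` step that can run concurrently with detection (both the coroutine and the surrounding wiring are illustrative, not the actual pipeline code):

```python
import asyncio

async def process_audio(audio_chunk: bytes) -> bytes:
    # Placeholder for whatever audio work happens alongside detection
    await asyncio.sleep(0)
    return audio_chunk

async def handle_turn(detector, transcript: str, audio_chunk: bytes):
    # Strategy 2: language detection overlaps audio work that has to happen anyway,
    # so its 10-30 ms typically adds nothing to the critical path
    detection, processed = await asyncio.gather(
        detector.detect(transcript),
        process_audio(audio_chunk),
    )
    return detection, processed
```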

Configuration

Environment Variables

```bash
# Primary translation provider
TRANSLATION_PROVIDER=google

# API keys (store in secrets manager)
GOOGLE_TRANSLATE_API_KEY=xxx
DEEPL_API_KEY=xxx

# Cache settings
TRANSLATION_CACHE_TTL_DAYS=7
TRANSLATION_CACHE_PREFIX=trans

# Feature flag
VOICE_V4_MULTILINGUAL_RAG=true
```

Feature Flag

```python
from app.core.feature_flags import is_enabled

if is_enabled("voice_v4_multilingual_rag", user_id=user.id):
    service = MultilingualRAGService()
    response = await service.query_multilingual(query, user_language)
else:
    # Fall back to English-only RAG
    response = await rag_service.query(query)
```

Testing

```bash
# Test translation fallback
pytest tests/services/test_voice_v4_services.py::TestTranslationFailureHandling -v

# Test multilingual RAG
pytest tests/services/test_voice_v4_services.py::TestMultilingualRAG -v
```