VAC Service Architecture
Overview
The VAC (Vector Assistant Chat) Service is the central orchestration engine for all AI assistant interactions in the Aitana platform. It handles real-time streaming responses and tool orchestration, and provides a unified interface to multiple AI models (Anthropic Claude and Google Gemini variants).
Core Architecture
Entry Point: vac_stream Function
The `vac_stream` function in `backend/vac_service.py` is the main orchestration engine: it processes every assistant interaction through a three-phase pipeline.
Function Signature:
```python
def vac_stream(
    question: str,
    vector_name: str,
    chat_history: list = None,
    callback: BufferStreamingStdOutCallbackHandlerAsync = None,
    **kwargs
) -> Dict
```
Key Parameters:
- `question`: User's input question
- `vector_name`: Assistant/vector database identifier
- `chat_history`: Previous conversation history (optional)
- `callback`: Streaming callback handler
- `**kwargs`: Additional parameters, including `trace_id`, `emissaryConfig`, `currentUser`, `documents`, `tools_to_use`, and `confirm_past_pause`
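For orientation, a hypothetical call might look like the following; the argument values are placeholders, and only the parameter names come from the signature above:

```python
# Hypothetical invocation; all values are placeholders.
from backend.vac_service import vac_stream  # module path per this document

result = vac_stream(
    question="What files are in my project?",
    vector_name="my-assistant",        # assistant / vector database identifier
    chat_history=[],                   # no prior conversation turns
    trace_id="trace-123",              # forwarded via **kwargs
    tools_to_use=["file-browser"],     # forwarded via **kwargs
)
```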
Three-Phase Processing Pipeline
Phase 1: Tool Permission & Configuration
- Purpose: Validate tool permissions and prepare tool configurations
- Key Components: `tool_permissions.py`, `tool_orchestrator.py`
- Output: Validated tool configuration and permission matrix
Flow:
- Extract user permissions from session/authentication
- Validate requested tools against user roles
- Apply tag-based tool filtering (sketched below)
- Generate tool configuration for Phase 2
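A minimal sketch of the tag-based filtering idea, with hypothetical helper and field names (the real logic lives in `tool_permissions.py` and `tool_orchestrator.py`):

```python
def filter_tools_by_permissions(requested: list[str],
                                user_roles: set[str],
                                tool_tags: dict[str, set[str]]) -> list[str]:
    """Keep only tools whose required tags intersect the user's roles.

    `tool_tags` maps a tool name to the tags allowed to use it; a tool
    with no entry is treated as available to everyone.
    """
    allowed = []
    for tool in requested:
        required = tool_tags.get(tool, set())
        if not required or required & user_roles:
            allowed.append(tool)
    return allowed

# Example: a non-admin user requesting a restricted tool
print(filter_tools_by_permissions(
    ["file-browser", "code_execution"],
    user_roles={"beta"},
    tool_tags={"code_execution": {"admin"}},
))  # -> ['file-browser']
```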
Phase 2: First Impression & Tool Selection
- Purpose: Generate initial AI response and determine required tools
- Key Components: `first_impression.py`, smart model selection
- Output: Initial response + tool selection + context requirements
Flow:
- Send user input + chat history to AI model
- AI model generates response and selects needed tools
- Extract tool confirmations if required
- Prepare context for Phase 3
Phase 3: Tool Execution & Smart Response
- Purpose: Execute selected tools and generate final contextualized response
- Key Components: `tools/` directory, context aggregation, smart streaming
- Output: Final streaming response with tool results
Flow:
- Execute selected tools in parallel when possible (see the sketch below)
- Aggregate tool results into context
- Stream final AI response with full context
- Handle tool confirmation workflow if needed
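As a sketch of the parallel execution step, independent tool coroutines can be run with `asyncio.gather`; the names here are illustrative, and the real dispatch lives in `tool_orchestrator.py`:

```python
import asyncio

async def run_tools(selected: dict) -> dict:
    """Run independent tool coroutines concurrently and collect results.

    `selected` maps tool names to zero-argument coroutines prepared by
    the orchestrator; exceptions are captured per tool so one failure
    cannot sink the whole phase.
    """
    names = list(selected)
    results = await asyncio.gather(
        *(selected[name]() for name in names),
        return_exceptions=True,  # degrade gracefully on tool failure
    )
    return dict(zip(names, results))

# Example with a stand-in tool
async def fake_search():
    await asyncio.sleep(0.1)
    return "search results"

print(asyncio.run(run_tools({"vertex_search": fake_search})))
```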
API Endpoints
Primary Endpoints
/vac/assistant/<assistant_id> (Recommended)
- Purpose: Assistant-specific endpoint with full configuration
- Method: POST
- Use Case: Production applications, frontend integration
Request Format:
```json
{
  "user_input": "What files are in my project?",
  "chat_history": [...],
  "trace_id": "optional-trace-id",
  "message_id": "unique-message-id",
  "session_id": "user-session-id"
}
```
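A hypothetical client call against this endpoint; the host, port, and assistant ID are placeholders (the port matches the local development URL shown later in this document):

```python
import requests

resp = requests.post(
    "http://127.0.0.1:1956/vac/assistant/my-assistant",  # placeholder host and ID
    json={
        "user_input": "What files are in my project?",
        "chat_history": [],
        "message_id": "msg-001",
        "session_id": "sess-001",
    },
    stream=True,   # the service streams its response
    timeout=60,
)
for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
    print(chunk, end="")
```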
/vac/streaming/<model_name> (Direct Access)
- Purpose: Direct model access without assistant configuration
- Method: POST
- Use Case: Testing, debugging, custom implementations
Health Check Endpoints
/vac/health
- Purpose: Service health and configuration status
- Method: GET
- Response: Service status, model availability, tool status
Tool Orchestration System
Tool Selection Process
- Permission Validation
- Check user role against tool requirements
- Apply tag-based filtering
- Validate tool configurations
- Intelligent Tool Selection
- AI model analyzes user input
- Selects appropriate tools based on context
- Determines execution order and dependencies
- Tool Execution
- Parallel execution when possible
- Error handling and fallback mechanisms
- Result aggregation and context building
Supported Tools (Actual Implementation)
Available in `tool_orchestrator.py`:
```python
task_mapping = {
    'file-browser': extract_from_files,
    'add_chat_histories': add_chat_histories,
    'google_search_retrieval': google_search_retrieval,
    'url_processing': url_processing,
    'vertex_search': vertex_search,
    'code_execution': code_execution,
    'document_search_agent': document_search_agent_tool
}
```
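Given that mapping, dispatch reduces to a dictionary lookup. A hedged sketch (the actual call signatures in `tool_orchestrator.py` may differ, and this assumes the handlers are coroutines):

```python
async def execute_tool(tool_name: str, **tool_kwargs):
    """Look up the handler for `tool_name` in task_mapping and invoke it,
    failing loudly for tools that are not registered."""
    handler = task_mapping.get(tool_name)
    if handler is None:
        raise ValueError(f"Unknown tool: {tool_name!r}")
    return await handler(**tool_kwargs)
```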
Special Tool Behaviors:
- `only-echo`: Returns the input immediately
- `suppress-tool-streaming`: Disables streaming for tools
- `no-first-impression`: Skips the first impression phase and goes straight to tool execution
Model Integration
Supported AI Models
Anthropic Models:
- `claude-3-7-sonnet-20250219` (Current Default)
- `claude-3-5-sonnet-20241022` (General Use)
- `claude-sonnet-4-20250514` (Advanced Tasks)
- `claude-opus-4-20250514` (Complex Reasoning - Admin Access)
Google Models:
- `gemini-2.5-pro` (Advanced Gemini)
- `gemini-2.5-flash` (Fast Gemini)
- `gemini-2.5-pro-thinking` (Advanced Reasoning - Beta Access)
- `gemini-2.5-flash-thinking` (Fast Reasoning - Beta Access)
Model Selection Strategy
The VAC service uses intelligent model selection based on:
- Task complexity (determined by tool requirements)
- Response time requirements (real-time vs. batch)
- Content type (text, code, multimodal)
- User preferences and assistant configuration
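One way to express such a strategy as code; the rules and model choices below are purely illustrative, not the service's actual decision table:

```python
def select_model(tools: list[str], realtime: bool, is_admin: bool) -> str:
    """Pick a model name from the criteria above (illustrative rules only)."""
    if is_admin and "code_execution" in tools:
        return "claude-opus-4-20250514"     # complex reasoning, admin access
    if tools:
        return "claude-sonnet-4-20250514"   # advanced tool-using tasks
    if realtime:
        return "gemini-2.5-flash"           # fast, low-latency replies
    return "claude-3-7-sonnet-20250219"     # current default
```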
Streaming Architecture
Real-time Response Streaming
The VAC service streams responses in real time through asynchronous callback handlers:
```python
async def callback_handler(data: Dict):
    """
    Handles streaming chunks from AI models.

    Args:
        data: Dictionary containing type, content, and metadata
    """
    if data.get('type') == 'content':
        # Stream content chunk to client
        pass
    elif data.get('type') == 'thinking':
        # Handle thinking content (internal reasoning)
        pass
    elif data.get('type') == 'tool_confirmation':
        # Handle tool confirmation requests
        pass
```
Stream Data Types
- `content`: User-visible response content
- `thinking`: Internal AI reasoning (extractable)
- `tool_confirmation`: Tool usage confirmation requests
- `error`: Error messages and diagnostics
- `metadata`: Response metadata and timing
Tool Confirmation Workflow
Pause-and-Confirm Pattern
The VAC service supports the following tool confirmation workflow:
- Tool Selection: AI selects tools during Phase 2
- Confirmation Request: System pauses and requests user approval
- User Decision: User approves/denies tool usage
- Execution: Approved tools execute in Phase 3
- Response: Final response incorporates tool results
Tool Confirmation Implementation
Real Configuration Structure:
- Tool confirmation is controlled by the `pause_to_confirm` parameter in `first_impression()`
- When `pause_to_confirm=True`, the service returns early with the tool selections
- Tool confirmation data is sent as: `<toolconfirmation tools_to_use='{json_tools}' />`
- The frontend confirms by sending `confirm_past_pause=True` with the confirmed tools
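From a client's point of view, the round trip might look like this sketch; the endpoint and field names follow this document, but how the confirmation marker is parsed and echoed back is an assumption:

```python
import requests

URL = "http://127.0.0.1:1956/vac/assistant/my-assistant"  # placeholder

# Request 1: the service pauses and returns a <toolconfirmation ... /> marker.
first = requests.post(URL, json={"user_input": "Delete old reports"}, timeout=60)
# ...parse tools_to_use out of the <toolconfirmation ... /> marker here...

# Request 2: the user approved, so resume past the pause with the tools.
final = requests.post(URL, json={
    "user_input": "Delete old reports",
    "confirm_past_pause": True,        # resume tool execution
    "tools_to_use": ["file-browser"],  # the confirmed tool list
}, timeout=60)
print(final.text)
```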
Actual Configuration Options:
```python
# emissaryConfig structure
{
    "name": "Assistant Name",
    "tools": ["tool1", "tool2"],
    "toolConfigs": {"tool_name": {"parameter": "value"}},
    "selectedItems": [...],
    "currentUser": {...}
}
```
Error Handling & Resilience
Error Recovery Mechanisms
- Model Fallback: Automatic fallback to alternative models
- Tool Degradation: Graceful handling of tool failures
- Partial Response: Return partial results when possible
- Retry Logic: Exponential backoff for transient failures
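The retry piece can be as simple as exponential backoff with jitter around a model call; a generic sketch, not the service's exact implementation:

```python
import asyncio
import random

async def with_retries(call, attempts: int = 3, base_delay: float = 0.5):
    """Retry an async zero-argument callable with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error
            # 0.5s, 1s, 2s, ... plus jitter to avoid synchronized retries
            await asyncio.sleep(base_delay * 2 ** attempt + random.random() * 0.1)
```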
Error Types
- Authentication Errors: Invalid session/permissions
- Model Errors: AI model failures or timeouts
- Tool Errors: Tool execution failures
- Configuration Errors: Invalid assistant configurations
Performance Optimization
Caching Strategy
- Tool Results: Cache expensive tool operations
- Model Responses: Cache responses for identical inputs
- Assistant Configs: Cache validated configurations
- Permission Matrix: Cache permission calculations
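As an illustration of the tool-result caching idea (the service's actual cache layer is not shown in this document), a minimal in-memory TTL cache:

```python
import time

class TTLCache:
    """Tiny in-memory cache with per-entry expiry, e.g. for tool results."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry and time.monotonic() < entry[0]:
            return entry[1]
        self._store.pop(key, None)  # drop expired or missing entries
        return None

    def set(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)
```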
Parallel Processing
- Tool Execution: Execute independent tools in parallel
- Model Requests: Parallel requests to multiple models
- Context Building: Asynchronous context aggregation
Integration Points
Frontend Integration
The VAC service integrates with the frontend through:
- Emissary Components: Primary UI components
- Streaming Context: Real-time response handling
- Tool Confirmation UI: Interactive confirmation dialogs
- Error Boundaries: Graceful error handling
External Service Integration
- Firebase: Authentication, message persistence
- Langfuse: Tracing and observability
- Google Cloud: Model hosting and storage
- Twilio: WhatsApp integration
Debugging & Observability
Tracing
Each request is traced through the entire pipeline with:
- Trace ID: Unique identifier for request tracking
- Phase Timing: Performance metrics for each phase
- Tool Execution: Individual tool performance
- Model Interactions: AI model request/response timing
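Phase timing can be captured with a small context manager keyed by the trace ID; this is an illustrative sketch, not the actual Langfuse integration:

```python
import time
from contextlib import contextmanager

@contextmanager
def traced_phase(trace_id: str, phase: str):
    """Record the wall-clock duration of one pipeline phase."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"[{trace_id}] {phase} took {elapsed:.3f}s")

# Usage:
with traced_phase("trace-123", "phase2_first_impression"):
    pass  # ...run the phase here...
```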
Logging
Comprehensive logging includes:
- Request/Response: Full API interactions
- Tool Execution: Tool selection and results
- Error Context: Detailed error information
- Performance Metrics: Timing and resource usage
Security Considerations
Authentication & Authorization
- User Authentication: Firebase-based user authentication
- Permission Validation: Tag-based tool access control
- Session Management: Secure session handling
- Tool Isolation: Sandboxed tool execution
Data Protection
- Input Sanitization: Validate all user inputs
- Output Filtering: Remove sensitive information
- Audit Logging: Track all tool usage
- Encryption: Secure data transmission
Configuration Management
Actual Assistant Configuration
Real emissaryConfig Structure:
```python
{
    "name": "Assistant Name",
    "tools": ["tool1", "tool2"],        # List of enabled tools
    "toolConfigs": {                    # Nested configuration
        "tool_name": {
            "parameter1": "value1",
            "parameter2": "value2"
        }
    },
    "selectedItems": [...],             # File browser selections
    "currentUser": {...},               # User context
    "admin_email": "admin@domain.com"
}
```
Environment Variables
Backend Environment Variables (Cloud Run):
Required:
- `GOOGLE_CLOUD_PROJECT`: GCP project ID
- `GOOGLE_CLOUD_LOCATION`: GCP region
- `ANTHROPIC_API_KEY`: Anthropic API key
Optional:
- `LANGFUSE_SECRET_KEY`: Tracing configuration
- `FIREBASE_BUCKET`: Storage bucket name
- `VAC_DEBUG`: Enable debug logging
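A small fail-fast startup check for the required variables; a defensive sketch, since the service may validate its environment differently:

```python
import os

REQUIRED = ["GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION", "ANTHROPIC_API_KEY"]

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing required environment variables: {missing}")
```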
Frontend Environment Variables:
Local Development (.env.local):
```bash
NEXT_PUBLIC_BACKEND_URL=http://127.0.0.1:1956
NEXT_PUBLIC_LANGFUSE_PUBLIC_KEY=your-langfuse-public-key
NEXT_PUBLIC_LANGFUSE_BASE_URL=https://cloud.langfuse.com
# ... other NEXT_PUBLIC_ variables
```
Production Deployment: Frontend environment variables must be managed via GCP Secret Manager:
- Download the current secrets: `gcloud secrets versions access latest --secret=FIREBASE_ENV --project YOUR_PROJECT_ID > .env.local`
- Add or update variables in `.env.local`
- Upload the file back to Secret Manager: `gcloud secrets versions add FIREBASE_ENV --data-file=.env.local --project YOUR_PROJECT_ID`
- Redeploy the frontend to pick up the new secrets
The `get-firebase-config.sh` script automatically downloads these secrets during Cloud Build deployment.
Best Practices
For Developers
- Always use trace IDs for request tracking
- Handle streaming gracefully with proper error boundaries
- Implement proper timeouts for all API calls
- Use tool confirmation for potentially destructive operations
- Monitor performance through Langfuse integration
For Administrators
- Configure tool permissions based on user roles
- Monitor usage patterns through tracing
- Set appropriate timeouts for your use case
- Regularly update model configurations
- Monitor error rates and performance metrics
Future Enhancements
Planned Improvements
- Multi-turn tool execution: Complex workflows with tool chaining
- Custom model integration: Support for additional AI providers
- Advanced caching: Intelligent context caching
- Load balancing: Multiple model instance support
- Real-time collaboration: Multi-user sessions