VAC Service Architecture

Overview

The VAC (Vector Assistant Chat) Service is the central orchestration engine for all AI assistant interactions in the Aitana platform. It handles real-time streaming responses and tool orchestration, and provides a unified interface for interacting with multiple AI models (Anthropic Claude and Google Gemini variants).

Core Architecture

Entry Point: vac_stream Function

The vac_stream function in backend/vac_service.py is the main orchestration engine; it processes every assistant interaction through a three-phase pipeline.

Function Signature:

def vac_stream(
    question: str,
    vector_name: str,
    chat_history: list = None,
    callback: BufferStreamingStdOutCallbackHandlerAsync = None,
    **kwargs
) -> Dict

Key Parameters:

  • question: User’s input question
  • vector_name: Assistant/vector database identifier
  • chat_history: Previous conversation history (optional)
  • callback: Streaming callback handler
  • **kwargs: Additional parameters including trace_id, emissaryConfig, currentUser, documents, tools_to_use, confirm_past_pause
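
A minimal invocation sketch (the import path follows backend/vac_service.py; the assistant name and trace ID are illustrative):

from backend.vac_service import vac_stream  # assumed import path

result = vac_stream(
    question="What files are in my project?",
    vector_name="my-assistant",   # hypothetical assistant identifier
    chat_history=[],              # no prior conversation turns
    trace_id="debug-trace-001",   # optional, passed through **kwargs
)
print(result)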

Three-Phase Processing Pipeline

Phase 1: Tool Permission & Configuration

  • Purpose: Validate tool permissions and prepare tool configurations
  • Key Components: tool_permissions.py, tool_orchestrator.py
  • Output: Validated tool configuration and permission matrix

Flow:

  1. Extract user permissions from session/authentication
  2. Validate requested tools against user roles
  3. Apply tag-based tool filtering
  4. Generate tool configuration for Phase 2
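
A minimal sketch of the permission-validation and tag-filtering steps (the helper name and data shapes are illustrative; the real logic lives in tool_permissions.py):

def filter_tools_by_permissions(requested_tools: list[str],
                                user_roles: set[str],
                                tool_requirements: dict[str, set[str]]) -> list[str]:
    """Keep only the tools whose required tags intersect the user's roles."""
    allowed = []
    for tool in requested_tools:
        required = tool_requirements.get(tool, set())
        # A tool with no required tags is available to everyone
        if not required or required & user_roles:
            allowed.append(tool)
    return allowed

# Example: an "editor" may use file-browser but not code_execution
assert filter_tools_by_permissions(
    ["file-browser", "code_execution"],
    {"editor"},
    {"file-browser": {"editor", "admin"}, "code_execution": {"admin"}},
) == ["file-browser"]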

Phase 2: First Impression & Tool Selection

  • Purpose: Generate initial AI response and determine required tools
  • Key Components: first_impression.py, smart model selection
  • Output: Initial response + tool selection + context requirements

Flow:

  1. Send user input + chat history to AI model
  2. AI model generates response and selects needed tools
  3. Extract tool confirmations if required
  4. Prepare context for Phase 3

Phase 3: Tool Execution & Smart Response

  • Purpose: Execute selected tools and generate final contextualized response
  • Key Components: tools/ directory, context aggregation, smart streaming
  • Output: Final streaming response with tool results

Flow:

  1. Execute selected tools in parallel (when possible)
  2. Aggregate tool results into context
  3. Stream final AI response with full context
  4. Handle tool confirmation workflow if needed
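
A sketch of the parallel-execution step using asyncio (the tool callables and result shape are illustrative; the per-tool error handling mirrors the graceful-degradation behavior described under Error Handling below):

import asyncio

async def run_tools(selected_tools: dict) -> dict:
    """Run independent tool coroutines concurrently and aggregate their results.

    selected_tools maps tool names to zero-argument coroutines.
    """
    names = list(selected_tools)
    results = await asyncio.gather(
        *(selected_tools[name]() for name in names),
        return_exceptions=True,  # one failing tool must not sink the batch
    )
    context = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            context[name] = {"error": str(result)}  # degrade gracefully
        else:
            context[name] = result
    return context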

API Endpoints

Primary Endpoints

/vac/streaming/<vector_name> (Assistant Access)

  • Purpose: Assistant-specific endpoint with full configuration
  • Method: POST
  • Use Case: Production applications, frontend integration

Request Format:

{
  "user_input": "What files are in my project?",
  "chat_history": [...],
  "trace_id": "optional-trace-id",
  "message_id": "unique-message-id",
  "session_id": "user-session-id"
}
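
A client-side sketch using the Python requests library (the base URL matches the local-development value shown under Environment Variables below; the exact path is an assumption based on the endpoint pattern in this section):

import requests

payload = {
    "user_input": "What files are in my project?",
    "chat_history": [],
    "message_id": "msg-001",
    "session_id": "session-abc",
}

# stream=True lets the client consume chunks as the service emits them
with requests.post(
    "http://127.0.0.1:1956/vac/streaming/my-assistant",  # assumed URL
    json=payload,
    stream=True,
    timeout=120,
) as response:
    response.raise_for_status()
    for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)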

/vac/streaming/<model_name> (Direct Access)

  • Purpose: Direct model access without assistant configuration
  • Method: POST
  • Use Case: Testing, debugging, custom implementations

Health Check Endpoints

/vac/health

  • Purpose: Service health and configuration status
  • Method: GET
  • Response: Service status, model availability, tool status

Tool Orchestration System

Tool Selection Process

  1. Permission Validation
    • Check user role against tool requirements
    • Apply tag-based filtering
    • Validate tool configurations
  2. Intelligent Tool Selection
    • AI model analyzes user input
    • Selects appropriate tools based on context
    • Determines execution order and dependencies
  3. Tool Execution
    • Parallel execution when possible
    • Error handling and fallback mechanisms
    • Result aggregation and context building

Supported Tools

Available in tool_orchestrator.py:

task_mapping = {
    'file-browser': extract_from_files,
    'add_chat_histories': add_chat_histories,
    'google_search_retrieval': google_search_retrieval,
    'url_processing': url_processing,
    'vertex_search': vertex_search,
    'code_execution': code_execution,
    'document_search_agent': document_search_agent_tool
}
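
Dispatch against this mapping is a plain dictionary lookup; a minimal sketch (the behavior for unknown tools is an assumption):

def dispatch_tool(tool_name: str, **tool_kwargs):
    """Look up a tool handler in task_mapping and invoke it."""
    handler = task_mapping.get(tool_name)
    if handler is None:
        # Assumed: unknown tools are rejected rather than silently skipped
        raise ValueError(f"No handler registered for tool: {tool_name}")
    return handler(**tool_kwargs)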

Special Tool Behaviors:

  • only-echo: Returns the input immediately
  • suppress-tool-streaming: Disables streaming output during tool execution
  • no-first-impression: Skips the first-impression phase and goes straight to tool execution

Model Integration

Supported AI Models

Anthropic Models:

  • claude-3-7-sonnet-20250219 (Current Default)
  • claude-3-5-sonnet-20241022 (General Use)
  • claude-sonnet-4-20250514 (Advanced Tasks)
  • claude-opus-4-20250514 (Complex Reasoning - Admin Access)

Google Models:

  • gemini-2.5-pro (Advanced Gemini)
  • gemini-2.5-flash (Fast Gemini)
  • gemini-2.5-pro-thinking (Advanced Reasoning - Beta Access)
  • gemini-2.5-flash-thinking (Fast Reasoning - Beta Access)

Model Selection Strategy

The VAC service uses intelligent model selection based on:

  • Task complexity (determined by tool requirements)
  • Response time requirements (real-time vs. batch)
  • Content type (text, code, multimodal)
  • User preferences and assistant configuration
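
A hedged sketch of how such a selection policy could be expressed (the thresholds and defaults are illustrative, not the service's actual rules):

def select_model(tool_count: int, needs_realtime: bool, is_admin: bool) -> str:
    """Pick a model from the lists above using simple heuristics."""
    if is_admin and tool_count >= 3:
        return "claude-opus-4-20250514"    # complex reasoning, admin access
    if needs_realtime:
        return "gemini-2.5-flash"          # lowest-latency option
    if tool_count > 0:
        return "claude-sonnet-4-20250514"  # advanced tool-using tasks
    return "claude-3-7-sonnet-20250219"    # current default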

Streaming Architecture

Real-time Response Streaming

The VAC service provides real-time streaming through asynchronous callback handlers:

async def callback_handler(data: Dict):
    """
    Handles streaming chunks from AI models
    Args:
        data: Dictionary containing type, content, and metadata
    """
    if data.get('type') == 'content':
        # Stream content chunk to client
        pass
    elif data.get('type') == 'thinking':
        # Handle thinking content (internal reasoning)
        pass
    elif data.get('type') == 'tool_confirmation':
        # Handle tool confirmation requests
        pass

Stream Data Types

  • content: User-visible response content
  • thinking: Internal AI reasoning (extractable)
  • tool_confirmation: Tool usage confirmation requests
  • error: Error messages and diagnostics
  • metadata: Response metadata and timing

Tool Confirmation Workflow

Pause-and-Confirm Pattern

The VAC service supports a tool confirmation workflow that pauses execution for user approval:

  1. Tool Selection: AI selects tools during Phase 2
  2. Confirmation Request: System pauses and requests user approval
  3. User Decision: User approves/denies tool usage
  4. Execution: Approved tools execute in Phase 3
  5. Response: Final response incorporates tool results

Tool Confirmation Implementation

Configuration Structure:

  • Tool confirmation is controlled by the pause_to_confirm parameter in first_impression()
  • When pause_to_confirm=True, the service returns early with the selected tools
  • Tool confirmation data is sent as: <toolconfirmation tools_to_use='{json_tools}' />
  • The frontend confirms by re-sending the request with confirm_past_pause=True and the confirmed tools
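
A sketch of the confirmation round trip from the client's perspective (the payload fields follow the vac_stream kwargs listed earlier; everything else is illustrative):

import json

# 1. The first response pauses with a marker such as:
#    <toolconfirmation tools_to_use='[{"name": "file-browser"}]' />
approved_tools = json.loads('[{"name": "file-browser"}]')

# 2. The client re-sends the request with the approved tools attached
follow_up = {
    "user_input": "What files are in my project?",
    "confirm_past_pause": True,
    "tools_to_use": approved_tools,  # the tools the user approved
}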

Configuration Options:

# emissaryConfig structure
{
  "name": "Assistant Name",
  "tools": ["tool1", "tool2"],
  "toolConfigs": {"tool_name": {"parameter": "value"}},
  "selectedItems": [...],
  "currentUser": {...}
}

Error Handling & Resilience

Error Recovery Mechanisms

  1. Model Fallback: Automatic fallback to alternative models
  2. Tool Degradation: Graceful handling of tool failures
  3. Partial Response: Return partial results when possible
  4. Retry Logic: Exponential backoff for transient failures
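
A minimal retry-with-exponential-backoff sketch for transient failures (attempt counts and delays are illustrative):

import random
import time

def with_retries(fn, max_attempts: int = 4, base_delay: float = 0.5):
    """Call fn(), retrying on failure with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the original error
            # 0.5 s, 1 s, 2 s, ... plus up to 100 ms of jitter
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))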

Error Types

  • Authentication Errors: Invalid session/permissions
  • Model Errors: AI model failures or timeouts
  • Tool Errors: Tool execution failures
  • Configuration Errors: Invalid assistant configurations

Performance Optimization

Caching Strategy

  • Tool Results: Cache expensive tool operations
  • Model Responses: Cache responses for identical inputs
  • Assistant Configs: Cache validated configurations
  • Permission Matrix: Cache permission calculations
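
One way to key such a cache is to hash the normalized inputs; a hedged in-memory sketch (the real service may use a shared or expiring cache):

import hashlib
import json

_tool_cache: dict[str, dict] = {}

def cached_tool_call(tool_name: str, params: dict, run_tool) -> dict:
    """Memoize expensive tool calls keyed on (tool_name, params)."""
    key = hashlib.sha256(
        json.dumps({"tool": tool_name, "params": params}, sort_keys=True).encode()
    ).hexdigest()
    if key not in _tool_cache:
        _tool_cache[key] = run_tool(tool_name, params)
    return _tool_cache[key]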

Parallel Processing

  • Tool Execution: Execute independent tools in parallel
  • Model Requests: Parallel requests to multiple models
  • Context Building: Asynchronous context aggregation

Integration Points

Frontend Integration

The VAC service integrates with the frontend through:

  • Emissary Components: Primary UI components
  • Streaming Context: Real-time response handling
  • Tool Confirmation UI: Interactive confirmation dialogs
  • Error Boundaries: Graceful error handling

External Service Integration

  • Firebase: Authentication, message persistence
  • Langfuse: Tracing and observability
  • Google Cloud: Model hosting and storage
  • Twilio: WhatsApp integration

Debugging & Observability

Tracing

Each request is traced through the entire pipeline with:

  • Trace ID: Unique identifier for request tracking
  • Phase Timing: Performance metrics for each phase
  • Tool Execution: Individual tool performance
  • Model Interactions: AI model request/response timing
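
A minimal sketch of per-phase timing under a trace ID (a generic context manager, not the Langfuse SDK):

import time
from contextlib import contextmanager

@contextmanager
def timed_phase(trace_id: str, phase: str):
    """Record the wall-clock duration of one pipeline phase."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"[{trace_id}] {phase}: {elapsed_ms:.1f} ms")

# Usage:
# with timed_phase(trace_id, "phase_2_first_impression"):
#     response = first_impression(...)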

Logging

Comprehensive logging includes:

  • Request/Response: Full API interactions
  • Tool Execution: Tool selection and results
  • Error Context: Detailed error information
  • Performance Metrics: Timing and resource usage

Security Considerations

Authentication & Authorization

  • User Authentication: Firebase-based user authentication
  • Permission Validation: Tag-based tool access control
  • Session Management: Secure session handling
  • Tool Isolation: Sandboxed tool execution

Data Protection

  • Input Sanitization: Validate all user inputs
  • Output Filtering: Remove sensitive information
  • Audit Logging: Track all tool usage
  • Encryption: Secure data transmission

Configuration Management

Assistant Configuration

emissaryConfig Structure:

{
  "name": "Assistant Name",
  "tools": ["tool1", "tool2"],  # List of enabled tools
  "toolConfigs": {  # Nested configuration
    "tool_name": {
      "parameter1": "value1",
      "parameter2": "value2"
    }
  },
  "selectedItems": [...],  # File browser selections
  "currentUser": {...},    # User context
  "admin_email": "admin@domain.com"
}

Environment Variables

Backend Environment Variables (Cloud Run):

Required:

  • GOOGLE_CLOUD_PROJECT: GCP project ID
  • GOOGLE_CLOUD_LOCATION: GCP region
  • ANTHROPIC_API_KEY: Anthropic API key

Optional:

  • LANGFUSE_SECRET_KEY: Tracing configuration
  • FIREBASE_BUCKET: Storage bucket name
  • VAC_DEBUG: Enable debug logging
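
A sketch of validating these variables at startup so the service fails fast on missing required values:

import os

REQUIRED = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION", "ANTHROPIC_API_KEY")

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing required environment variables: {missing}")

# Optional settings fall back to defaults
DEBUG = os.environ.get("VAC_DEBUG", "").lower() in ("1", "true", "yes")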

Frontend Environment Variables:

Local Development (.env.local):

NEXT_PUBLIC_BACKEND_URL=http://127.0.0.1:1956
NEXT_PUBLIC_LANGFUSE_PUBLIC_KEY=your-langfuse-public-key
NEXT_PUBLIC_LANGFUSE_BASE_URL=https://cloud.langfuse.com
# ... other NEXT_PUBLIC_ variables

Production Deployment: Frontend environment variables must be managed via GCP Secret Manager:

  1. Download current secrets:
    gcloud secrets versions access latest --secret=FIREBASE_ENV --project YOUR_PROJECT_ID > .env.local
    
  2. Add/update variables in .env.local

  3. Upload back to Secret Manager:
    gcloud secrets versions add FIREBASE_ENV --data-file=.env.local --project YOUR_PROJECT_ID
    
  4. Redeploy frontend to pick up new secrets

The get-firebase-config.sh script automatically downloads these secrets during Cloud Build deployment.

Best Practices

For Developers

  1. Always use trace IDs for request tracking
  2. Handle streaming gracefully with proper error boundaries
  3. Implement proper timeouts for all API calls
  4. Use tool confirmation for potentially destructive operations
  5. Monitor performance through Langfuse integration

For Administrators

  1. Configure tool permissions based on user roles
  2. Monitor usage patterns through tracing
  3. Set appropriate timeouts for your use case
  4. Regularly update model configurations
  5. Monitor error rates and performance metrics

Future Enhancements

Planned Improvements

  • Multi-turn tool execution: Complex workflows with tool chaining
  • Custom model integration: Support for additional AI providers
  • Advanced caching: Intelligent context caching
  • Load balancing: Multiple model instance support
  • Real-time collaboration: Multi-user sessions