VAC Service Architecture

Overview

The VAC (Vector Assistant Chat) Service is the central orchestration engine for all AI assistant interactions in the Aitana platform. It handles real-time streaming responses and tool orchestration, and provides a unified interface for interacting with multiple AI models (Anthropic Claude and Google Gemini variants).

Core Architecture

Entry Point: vac_stream Function

The vac_stream function in backend/vac_service.py is the main orchestration engine; it processes every assistant interaction through a three-phase pipeline.

Function Signature:

def vac_stream(
    question: str,
    vector_name: str,
    chat_history: list = None,
    callback: BufferStreamingStdOutCallbackHandlerAsync = None,
    **kwargs
) -> Dict

Key Parameters:

  • question: User’s input question
  • vector_name: Assistant/vector database identifier
  • chat_history: Previous conversation history (optional)
  • callback: Streaming callback handler
  • **kwargs: Additional parameters including trace_id, emissaryConfig, currentUser, documents, tools_to_use, confirm_past_pause
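
A minimal invocation sketch (the import path follows backend/vac_service.py; the assistant name and trace ID are illustrative):

from backend.vac_service import vac_stream  # assumed import path

result = vac_stream(
    question="What files are in my project?",
    vector_name="my-assistant",   # hypothetical assistant identifier
    chat_history=[],              # no prior conversation turns
    trace_id="debug-trace-001",   # optional, passed through **kwargs
)
print(result)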

Three-Phase Processing Pipeline

Phase 1: Tool Permission & Configuration

  • Purpose: Validate tool permissions and prepare tool configurations
  • Key Components: tool_permissions.py, tool_orchestrator.py
  • Output: Validated tool configuration and permission matrix

Flow:

  1. Extract user permissions from session/authentication
  2. Validate requested tools against user roles
  3. Apply tag-based tool filtering
  4. Generate tool configuration for Phase 2
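
A minimal sketch of the permission-validation and tag-filtering steps (the helper name and data shapes are illustrative; the real logic lives in tool_permissions.py):

def filter_tools_by_permissions(requested_tools: list[str],
                                user_roles: set[str],
                                tool_requirements: dict[str, set[str]]) -> list[str]:
    """Keep only the tools whose required tags intersect the user's roles."""
    allowed = []
    for tool in requested_tools:
        required = tool_requirements.get(tool, set())
        # A tool with no required tags is available to everyone
        if not required or required & user_roles:
            allowed.append(tool)
    return allowed

# Example: an "editor" may use file-browser but not code_execution
assert filter_tools_by_permissions(
    ["file-browser", "code_execution"],
    {"editor"},
    {"file-browser": {"editor", "admin"}, "code_execution": {"admin"}},
) == ["file-browser"]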

Phase 2: First Impression & Tool Selection

  • Purpose: Generate initial AI response and determine required tools
  • Key Components: first_impression.py, smart model selection
  • Output: Initial response + tool selection + context requirements

Flow:

  1. Send user input + chat history to AI model
  2. AI model generates response and selects needed tools
  3. Extract tool confirmations if required
  4. Prepare context for Phase 3

Phase 3: Tool Execution & Smart Response

  • Purpose: Execute selected tools and generate final contextualized response
  • Key Components: tools/ directory, context aggregation, smart streaming
  • Output: Final streaming response with tool results

Flow:

  1. Execute selected tools in parallel (when possible)
  2. Aggregate tool results into context
  3. Stream final AI response with full context
  4. Handle tool confirmation workflow if needed
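
A sketch of the parallel-execution step using asyncio (the tool callables and result shape are illustrative; the per-tool error handling mirrors the graceful-degradation behavior described under Error Handling below):

import asyncio

async def run_tools(selected_tools: dict) -> dict:
    """Run independent tool coroutines concurrently and aggregate their results.

    selected_tools maps tool names to zero-argument coroutines.
    """
    names = list(selected_tools)
    results = await asyncio.gather(
        *(selected_tools[name]() for name in names),
        return_exceptions=True,  # one failing tool must not sink the batch
    )
    context = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            context[name] = {"error": str(result)}  # degrade gracefully
        else:
            context[name] = result
    return context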

API Endpoints

Primary Endpoints

/vac/streaming/<vector_name> (Assistant Access)

  • Purpose: Assistant-specific endpoint with full configuration
  • Method: POST
  • Use Case: Production applications, frontend integration

Request Format:

{
  "user_input": "What files are in my project?",
  "chat_history": [...],
  "trace_id": "optional-trace-id",
  "message_id": "unique-message-id",
  "session_id": "user-session-id"
}
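
A client-side sketch using the Python requests library (the base URL matches the local-development value shown under Environment Variables below; the exact path is an assumption based on the endpoint pattern in this section):

import requests

payload = {
    "user_input": "What files are in my project?",
    "chat_history": [],
    "message_id": "msg-001",
    "session_id": "session-abc",
}

# stream=True lets the client consume chunks as the service emits them
with requests.post(
    "http://127.0.0.1:1956/vac/streaming/my-assistant",  # assumed URL
    json=payload,
    stream=True,
    timeout=120,
) as response:
    response.raise_for_status()
    for chunk in response.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)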

/vac/streaming/<model_name> (Direct Access)

  • Purpose: Direct model access without assistant configuration
  • Method: POST
  • Use Case: Testing, debugging, custom implementations

Health Check Endpoints

/vac/health

  • Purpose: Service health and configuration status
  • Method: GET
  • Response: Service status, model availability, tool status

Tool Orchestration System

Tool Selection Process

  1. Permission Validation
    • Check user role against tool requirements
    • Apply tag-based filtering
    • Validate tool configurations
  2. Intelligent Tool Selection
    • AI model analyzes user input
    • Selects appropriate tools based on context
    • Determines execution order and dependencies
  3. Tool Execution
    • Parallel execution when possible
    • Error handling and fallback mechanisms
    • Result aggregation and context building

Supported Tools

Available in tool_orchestrator.py:

task_mapping = {
    'file-browser': extract_from_files,
    'add_chat_histories': add_chat_histories,
    'google_search_retrieval': google_search_retrieval,
    'url_processing': url_processing,
    'vertex_search': vertex_search,
    'code_execution': code_execution,
    'document_search_agent': document_search_agent_tool
}
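
Dispatch against this mapping is a plain dictionary lookup; a minimal sketch (the behavior for unknown tools is an assumption):

def dispatch_tool(tool_name: str, **tool_kwargs):
    """Look up a tool handler in task_mapping and invoke it."""
    handler = task_mapping.get(tool_name)
    if handler is None:
        # Assumed: unknown tools are rejected rather than silently skipped
        raise ValueError(f"No handler registered for tool: {tool_name}")
    return handler(**tool_kwargs)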

Special Tool Behaviors:

  • only-echo: Returns the input immediately
  • suppress-tool-streaming: Disables streaming output during tool execution
  • no-first-impression: Skips the first-impression phase and goes straight to tool execution

Model Integration

Supported AI Models

Anthropic Models:

  • claude-3-7-sonnet-20250219 (Current Default)
  • claude-3-5-sonnet-20241022 (General Use)
  • claude-sonnet-4-20250514 (Advanced Tasks)
  • claude-opus-4-20250514 (Complex Reasoning - Admin Access)

Google Models:

  • gemini-2.5-pro (Advanced Gemini)
  • gemini-2.5-flash (Fast Gemini)
  • gemini-2.5-pro-thinking (Advanced Reasoning - Beta Access)
  • gemini-2.5-flash-thinking (Fast Reasoning - Beta Access)

Model Selection Strategy

The VAC service uses intelligent model selection based on:

  • Task complexity (determined by tool requirements)
  • Response time requirements (real-time vs. batch)
  • Content type (text, code, multimodal)
  • User preferences and assistant configuration
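
A hedged sketch of how such a selection policy could be expressed (the thresholds and defaults are illustrative, not the service's actual rules):

def select_model(tool_count: int, needs_realtime: bool, is_admin: bool) -> str:
    """Pick a model from the lists above using simple heuristics."""
    if is_admin and tool_count >= 3:
        return "claude-opus-4-20250514"    # complex reasoning, admin access
    if needs_realtime:
        return "gemini-2.5-flash"          # lowest-latency option
    if tool_count > 0:
        return "claude-sonnet-4-20250514"  # advanced tool-using tasks
    return "claude-3-7-sonnet-20250219"    # current default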

Streaming Architecture

Real-time Response Streaming

The VAC service provides real-time streaming through asynchronous callback handlers:

async def callback_handler(data: Dict):
    """
    Handles streaming chunks from AI models
    Args:
        data: Dictionary containing type, content, and metadata
    """
    if data.get('type') == 'content':
        # Stream content chunk to client
        pass
    elif data.get('type') == 'thinking':
        # Handle thinking content (internal reasoning)
        pass
    elif data.get('type') == 'tool_confirmation':
        # Handle tool confirmation requests
        pass

Stream Data Types

  • content: User-visible response content
  • thinking: Internal AI reasoning (extractable)
  • tool_confirmation: Tool usage confirmation requests
  • error: Error messages and diagnostics
  • metadata: Response metadata and timing

Tool Confirmation Workflow

Pause-and-Confirm Pattern

The VAC service supports a tool confirmation workflow that pauses execution for user approval:

  1. Tool Selection: AI selects tools during Phase 2
  2. Confirmation Request: System pauses and requests user approval
  3. User Decision: User approves/denies tool usage
  4. Execution: Approved tools execute in Phase 3
  5. Response: Final response incorporates tool results

Tool Confirmation Implementation

Configuration Structure:

  • Tool confirmation is controlled by the pause_to_confirm parameter in first_impression()
  • When pause_to_confirm=True, the service returns early with the selected tools
  • Tool confirmation data is sent as: <toolconfirmation tools_to_use='{json_tools}' />
  • The frontend confirms by re-sending the request with confirm_past_pause=True and the confirmed tools
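
A sketch of the confirmation round trip from the client's perspective (the payload fields follow the vac_stream kwargs listed earlier; everything else is illustrative):

import json

# 1. The first response pauses with a marker such as:
#    <toolconfirmation tools_to_use='[{"name": "file-browser"}]' />
approved_tools = json.loads('[{"name": "file-browser"}]')

# 2. The client re-sends the request with the approved tools attached
follow_up = {
    "user_input": "What files are in my project?",
    "confirm_past_pause": True,
    "tools_to_use": approved_tools,  # the tools the user approved
}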

Configuration Options:

# emissaryConfig structure
{
  "name": "Assistant Name",
  "tools": ["tool1", "tool2"],
  "toolConfigs": {"tool_name": {"parameter": "value"}},
  "selectedItems": [...],
  "currentUser": {...}
}

Error Handling & Resilience

Error Recovery Mechanisms

  1. Model Fallback: Automatic fallback to alternative models
  2. Tool Degradation: Graceful handling of tool failures
  3. Partial Response: Return partial results when possible
  4. Retry Logic: Exponential backoff for transient failures
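
A minimal retry-with-exponential-backoff sketch for transient failures (attempt counts and delays are illustrative):

import random
import time

def with_retries(fn, max_attempts: int = 4, base_delay: float = 0.5):
    """Call fn(), retrying on failure with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the original error
            # 0.5 s, 1 s, 2 s, ... plus up to 100 ms of jitter
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))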

Error Types

  • Authentication Errors: Invalid session/permissions
  • Model Errors: AI model failures or timeouts
  • Tool Errors: Tool execution failures
  • Configuration Errors: Invalid assistant configurations

Performance Optimization

Caching Strategy

  • Tool Results: Cache expensive tool operations
  • Model Responses: Cache responses for identical inputs
  • Assistant Configs: Cache validated configurations
  • Permission Matrix: Cache permission calculations
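
One way to key such a cache is to hash the normalized inputs; a hedged in-memory sketch (the real service may use a shared or expiring cache):

import hashlib
import json

_tool_cache: dict[str, dict] = {}

def cached_tool_call(tool_name: str, params: dict, run_tool) -> dict:
    """Memoize expensive tool calls keyed on (tool_name, params)."""
    key = hashlib.sha256(
        json.dumps({"tool": tool_name, "params": params}, sort_keys=True).encode()
    ).hexdigest()
    if key not in _tool_cache:
        _tool_cache[key] = run_tool(tool_name, params)
    return _tool_cache[key]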

Parallel Processing

  • Tool Execution: Execute independent tools in parallel
  • Model Requests: Parallel requests to multiple models
  • Context Building: Asynchronous context aggregation

Integration Points

Frontend Integration

The VAC service integrates with the frontend through:

  • Emissary Components: Primary UI components
  • Streaming Context: Real-time response handling
  • Tool Confirmation UI: Interactive confirmation dialogs
  • Error Boundaries: Graceful error handling

External Service Integration

  • Firebase: Authentication, message persistence
  • Langfuse: Tracing and observability
  • Google Cloud: Model hosting and storage
  • Twilio: WhatsApp integration

Debugging & Observability

Tracing

Each request is traced through the entire pipeline with:

  • Trace ID: Unique identifier for request tracking
  • Phase Timing: Performance metrics for each phase
  • Tool Execution: Individual tool performance
  • Model Interactions: AI model request/response timing
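
A minimal sketch of per-phase timing under a trace ID (a generic context manager, not the Langfuse SDK):

import time
from contextlib import contextmanager

@contextmanager
def timed_phase(trace_id: str, phase: str):
    """Record the wall-clock duration of one pipeline phase."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"[{trace_id}] {phase}: {elapsed_ms:.1f} ms")

# Usage:
# with timed_phase(trace_id, "phase_2_first_impression"):
#     response = first_impression(...)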

Logging

Comprehensive logging includes:

  • Request/Response: Full API interactions
  • Tool Execution: Tool selection and results
  • Error Context: Detailed error information
  • Performance Metrics: Timing and resource usage

Security Considerations

Authentication & Authorization

  • User Authentication: Firebase-based user authentication
  • Permission Validation: Tag-based tool access control
  • Session Management: Secure session handling
  • Tool Isolation: Sandboxed tool execution

Data Protection

  • Input Sanitization: Validate all user inputs
  • Output Filtering: Remove sensitive information
  • Audit Logging: Track all tool usage
  • Encryption: Secure data transmission

Configuration Management

Assistant Configuration

emissaryConfig Structure:

{
  "name": "Assistant Name",
  "tools": ["tool1", "tool2"],  # List of enabled tools
  "toolConfigs": {  # Nested configuration
    "tool_name": {
      "parameter1": "value1",
      "parameter2": "value2"
    }
  },
  "selectedItems": [...],  # File browser selections
  "currentUser": {...},    # User context
  "admin_email": "admin@domain.com"
}

Environment Variables

Backend Environment Variables (Cloud Run):

Required:

  • GOOGLE_CLOUD_PROJECT: GCP project ID
  • GOOGLE_CLOUD_LOCATION: GCP region
  • ANTHROPIC_API_KEY: Anthropic API key

Optional:

  • LANGFUSE_SECRET_KEY: Tracing configuration
  • FIREBASE_BUCKET: Storage bucket name
  • VAC_DEBUG: Enable debug logging
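
A sketch of validating these variables at startup so the service fails fast on missing required values:

import os

REQUIRED = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION", "ANTHROPIC_API_KEY")

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing required environment variables: {missing}")

# Optional settings fall back to defaults
DEBUG = os.environ.get("VAC_DEBUG", "").lower() in ("1", "true", "yes")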

Frontend Environment Variables:

Local Development (.env.local):

NEXT_PUBLIC_BACKEND_URL=http://127.0.0.1:1956
NEXT_PUBLIC_LANGFUSE_PUBLIC_KEY=your-langfuse-public-key
NEXT_PUBLIC_LANGFUSE_BASE_URL=https://cloud.langfuse.com
# ... other NEXT_PUBLIC_ variables

Production Deployment: Frontend environment variables must be managed via GCP Secret Manager:

  1. Download current secrets:
    gcloud secrets versions access latest --secret=FIREBASE_ENV --project YOUR_PROJECT_ID > .env.local
    
  2. Add/update variables in .env.local

  3. Upload back to Secret Manager:
    gcloud secrets versions add FIREBASE_ENV --data-file=.env.local --project YOUR_PROJECT_ID
    
  4. Redeploy frontend to pick up new secrets

The get-firebase-config.sh script automatically downloads these secrets during Cloud Build deployment.

Best Practices

For Developers

  1. Always use trace IDs for request tracking
  2. Handle streaming gracefully with proper error boundaries
  3. Implement proper timeouts for all API calls
  4. Use tool confirmation for potentially destructive operations
  5. Monitor performance through Langfuse integration

For Administrators

  1. Configure tool permissions based on user roles
  2. Monitor usage patterns through tracing
  3. Set appropriate timeouts for your use case
  4. Regularly update model configurations
  5. Monitor error rates and performance metrics

Future Enhancements

Planned Improvements

  • Multi-turn tool execution: Complex workflows with tool chaining
  • Custom model integration: Support for additional AI providers
  • Advanced caching: Intelligent context caching
  • Load balancing: Multiple model instance support
  • Real-time collaboration: Multi-user sessions