Google Search Tools

Overview

The Google Search Tools give AI assistants web search by integrating Google's search API with Vertex AI's Gemini models, which return grounded search results. With these tools an assistant can retrieve real-time web information, format search results for display, and show users cited sources with confidence scores.

Location: backend/tools/google_search.py

Core Functions

1. create_google_search_component_string()

Converts Gemini API responses containing search grounding metadata into React component strings for frontend display.

Signature:

def create_google_search_component_string(gemini_response: GenerateContentResponse) -> str

Purpose:

  • Extracts search entry point content from Gemini responses
  • Formats grounding metadata into structured JSON
  • Creates React component markup with search results
  • Applies CSS class name replacements for styling compatibility

Key Features:

  • Source Attribution: Extracts URI and title information from grounding chunks
  • Confidence Scores: Includes confidence ratings for search result segments
  • Search Query Tracking: Captures the original search queries used
  • CSS Safety: Removes style tags and applies namespaced CSS classes
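The features above can be sketched as a small extraction step. This is an illustration under assumptions, not the module's actual code: `extract_grounding` is a hypothetical helper, and the attribute names (`grounding_chunks`, `grounding_supports`, `web_search_queries`, and their nested fields) are assumed from the google-genai SDK's grounding metadata types. Stand-in objects are used so the sketch runs without the SDK, and taking the highest confidence score per segment is an arbitrary choice.

```python
from types import SimpleNamespace as NS


def extract_grounding(response):
    """Return (sources, segments, queries) from a grounded response.

    Hypothetical helper; attribute names are assumptions, see the lead-in.
    """
    candidates = getattr(response, "candidates", None) or []
    metadata = getattr(candidates[0], "grounding_metadata", None) if candidates else None
    if not metadata:
        return [], [], []

    chunks = getattr(metadata, "grounding_chunks", None) or []
    # Keep one slot per chunk so grounding_chunk_indices still line up.
    uris = [c.web.uri if getattr(c, "web", None) else None for c in chunks]
    sources = [
        {"uri": c.web.uri, "title": c.web.title}
        for c in chunks
        if getattr(c, "web", None)
    ]

    segments = []
    for support in getattr(metadata, "grounding_supports", None) or []:
        scores = getattr(support, "confidence_scores", None) or []
        indices = getattr(support, "grounding_chunk_indices", None) or []
        segments.append({
            "text": support.segment.text,
            "confidence": max(scores) if scores else None,
            "sources": [uris[i] for i in indices if i < len(uris) and uris[i]],
        })

    queries = list(getattr(metadata, "web_search_queries", None) or [])
    return sources, segments, queries


# Stand-in response object shaped like the assumed metadata.
fake_response = NS(candidates=[NS(grounding_metadata=NS(
    grounding_chunks=[NS(web=NS(uri="https://example.com", title="Example Title"))],
    grounding_supports=[NS(
        segment=NS(text="Search result text"),
        confidence_scores=[0.95],
        grounding_chunk_indices=[0],
    )],
    web_search_queries=["search query"],
))])
sources, segments, queries = extract_grounding(fake_response)
```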

Example Output:

<div class="google-search-container">
  <!-- Search entry point HTML -->
</div>

<googlesearch
  sources='[{"uri": "https://example.com", "title": "Example Title"}]'
  segments='[{"text": "Search result text", "confidence": 0.95, "sources": ["https://example.com"]}]'
  queries='["search query"]'
/>
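One way to produce markup of this shape is sketched below. `build_component` is a hypothetical helper, not the module's function. It assumes the props are serialised with `json.dumps` and that the frontend decodes HTML entities in attribute values. Single quotes, ampersands, and angle brackets are escaped so the JSON cannot break out of the single-quoted attribute.

```python
import html
import json


def build_component(sources, segments, queries, entry_point_html=""):
    """Assemble the container div and the <googlesearch> element (sketch)."""

    def attr(value):
        # Escape &, <, > and the single quote that delimits the attribute;
        # double quotes stay as-is, matching the example output above.
        return html.escape(json.dumps(value), quote=False).replace("'", "&#x27;")

    container = f'<div class="google-search-container">{entry_point_html}</div>\n'
    return (
        container
        + f"<googlesearch sources='{attr(sources)}' "
        + f"segments='{attr(segments)}' "
        + f"queries='{attr(queries)}' />"
    )


component = build_component(
    [{"uri": "https://example.com", "title": "Example Title"}],
    [],
    ["what's new"],
)
```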

2. create_google_search_markdown()

Generates markdown-formatted search results that group sources with their findings for clean display in chat interfaces.

Signature:

def create_google_search_markdown(gemini_response: GenerateContentResponse) -> str

Purpose:

  • Creates human-readable search result summaries
  • Groups findings by source for better organization
  • Includes confidence percentages for transparency
  • Adds Google branding with inline SVG logo

Example Output:

## Google Search Results
🔍 Google

- *[Example Title](https://example.com)*
  Climate change is a pressing global issue affecting millions.
  *(Confidence: 95.3%)*

- *[Another Source](https://example2.com)*
  Renewable energy solutions are becoming more cost-effective.
  *(Confidence: 87.1%)*
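The grouping and percentage formatting can be sketched as follows. `segments_to_markdown` is a hypothetical name, and the input shape (the `sources` and `segments` lists from the component example) is an assumption. The logo line is omitted.

```python
from collections import defaultdict


def segments_to_markdown(sources, segments):
    """Group segment texts under the source they cite (sketch)."""
    titles = {s["uri"]: s["title"] for s in sources}
    by_source = defaultdict(list)
    for seg in segments:
        for uri in seg["sources"]:
            by_source[uri].append(seg)

    lines = ["## Google Search Results", ""]
    for uri, segs in by_source.items():
        lines.append(f"- *[{titles.get(uri, uri)}]({uri})*")
        for seg in segs:
            lines.append(f"  {seg['text']}")
            if seg.get("confidence") is not None:
                # The .1% format multiplies by 100 and keeps one decimal place.
                lines.append(f"  *(Confidence: {seg['confidence']:.1%})*")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


markdown = segments_to_markdown(
    [{"uri": "https://example.com", "title": "Example Title"}],
    [{"text": "Climate change is a pressing global issue affecting millions.",
      "confidence": 0.953,
      "sources": ["https://example.com"]}],
)
```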

3. google_search_retrieval()

Orchestrates multiple parallel search queries using the AsyncTaskRunner for efficient bulk searching.

Signature:

async def google_search_retrieval(
    input_list_dict: List[Dict], 
    fallback_question: str = "",
    callback=None,
    trace=None, 
    parent_observation_id=None
) -> str

Parameters:

  • input_list_dict: List of search configurations with query parameters
  • fallback_question: Default query if no specific query provided
  • callback: Streaming callback for real-time updates
  • trace: Langfuse trace object for observability
  • parent_observation_id: Parent observation ID for nested tracking
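How an entry in input_list_dict might resolve to a query string is sketched below. `resolve_query` is a hypothetical helper. The key names `query` and `hardcode-query` come from the usage examples in this document, but the lookup order between them is an assumption.

```python
def resolve_query(entry, fallback_question=""):
    """Pick the query for one search configuration (sketch).

    Assumed order: 'query', then 'hardcode-query', then the fallback.
    """
    for key in ("query", "hardcode-query"):
        value = entry.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return fallback_question.strip()


search_configs = [
    {"query": "renewable energy trends"},
    {"hardcode-query": "wind power innovations"},
    {},  # no query given, so the fallback question is used
]
resolved = [resolve_query(cfg, "energy technologies") for cfg in search_configs]
```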

Key Features:

  • Parallel Execution: Runs multiple searches simultaneously
  • Error Resilience: Continues processing even if individual searches fail
  • Streaming Support: Provides real-time updates through callbacks
  • Comprehensive Logging: Full observability through Langfuse integration
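The parallel execution and error resilience described above can be illustrated with plain asyncio. AsyncTaskRunner's API is not shown in this document, so `asyncio.gather` stands in for it here. `run_searches` and `fake_search` are hypothetical names.

```python
import asyncio


async def run_searches(queries, search_one):
    """Run search_one(query) for all queries concurrently (sketch).

    return_exceptions=True keeps one failing search from cancelling the rest.
    """
    results = await asyncio.gather(
        *(search_one(q) for q in queries), return_exceptions=True
    )
    parts = []
    for query, result in zip(queries, results):
        if isinstance(result, Exception):
            parts.append(f"Search for {query!r} failed: {result}")
        else:
            parts.append(result)
    return "\n\n".join(parts)


async def fake_search(query):
    # Stand-in for google_search_retrieval_one; fails for one query.
    if query == "bad":
        raise RuntimeError("boom")
    await asyncio.sleep(0)
    return f"results for {query}"


combined = asyncio.run(run_searches(["a", "bad", "b"], fake_search))
```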

4. google_search_retrieval_one()

Executes a single search query using Gemini’s grounding capabilities with comprehensive error handling.

Signature:

async def google_search_retrieval_one(
    query: str, 
    trace: Optional[StatefulTraceClient] = None, 
    parent_observation_id=None
) -> str

Purpose:

  • Performs individual search queries
  • Integrates with model tools for search execution
  • Provides detailed error handling and logging
  • Returns formatted search results

System Instruction:

“You are an Aitana assistant dedicated to providing the best search results to answer people’s questions”

5. google_search_old() (Legacy)

Legacy implementation that calls the Vertex AI GenerativeModel directly. It is maintained for backward compatibility.

Signature:

async def google_search_old(
    question: str, 
    config: ConfigManager, 
    trace: Optional[StatefulTraceClient] = None, 
    parent_observation_id=None
) -> Dict[str, str]

Returns: Dictionary containing:

  • markdown_text: Formatted search results
  • search_entry_point: Raw HTML entry point content

Integration with Tool Orchestrator

Registration

The Google Search tools are integrated into the Tool Orchestrator System under the google_search_retrieval tool name:

# In tool orchestrator
'google_search_retrieval': google_search_retrieval

Configuration Options

Tool Configuration Example:

toolConfigs = {
    'google_search_retrieval': {
        'max_results': 10,
        'safe_search': True,
        'region': 'US',
        'language': 'en'
    }
}

AI Tool Selection Example:

first_response_tools = [
    {
        'name': 'google_search_retrieval',
        'config': [
            {'parameter': 'query', 'value': 'latest climate research 2024'},
            {'parameter': 'hardcode-query', 'value': 'renewable energy trends'}
        ]
    }
]

Anthropic Web Tools Integration

Complementary Capabilities

The Google Search tools work alongside Anthropic’s Claude web browsing capabilities to provide comprehensive web research:

Claude’s Web Browsing:

  • Real-time web page content extraction
  • Interactive web navigation
  • Dynamic content handling
  • JavaScript-rendered page support

Google Search Tools:

  • Structured search result aggregation
  • Multiple source comparison
  • Confidence-scored information
  • Search query optimization

Integration Patterns

1. Search-Then-Browse Pattern

# AI workflow example
first_response_tools = [
    {
        'name': 'google_search_retrieval',
        'config': [{'parameter': 'query', 'value': 'best machine learning frameworks 2024'}]
    }
    # Claude can then use web browsing to dive deeper into specific results
]

2. Parallel Research Pattern

# Combine multiple search approaches
research_tools = [
    {'name': 'google_search_retrieval', 'config': [{'parameter': 'query', 'value': 'topic overview'}]},
    # Claude web browsing for specific authoritative sources
    # Vertex AI search for internal documentation
]

3. Verification Pattern

  • Use Google Search for initial information gathering
  • Use Claude web browsing to verify claims by visiting original sources
  • Cross-reference findings for accuracy

Best Practices for Combined Usage

When to Use Google Search Tools:

  • Initial topic exploration
  • Finding recent news and updates
  • Gathering multiple perspectives
  • Quick fact-checking

When to Use Claude Web Browsing:

  • Deep diving into specific sources
  • Accessing paywalled or registration-required content
  • Navigating complex websites
  • Real-time data extraction

When to Use Both:

  • Comprehensive research projects
  • Fact verification workflows
  • Content creation with multiple sources
  • Academic or professional research

Frontend Integration

React Component Integration

The Google Search tools generate custom React components that can be rendered in the chat interface:

Component Structure:

interface GoogleSearchProps {
  sources: Array<{uri: string, title: string}>;
  segments: Array<{
    text: string, 
    confidence: number, 
    sources: string[]
  }>;
  queries: string[];
}

Styling and Display

CSS Classes Applied:

  • .google-search-container: Main container styling
  • .google-search-chip: Search query chips
  • .google-search-carousel: Result carousel display
  • .google-search-headline: Result headlines
  • .google-search-gradient-container: Visual effects
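The style-tag removal and class renaming might look like the sketch below. The un-prefixed class names on the left of `CLASS_MAP` are assumed for illustration and may not match what Google's entry-point HTML actually contains. `namespace_entry_point` is a hypothetical helper. Regular expressions are used for brevity; a real HTML parser would be more robust.

```python
import re

# Assumed mapping from original class names to the namespaced ones listed above.
CLASS_MAP = {
    "container": "google-search-container",
    "chip": "google-search-chip",
    "carousel": "google-search-carousel",
    "headline": "google-search-headline",
    "gradient-container": "google-search-gradient-container",
}


def namespace_entry_point(rendered_html):
    """Strip <style> blocks and rename known class names (sketch)."""
    without_style = re.sub(
        r"<style\b[^>]*>.*?</style>", "", rendered_html,
        flags=re.DOTALL | re.IGNORECASE,
    )

    def rename(match):
        names = match.group(2).split()
        renamed = " ".join(CLASS_MAP.get(name, name) for name in names)
        return f'{match.group(1)}{renamed}"'

    # Rewrites each class="..." attribute token by token.
    return re.sub(r'(class=")([^"]*)"', rename, without_style)


cleaned = namespace_entry_point(
    '<style>.chip{color:red}</style>'
    '<div class="container"><a class="chip" href="#">q</a></div>'
)
```

Because already-namespaced names are not keys in the mapping, running the function twice leaves the output unchanged.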

User Experience Features

  • Visual Source Attribution: Clear links to original sources
  • Confidence Indicators: Percentage confidence for each finding
  • Search Query Display: Shows what queries were actually executed
  • Responsive Design: Adapts to different screen sizes

Configuration and Deployment

Environment Variables

Required environment variables for Google Search integration:

# Google Cloud Project Configuration
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=us-central1

# Vertex AI Configuration
GOOGLE_GENAI_USE_VERTEXAI=true

# Search API Configuration (if using custom search)
GOOGLE_SEARCH_API_KEY=your-search-api-key
GOOGLE_SEARCH_ENGINE_ID=your-search-engine-id
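A startup check for these variables could be sketched as below. `missing_env_vars` is a hypothetical helper. It treats only the first three variables as mandatory, on the assumption that the search API keys matter only when custom search is used.

```python
import os

# Assumed to be mandatory; the custom-search keys are treated as optional.
REQUIRED_VARS = (
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_LOCATION",
    "GOOGLE_GENAI_USE_VERTEXAI",
)


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
missing = missing_env_vars({"GOOGLE_CLOUD_PROJECT": "your-project-id"})
```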

Model Configuration

Gemini Model Settings:

gen_config = types.GenerateContentConfig(
    system_instruction="You are an Aitana assistant dedicated to providing the best search results",
    tools=tools,
    max_output_tokens=8192,
)

Error Handling and Resilience

Common Error Scenarios

1. API Rate Limiting

# Retries with exponential backoff are delegated to AsyncTaskRunner
try:
    response = await call_gemini_async(contents, gen_config=gen_config)
except RateLimitError:  # placeholder name for the client's rate-limit exception
    # Handled by AsyncTaskRunner retry mechanism
    pass
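For readers unfamiliar with the technique, exponential backoff can be sketched generically as below. This is not AsyncTaskRunner's implementation, which is not shown in this document. `RateLimitError` and `with_backoff` are hypothetical names, and the delay parameters are arbitrary.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Placeholder for whatever the client raises when rate limited."""


async def with_backoff(call, max_attempts=4, base_delay=0.5, sleep=asyncio.sleep):
    """Await call(), retrying on RateLimitError with doubling delays (sketch)."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            # Delay doubles each attempt; jitter spreads out concurrent retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            await sleep(delay)


attempts = {"count": 0}


async def flaky_call():
    # Fails twice, then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError()
    return "ok"


recorded_delays = []


async def record_sleep(delay):
    # Records the delay instead of waiting, to keep the demo fast.
    recorded_delays.append(delay)


outcome = asyncio.run(with_backoff(flaky_call, sleep=record_sleep))
```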

2. No Search Results

if not grounding_metadata:
    log.info("No grounding metadata found")
    return ''  # Graceful degradation

3. Malformed Responses

try:
    ...  # extract the grounding metadata and build the component string
except Exception as e:
    log.error(f"Error creating Google Search component: {str(e)}")
    return f'<googlesearch error={json.dumps(str(e))} />'

Monitoring and Observability

Langfuse Integration:

  • Search query tracking
  • Response time monitoring
  • Error rate analysis
  • Token usage tracking

Logging Levels:

  • INFO: Successful searches and metadata extraction
  • WARNING: Partial failures or fallback scenarios
  • ERROR: Complete failures with full traceback

Testing and Development

Unit Testing

Test Structure:

# backend/tests/tools/test_google_search.py
class TestGoogleSearch:
    async def test_search_component_creation(self):
        # Test component string generation
        pass
    
    async def test_markdown_generation(self):
        # Test markdown formatting
        pass
    
    async def test_parallel_search_execution(self):
        # Test multi-query handling
        pass

Integration Testing

API Integration:

async def test_live_search():
    result = await google_search_retrieval_one("test query")
    assert "Google was queried" in result
    assert "found this text" in result

Performance Testing

Metrics to Monitor:

  • Search response time (target: <2 seconds)
  • Parallel execution efficiency
  • Memory usage during large searches
  • Error recovery time

Usage Examples

Basic Search Query

# Single search execution
result = await google_search_retrieval_one(
    query="climate change solutions 2024",
    trace=trace_obj
)
print(result)
# Output: "Google was queried with climate change solutions 2024 and found this text: <google_search_query>...</google_search_query>"

Multiple Parallel Searches

# Multiple search queries
search_configs = [
    {'query': 'renewable energy trends'},
    {'query': 'solar panel efficiency 2024'},
    {'hardcode-query': 'wind power innovations'}
]

results = await google_search_retrieval(
    input_list_dict=search_configs,
    fallback_question="energy technologies",
    trace=trace_obj
)
print(results)  # Combined results from all searches

Component Generation

# Generate React component from search response
gemini_response = await call_gemini_async(contents, gen_config=gen_config)
component_string = create_google_search_component_string(gemini_response)

# Result can be directly rendered in React frontend
print(component_string)
# Output: HTML + <googlesearch> component with JSON props

Markdown Formatting

# Generate markdown summary
markdown_results = create_google_search_markdown(gemini_response)
print(markdown_results)
# Output: Formatted markdown with sources and confidence scores

Security Considerations

Input Validation

  • Search queries are sanitized before execution
  • Parameter validation prevents injection attacks
  • Rate limiting prevents abuse
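The document does not say how queries are sanitized. The sketch below shows one plausible form of such a step: drop control characters, collapse whitespace, and cap the length. `sanitize_query` and `MAX_QUERY_LENGTH` are hypothetical.

```python
import re

MAX_QUERY_LENGTH = 512  # assumed limit, not taken from the module


def sanitize_query(raw):
    """Normalise a user-supplied search query (sketch)."""
    if not isinstance(raw, str):
        raise TypeError("query must be a string")
    # Drop non-printable characters but keep whitespace for the next step.
    cleaned = "".join(ch for ch in raw if ch.isprintable() or ch.isspace())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    if not cleaned:
        raise ValueError("query is empty after sanitization")
    return cleaned[:MAX_QUERY_LENGTH]


safe_query = sanitize_query("  climate\x00 change\n2024 ")
```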

Output Sanitization

  • HTML content is stripped of dangerous elements
  • CSS classes are namespaced to prevent conflicts
  • JSON output is properly escaped

API Security

  • Google Cloud authentication via service accounts
  • Environment variable configuration for sensitive data
  • Audit logging through Langfuse integration

Performance Optimization

Caching Strategies

  • Search results can be cached by query hash
  • Component strings cached for repeated queries
  • Markdown formatting cached separately
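Caching by query hash could be sketched as below. `query_key` and `SearchCache` are hypothetical, and the normalisation (lowercasing and collapsing whitespace) is an assumption. The sketch has no expiry, so a real cache for time-sensitive searches would also need a time-to-live.

```python
import hashlib


def query_key(query):
    """Hash a normalised query so equivalent queries share one cache entry."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SearchCache:
    """In-memory cache keyed by query hash (sketch, no expiry)."""

    def __init__(self):
        self._store = {}

    def get_or_compute(self, query, compute):
        key = query_key(query)
        if key not in self._store:
            self._store[key] = compute(query)
        return self._store[key]


calls = []


def fake_search(query):
    # Stand-in for an expensive search; records each real invocation.
    calls.append(query)
    return f"results for {query}"


cache = SearchCache()
first = cache.get_or_compute("Solar  Panel efficiency", fake_search)
second = cache.get_or_compute("solar panel efficiency", fake_search)
```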

Parallel Processing

  • Multiple searches execute simultaneously via AsyncTaskRunner
  • No blocking between independent search queries
  • Efficient resource utilization

Resource Management

  • Memory-efficient streaming for large results
  • Automatic garbage collection of completed tasks
  • Configurable timeout handling
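Configurable timeout handling can be illustrated with `asyncio.wait_for`. `search_with_timeout` is a hypothetical wrapper. Returning an empty string on timeout is an assumption that mirrors the graceful-degradation behaviour described under Error Handling.

```python
import asyncio


async def search_with_timeout(search_one, query, timeout_seconds=10.0):
    """Await one search, giving up after timeout_seconds (sketch)."""
    try:
        return await asyncio.wait_for(search_one(query), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return ""  # degrade gracefully instead of failing the whole batch


async def slow_search(query):
    await asyncio.sleep(1)
    return f"results for {query}"


async def fast_search(query):
    return f"results for {query}"


timed_out = asyncio.run(search_with_timeout(slow_search, "q", timeout_seconds=0.01))
completed = asyncio.run(search_with_timeout(fast_search, "q", timeout_seconds=1.0))
```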

Troubleshooting

Common Issues

1. No Search Results Returned

# Check logs for:
log.info("No grounding metadata found")
log.info("No search_entry_point found")

2. Component Rendering Issues

  • Verify CSS class name replacements
  • Check JSON prop formatting
  • Ensure React component registration

3. API Authentication Errors

  • Verify GOOGLE_CLOUD_PROJECT environment variable
  • Check service account permissions
  • Validate Vertex AI API enablement

Debug Mode

Enable detailed logging:

import logging
logging.getLogger('google_search').setLevel(logging.DEBUG)

Performance Issues

  • Monitor Langfuse traces for bottlenecks
  • Check AsyncTaskRunner execution times
  • Verify network connectivity to Google APIs
