Email Integration Testing Guide

Comprehensive testing strategies for the email integration system, covering unit tests, integration tests, and real-world scenarios.

Overview

This guide covers testing approaches for the email integration system based on the comprehensive test suite in backend/tests/test_email_integration.py and backend/tests/test_my_utils.py. The testing strategy ensures reliability across webhook processing, rate limiting, file handling, and AI integration.

Key Testing Areas:

  • Email webhook processing and validation
  • Rate limiting and security measures
  • Content processing and formatting
  • File attachment handling
  • Integration with VAC Service and Tool Context
  • Error handling and edge cases

Test Architecture

Test Structure

backend/tests/
├── test_email_integration.py      # Email system tests
├── test_my_utils.py               # Utility function tests  
├── test_tools_model_tools.py      # Tool creation tests
└── test_tools_tool_prompts.py     # Tool prompt tests

Testing Frameworks

Backend Testing:

  • pytest - Primary testing framework
  • pytest-asyncio - Async test support
  • unittest.mock - Mocking and patching
  • pytest fixtures - Test setup and teardown

Key Imports:

import pytest
from unittest.mock import Mock, patch, AsyncMock
from datetime import datetime, timedelta
import json

Email Integration Tests

EmailRateLimiter Testing

Test Class: TestEmailRateLimiter

Rate Limiting Behavior

First Email Allowed:

@pytest.mark.asyncio
async def test_rate_limit_allows_first_email(self, rate_limiter):
    """Test that first email from user is allowed."""
    # Mock Firestore document that doesn't exist
    mock_doc = Mock()
    mock_doc.exists = False
    mock_doc_ref = Mock()
    mock_doc_ref.get.return_value = mock_doc
    mock_doc_ref.set = Mock()
    
    rate_limiter.db.collection.return_value.document.return_value = mock_doc_ref
    
    allowed, error = await rate_limiter.check_rate_limit("user@example.com")
    
    assert allowed is True
    assert error is None
    mock_doc_ref.set.assert_called_once()

Rapid Email Blocking:

@pytest.mark.asyncio
async def test_rate_limit_blocks_rapid_emails(self, rate_limiter):
    """Test that rapid emails are blocked."""
    # Mock Firestore document with recent timestamp
    mock_doc = Mock()
    mock_doc.exists = True
    mock_doc.to_dict.return_value = {
        'last_email_time': datetime.utcnow() - timedelta(seconds=30)
    }
    
    allowed, error = await rate_limiter.check_rate_limit("user@example.com")
    
    assert allowed is False
    assert "Rate limit exceeded" in error

Timeout Recovery:

@pytest.mark.asyncio
async def test_rate_limit_allows_after_timeout(self, rate_limiter):
    """Test that emails are allowed after rate limit timeout."""
    mock_doc = Mock()
    mock_doc.exists = True
    mock_doc.to_dict.return_value = {
        'last_email_time': datetime.utcnow() - timedelta(minutes=2)  # Old timestamp
    }
    
    allowed, error = await rate_limiter.check_rate_limit("user@example.com")
    
    assert allowed is True
    assert error is None

EmailWebhookValidator Testing

Test Class: TestEmailWebhookValidator

Email Address Validation

Valid Email Patterns:

def test_validate_email_address_valid(self, validator):
    """Test validation of valid email addresses."""
    valid_emails = [
        "user@example.com",
        "test.user+tag@domain.co.uk", 
        "name@subdomain.example.org"
    ]
    
    for email in valid_emails:
        assert validator.validate_email_address(email) is True

Invalid Email Patterns:

def test_validate_email_address_invalid(self, validator):
    """Test validation of invalid email addresses."""
    invalid_emails = [
        "not-an-email",
        "@example.com",     # Missing local part
        "user@",            # Missing domain
        "user@domain",      # Domain without TLD
        ""                  # Empty string
    ]
    
    for email in invalid_emails:
        assert validator.validate_email_address(email) is False

Webhook Signature Validation

HMAC Validation with Mocking:

@patch('email_integration.hmac.compare_digest')
@patch('email_integration.hmac.new')
def test_validate_mailgun_webhook_with_secret(self, mock_hmac_new, mock_compare, validator):
    """Test webhook validation with proper HMAC checking."""
    mock_hmac_new.return_value.hexdigest.return_value = "expected_signature"
    mock_compare.return_value = True
    
    result = validator.validate_mailgun_webhook("123", "token", "signature")
    
    assert result is True
    mock_compare.assert_called_once_with("signature", "expected_signature")

EmailProcessor Testing

Test Class: TestEmailProcessor

Assistant ID Extraction

Valid Extraction Patterns:

def test_extract_assistant_id_valid(self, processor):
    """Test extraction of assistant ID from valid email addresses."""
    test_cases = [
        ("assistant-123@example.com", "123"),
        ("assistant-test-id@domain.org", "test-id"),
        ("assistant-uuid-123-456@mail.com", "uuid-123-456"),
        ("assistant-list@example.com", "list"),  # Special case
        ("dev-assistant-list@example.com", "list")  # Special case with environment prefix
    ]
    
    for email, expected_id in test_cases:
        result = processor.extract_assistant_id(email)
        assert result == expected_id

Invalid Pattern Handling:

def test_extract_assistant_id_invalid(self, processor):
    """Test extraction of assistant ID from invalid email addresses."""
    invalid_emails = [
        "user@example.com",              # Not assistant email
        "assistant@example.com",         # Missing ID
        "not-assistant-123@example.com", # Wrong prefix
        ""                               # Empty string
    ]
    
    for email in invalid_emails:
        result = processor.extract_assistant_id(email)
        assert result is None

Content Cleaning

Email Content Processing:

def test_clean_email_content(self, processor):
    """Test email content cleaning functionality."""
    raw_email = """Hello,

This is my question about the project.

Thanks!
John

From: john@example.com
Sent from my iPhone
--
This is a signature
> This is quoted content
> More quoted content"""
    
    cleaned = processor.clean_email_content(raw_email)
    
    # Should remove signatures, headers, and quoted content
    assert "From:" not in cleaned
    assert "Sent from" not in cleaned
    assert "--" not in cleaned
    assert "> This is quoted" not in cleaned
    assert "This is my question about the project." in cleaned

Thinking Tag Removal

Comprehensive Thinking Tag Testing:

def test_clean_assistant_response(self, processor):
    """Test assistant response cleaning removes thinking tags."""
    
    # Basic thinking tag removal
    response_with_thinking = """<thinking>
This is internal reasoning that users shouldn't see in email.
Let me think about this...
</thinking>

Here is my actual response to the user.

This should be visible in the email."""
    
    cleaned = processor.clean_assistant_response(response_with_thinking)
    assert "<thinking>" not in cleaned
    assert "internal reasoning" not in cleaned
    assert "Here is my actual response" in cleaned
    assert "This should be visible" in cleaned
    
    # Case insensitive removal
    case_mixed = """<THINKING>
Mixed case thinking
</thinking>

User visible content."""
    
    cleaned_mixed = processor.clean_assistant_response(case_mixed)
    assert "Mixed case thinking" not in cleaned_mixed
    assert "User visible content" in cleaned_mixed
    
    # Multiple thinking blocks
    multiple_thinking = """First part.

<thinking>First internal thought</thinking>

Middle part.

<thinking>
Second internal thought
spanning multiple lines
</thinking>

Final part."""
    
    cleaned_multiple = processor.clean_assistant_response(multiple_thinking)
    assert "First internal thought" not in cleaned_multiple
    assert "Second internal thought" not in cleaned_multiple
    assert "First part." in cleaned_multiple
    assert "Middle part." in cleaned_multiple
    assert "Final part." in cleaned_multiple

Integration Testing

Successful Email Processing:

@pytest.mark.asyncio
async def test_process_email_message_success(self, processor):
    """Test successful email processing."""
    mock_config = {"name": "Test Assistant", "tools": ["search"], "toolConfigs": {}}
    mock_ai_response = "This is the AI response"
    
    with patch('email_integration.get_assistant_config', return_value=mock_config), \
         patch('email_integration.permitted_tools', return_value=(["search"], {})), \
         patch('email_integration.vac_stream', return_value={"metadata": {"answer": mock_ai_response}}) as mock_vac_stream, \
         patch.object(processor, '_store_email_interaction') as mock_store, \
         patch.object(processor, '_save_email_to_chat_history') as mock_save_chat, \
         patch.object(processor, '_send_email_response', return_value=True) as mock_send:
        
        result = await processor.process_email_message(
            "user@example.com", "test-assistant", "Test message", "Test Subject"
        )
        
        assert result["status"] == "processed"
        assert result["response"] == mock_ai_response
        assert result["email_sent"] is True
        mock_vac_stream.assert_called_once()
        mock_store.assert_called_once()
        mock_send.assert_called_once()

Webhook Endpoint Testing

Invalid Signature Handling

@pytest.mark.asyncio
async def test_email_webhook_receive_invalid_signature():
    """Test webhook receives with invalid signatures."""
    from flask import Flask
    
    app = Flask(__name__)
    
    with app.test_request_context('/api/email/webhook', method='POST', data={
        'timestamp': '123',
        'token': 'token',
        'signature': 'invalid'
    }), patch('email_integration.email_processor') as mock_processor:
        
        mock_processor.validator.validate_mailgun_webhook.return_value = False
        
        response, status_code = await email_webhook_receive()
        
        assert status_code == 401
        assert "Invalid signature" in response.get_json()["error"]

Rate Limiting Testing

@pytest.mark.asyncio
async def test_email_webhook_receive_rate_limited():
    """Test webhook receives when rate limited."""
    with app.test_request_context('/api/email/webhook', method='POST', data={
        'sender': 'user@example.com',
        'recipient': 'assistant-123@example.com',
        'subject': 'Test',
        'body-plain': 'Test message'
    }), patch('email_integration.email_processor') as mock_processor:
        
        mock_processor.rate_limiter.check_rate_limit = AsyncMock(
            return_value=(False, "Rate limit exceeded")
        )
        
        response, status_code = await email_webhook_receive()
        
        assert status_code == 429
        assert "Rate limit exceeded" in response.get_json()["error"]

Utility Function Tests

File Sanitization Testing

Test Class: Tests in test_my_utils.py

Filename Sanitization

Special Character Handling:

def test_sanitize_file_with_special_chars():
    """Test sanitizing filenames with special characters"""
    assert sanitize_file("My-File_Name!.txt") == "my-file-name"
    
def test_sanitize_file_with_consecutive_dashes():
    """Test sanitizing filenames with consecutive dashes"""
    assert sanitize_file("my--file---name.txt") == "my-file-name"
    
def test_sanitize_file_with_leading_trailing_dashes():
    """Test sanitizing filenames with leading/trailing dashes"""
    assert sanitize_file("-myfile-.txt") == "myfile"
    
def test_sanitize_file_empty_after_sanitization():
    """Test sanitizing filenames that become empty after sanitization"""
    assert sanitize_file("!!!.txt") == "file"
    
def test_sanitize_file_length_limit():
    """Test sanitizing filenames with length exceeding the limit"""
    long_name = "a" * 50 + ".txt"
    assert len(sanitize_file(long_name)) <= 40

Content Processing Testing

Thinking Tag Processing

Comprehensive Tag Removal:

def test_strip_thinking_tags_multiple():
    """Test stripping multiple thinking blocks"""
    text = """First part.

<thinking>First internal thought</thinking>

Middle part.

<thinking>
Second internal thought
spanning multiple lines
</thinking>

Final part."""
    
    result = strip_thinking_tags(text)
    assert "First internal thought" not in result
    assert "Second internal thought" not in result
    assert "First part." in result
    assert "Middle part." in result
    assert "Final part." in result

Case Insensitive Processing:

def test_strip_thinking_tags_case_insensitive():
    """Test stripping thinking tags with mixed case"""
    text = "<THINKING>Internal reasoning</thinking>\n\nUser visible content"
    expected = "User visible content"
    assert strip_thinking_tags(text) == expected

Chat History Formatting

Multiple Message Processing:

def test_format_human_chat_history_multiple_messages():
    """Test formatting chat history with multiple messages"""
    history = [
        {"name": "user", "content": "Hello"},
        {"name": "assistant", "content": "Hi there"},
        {"name": "user", "content": "How are you?"}
    ]
    expected = "user: Hello\nassistant: Hi there\nuser: How are you?"
    assert format_human_chat_history(history) == expected

Async Function Testing

Callback Processing

Mock Callback Testing:

class MockCallback:
    """Mock callback class for testing"""
    def __init__(self):
        self.tokens = []
        
    async def async_on_llm_new_token(self, token):
        self.tokens.append(token)

@pytest.mark.asyncio
async def test_check_and_display_thinking_with_thinking_tags():
    """Test processing message with thinking tags"""
    with patch('my_utils.log.info'):  # Patch to avoid log.info call
        callback = MockCallback()
        await check_and_display_thinking("test <thinking>thought</thinking>", callback)
        assert "<&#8203thinking>" in callback.tokens[0]
        assert "thought" in callback.tokens[0]

Error Handling:

@pytest.mark.asyncio
async def test_check_and_display_thinking_no_callback():
    """Test processing message with no callback"""
    with patch('my_utils.log.error') as mock_log_error:
        await check_and_display_thinking("test message", None)
        mock_log_error.assert_called_once()
        args, _ = mock_log_error.call_args
        assert "No callback" in args[0]

Tool System Testing

Model Tools Testing

Test Class: TestCreateModelTools in test_tools_model_tools.py

Tool Creation Validation

Google Search Tool:

def test_create_model_tools_google_search_retrieval(self):
    """Test creating Google Search tool."""
    tools = ["google_search_retrieval"]
    result = create_model_tools(tools)
    
    assert len(result) == 1
    assert isinstance(result[0], types.Tool)
    assert hasattr(result[0], 'google_search')

Multiple Tool Processing:

def test_create_model_tools_multiple_tools(self):
    """Test creating multiple tools."""
    tools = ["google_search_retrieval", "code_execution", "url_processing"]
    result = create_model_tools(tools)
    
    assert len(result) == 3
    assert all(isinstance(tool, types.Tool) for tool in result)

Unknown Tool Handling:

def test_create_model_tools_unknown_tool(self):
    """Test with unknown tool - should be ignored."""
    tools = ["unknown_tool", "google_search_retrieval"]
    result = create_model_tools(tools)
    
    # Should only create the known tool
    assert len(result) == 1
    assert hasattr(result[0], 'google_search')

Tool Prompts Testing

Test Class: TestAddToolPrompts in test_tools_tool_prompts.py

Prompt Loading and Formatting

String Tools Processing:

@patch('tools.tool_prompts.langfuse')
def test_add_tool_prompts_string_tools(self, mock_langfuse):
    """Test with tools as strings."""
    mock_prompt = MagicMock()
    mock_prompt.compile.return_value = "Tool prompt content"
    mock_langfuse.get_prompt.return_value = mock_prompt
    
    tools = ["tool1", "tool2"]
    result = add_tool_prompts(tools)
    
    assert "Tool prompt content" in result
    assert mock_langfuse.get_prompt.call_count == 2

Error Handling:

@patch('tools.tool_prompts.langfuse')
@patch('tools.tool_prompts.log')
def test_add_tool_prompts_error_handling(self, mock_log, mock_langfuse):
    """Test error handling when prompt loading fails."""
    mock_langfuse.get_prompt.side_effect = Exception("Prompt not found")
    
    tools = ["tool1", "tool2"]
    result = add_tool_prompts(tools)
    
    # Should return empty string when all prompts fail
    assert result == ""
    
    # Should log warnings for each failed tool
    assert mock_log.warning.call_count == 2

Integration Testing Strategies

Email Processing Pipeline

Full Pipeline Test:

@pytest.mark.asyncio
async def test_complete_email_processing():
    """Test complete email processing from webhook to response."""
    
    # Setup test data
    test_email_data = {
        'sender': 'user@example.com',
        'recipient': 'assistant-123@email.aitana.chat',
        'subject': 'Test question',
        'body-plain': 'How do I create a React component?'
    }
    
    # Mock dependencies
    with patch('email_integration.get_assistant_config') as mock_config, \
         patch('email_integration.vac_stream') as mock_vac, \
         patch('email_integration.send_email_response') as mock_send:
        
        # Configure mocks
        mock_config.return_value = {"name": "Test Assistant", "tools": []}
        mock_vac.return_value = {"metadata": {"answer": "Test response"}}
        mock_send.return_value = True
        
        # Process email
        processor = EmailProcessor()
        result = await processor.process_email_message(
            test_email_data['sender'],
            '123',
            test_email_data['body-plain'], 
            test_email_data['subject']
        )
        
        # Verify processing
        assert result['status'] == 'processed'
        assert result['email_sent'] is True
        mock_vac.assert_called_once()
        mock_send.assert_called_once()

Error Recovery Testing

Graceful Degradation:

@pytest.mark.asyncio
async def test_email_processing_with_failures():
    """Test email processing handles partial failures gracefully."""
    
    with patch('email_integration.get_assistant_config') as mock_config, \
         patch('email_integration.vac_stream') as mock_vac, \
         patch('email_integration.QuartoExporter.generate_export') as mock_export:
        
        # Configure successful AI response but failed export
        mock_config.return_value = {"name": "Test Assistant"}
        mock_vac.return_value = {"metadata": {"answer": "Test response"}}
        mock_export.return_value = None  # Export fails
        
        processor = EmailProcessor()
        result = await processor.process_email_message(
            'user@example.com',
            '123', 
            'Test message (export:pdf)',  # Request export
            'Test subject'
        )
        
        # Should still succeed despite export failure
        assert result['status'] == 'processed'
        assert result['email_sent'] is True

Performance Testing

Load Testing Strategies

Rate Limiting Under Load:

@pytest.mark.asyncio
async def test_rate_limiting_concurrent_requests():
    """Test rate limiting with concurrent requests."""
    
    rate_limiter = EmailRateLimiter()
    user_email = "loadtest@example.com"
    
    # Simulate concurrent requests
    tasks = []
    for _ in range(10):
        task = rate_limiter.check_rate_limit(user_email)
        tasks.append(task)
    
    results = await asyncio.gather(*tasks)
    
    # Only first request should be allowed
    allowed_count = sum(1 for allowed, _ in results if allowed)
    assert allowed_count == 1

Memory Usage Testing:

def test_large_content_processing():
    """Test processing of large email content."""
    
    # Generate large content
    large_content = "x" * 100000  # 100KB content
    
    processor = EmailProcessor() 
    cleaned = processor.clean_email_content(large_content)
    
    # Should handle large content without memory issues
    assert len(cleaned) > 0
    assert isinstance(cleaned, str)

Test Environment Setup

Fixture Configuration

Rate Limiter Fixture:

@pytest.fixture
def rate_limiter():
    with patch('email_integration.firestore.Client'):
        return EmailRateLimiter()

Validator Fixture:

@pytest.fixture
def validator():
    with patch.dict('os.environ', {'MAILGUN_WEBHOOK_SECRET': 'test-secret'}):
        return EmailWebhookValidator()

Processor Fixture:

@pytest.fixture
def processor():
    with patch('email_integration.firestore.Client'):
        return EmailProcessor()

Mock Configuration

Firestore Mocking:

def setup_firestore_mocks():
    """Setup comprehensive Firestore mocking."""
    mock_db = Mock()
    mock_collection = Mock()
    mock_document = Mock()
    mock_doc_ref = Mock()
    
    mock_db.collection.return_value = mock_collection
    mock_collection.document.return_value = mock_doc_ref
    mock_doc_ref.get.return_value = mock_document
    
    return mock_db, mock_collection, mock_document, mock_doc_ref

VAC Service Mocking:

@patch('email_integration.vac_stream')
def mock_vac_service(mock_vac):
    """Mock VAC service responses."""
    mock_vac.return_value = {
        "metadata": {
            "answer": "Mocked AI response",
            "trace_id": "test-trace-123"
        }
    }
    return mock_vac

Running Tests

Local Development

Run Email Integration Tests:

# All email integration tests
cd backend && python -m pytest tests/test_email_integration.py -v

# Specific test class
python -m pytest tests/test_email_integration.py::TestEmailRateLimiter -v

# Specific test method
python -m pytest tests/test_email_integration.py::TestEmailProcessor::test_clean_email_content -v

Run Utility Function Tests:

# All utility tests
python -m pytest tests/test_my_utils.py -v

# With coverage
python -m pytest tests/test_my_utils.py --cov=my_utils --cov-report=html

Run Tool System Tests:

# Model tools tests
python -m pytest tests/test_tools_model_tools.py -v

# Tool prompts tests  
python -m pytest tests/test_tools_tool_prompts.py -v

CI/CD Integration

GitHub Actions Test Command:

cd backend && python -m pytest tests/ -v --tb=short --cov=. --cov-report=json

Coverage Requirements:

  • Email integration: >90% coverage
  • Utility functions: >95% coverage
  • Tool system: >85% coverage

Test Data Management

Sample Email Data

Valid Email Payload:

VALID_EMAIL_PAYLOAD = {
    'timestamp': '1234567890',
    'token': 'test_token',
    'signature': 'valid_signature',
    'sender': 'user@example.com',
    'recipient': 'assistant-123@email.aitana.chat',
    'subject': 'Test Question',
    'body-plain': 'How do I implement authentication?',
    'attachment-count': '0'
}

Rate Limit Test Data:

RATE_LIMIT_DATA = {
    'recent_email': datetime.utcnow() - timedelta(seconds=30),
    'old_email': datetime.utcnow() - timedelta(minutes=5),
    'user_email': 'ratelimit@example.com'
}

Mock Response Templates

Assistant Config Mock:

MOCK_ASSISTANT_CONFIG = {
    'name': 'Test Assistant',
    'avatar': '/avatars/test.png',
    'tools': ['search', 'code_execution'],
    'toolConfigs': {'search': {'enabled': True}},
    'initialInstructions': 'You are a helpful assistant.'
}

VAC Service Response Mock:

MOCK_VAC_RESPONSE = {
    'metadata': {
        'answer': 'This is a test AI response with helpful information.',
        'trace_id': 'trace_123456',
        'model_used': 'claude-3-sonnet',
        'processing_time': 2.5
    }
}