Code Execution Agent

Overview

The Code Execution Agent is an AI-powered system that generates, tests, and iteratively improves Python code based on natural language instructions. It implements a complete test-driven development workflow with self-healing capabilities.

Architecture

Core Components

AICodeGenerator Class (backend/agents/code_execution_agent.py)

  • Main entry point for the code generation system
  • Manages the complete workflow from instruction to working implementation
  • Implements caching for successful solutions

Workflow Pipeline

The agent follows a 5-step process:

  1. Function Specification Generation - Analyzes instructions and defines required functions
  2. Test Case Generation - Creates comprehensive test cases for each function
  3. Implementation Generation - Generates initial code implementations
  4. Test Execution - Runs tests and validates implementations
  5. Iterative Improvement - Recursively improves failed implementations

Key Features

Self-Evolving Code Generation

  • Automatic function specification: Determines needed functions and their interfaces
  • Dependency resolution: Sorts functions by dependencies for correct implementation order
  • Test-driven development: Generates tests before implementations
  • Iterative improvement: Automatically fixes failing implementations

Intelligent Testing System

  • Comprehensive test coverage: Normal cases, edge cases, and error cases
  • Safe execution environment: Isolated test execution with cleanup
  • Output comparison: Handles complex data types (lists, dictionaries, objects)
  • Execution safety: Timeout protection and error handling
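
The output comparison described above could look roughly like the following. This is a minimal sketch, not the agent's actual code: the helper name `outputs_match`, the float tolerance, and the choice to treat lists and tuples as interchangeable are all assumptions made for illustration.

```python
import math
from typing import Any


def outputs_match(actual: Any, expected: Any, rel_tol: float = 1e-9) -> bool:
    """Recursively compare two values (hypothetical helper)."""
    # bool is a subclass of int, so handle it before the numeric branch.
    if isinstance(actual, bool) or isinstance(expected, bool):
        return actual is expected
    # Assumption: numbers are compared with a small relative tolerance.
    if isinstance(actual, (int, float)) and isinstance(expected, (int, float)):
        return math.isclose(actual, expected, rel_tol=rel_tol)
    # Assumption: lists and tuples are treated as equivalent sequences.
    if isinstance(actual, (list, tuple)) and isinstance(expected, (list, tuple)):
        return len(actual) == len(expected) and all(
            outputs_match(a, e, rel_tol) for a, e in zip(actual, expected)
        )
    if isinstance(actual, dict) and isinstance(expected, dict):
        return actual.keys() == expected.keys() and all(
            outputs_match(actual[k], expected[k], rel_tol) for k in actual
        )
    # Everything else falls back to the type's own equality.
    return actual == expected
```

Arbitrary objects fall through to `==` here, so classes without `__eq__` would compare by identity; a real implementation may need a richer rule.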

Caching and Performance

  • Solution caching: Stores successful implementations for reuse
  • Cache key generation: MD5-based hashing of instructions
  • Performance optimization: Avoids redundant computation
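
As an illustration of MD5-based cache keys, the sketch below hashes an instruction string with `hashlib.md5`. The helper name `make_cache_key` and the whitespace normalization are assumptions; the agent may hash the raw instruction instead.

```python
import hashlib


def make_cache_key(instruction: str) -> str:
    """Return a hex MD5 digest of the instruction (hypothetical helper)."""
    # Assumption: collapse whitespace so spacing differences share one key.
    normalized = " ".join(instruction.split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


key = make_cache_key("Find all primes\n  up to a limit")
print(len(key))  # 32 hex characters
```

MD5 serves only as a fast fingerprint in this role; it is not suitable where collision resistance matters for security.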

Usage Examples

Basic Usage

from backend.agents.code_execution_agent import AICodeGenerator

# Initialize with API key
generator = AICodeGenerator("your_api_key_here")

# Generate solution from natural language
instruction = """
Create a function that finds all prime numbers up to a given limit 
using the Sieve of Eratosthenes algorithm.
"""

solution = generator.generate_solution(instruction)

# Check results
print(f"All tests passed: {solution['all_tests_passed']}")
print(f"Functions generated: {len(solution['implementations'])}")

Solution Structure

The generate_solution() method returns a dictionary with the following structure:

{
    "instruction": "Original instruction text",
    "function_specs": [
        {
            "name": "function_name",
            "purpose": "What this function does",
            "inputs": [{"name": "param", "type": "str"}],
            "outputs": {"type": "list", "description": "Return value"},
            "dependencies": ["other_function"],
            "complexity": "simple|medium|complex"
        }
    ],
    "implementations": {
        "function_name": "def function_name(param):\n    # implementation"
    },
    "tests": {
        "function_name": [
            {
                "name": "test_case_description",
                "inputs": {"param": "test_value"},
                "expected_output": "expected_result",
                "setup": "optional_setup_code"
            }
        ]
    },
    "test_results": [
        {
            "function": "function_name",
            "test": "test_case_name",
            "passed": True,
            "actual_output": "result"
        }
    ],
    "all_tests_passed": True
}
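
Given the structure above, a caller might want to list which tests failed for each function. The sketch below assumes only the `test_results` fields shown in this section; the helper name `failed_tests` and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def failed_tests(solution: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group the names of failed tests by function (hypothetical helper)."""
    failures: Dict[str, List[str]] = {}
    for result in solution.get("test_results", []):
        if not result.get("passed", False):
            failures.setdefault(result["function"], []).append(result["test"])
    return failures


# Hand-written sample in the documented shape, not real agent output.
example = {
    "test_results": [
        {"function": "sieve", "test": "limit_zero", "passed": False, "actual_output": None},
        {"function": "sieve", "test": "limit_ten", "passed": True, "actual_output": [2, 3, 5, 7]},
    ],
    "all_tests_passed": False,
}
print(failed_tests(example))  # {'sieve': ['limit_zero']}
```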

Implementation Details

Function Specification Process

The agent uses AI to analyze instructions and generate structured specifications:

from typing import Any, Dict, List

def _generate_function_specs(self, instruction: str) -> List[Dict[str, Any]]:
    # Uses AI to determine required functions
    # Returns specifications with inputs, outputs, dependencies
    ...  # body omitted in this excerpt

Test Generation Strategy

Each function gets comprehensive test coverage:

from typing import Dict, List

def _generate_tests(self, function_specs, instruction) -> Dict[str, List[Dict]]:
    # Creates normal cases, edge cases, error cases
    # Includes setup code when needed
    ...  # body omitted in this excerpt

Dependency Resolution

Functions are implemented in correct order:

from typing import Dict, List

def _sort_by_dependencies(self, function_specs) -> List[Dict]:
    # Topological sort to handle dependencies
    # Detects circular dependencies
    ...  # body omitted in this excerpt
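
One common way to get such an ordering is Kahn's algorithm, sketched below as a standalone function. It is not the agent's actual method body. It assumes each spec has a `name` and an optional `dependencies` list, that dependencies on names outside the spec list can be ignored, and that a cycle should raise `ValueError`.

```python
from collections import deque
from typing import Any, Dict, List


def sort_by_dependencies(function_specs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Order specs so each function comes after its dependencies."""
    by_name = {spec["name"]: spec for spec in function_specs}
    # Assumption: only dependencies on functions in this spec list count.
    pending = {
        name: {d for d in spec.get("dependencies", []) if d in by_name}
        for name, spec in by_name.items()
    }
    dependents: Dict[str, List[str]] = {name: [] for name in by_name}
    for name, deps in pending.items():
        for dep in deps:
            dependents[dep].append(name)

    # Start with functions that have no unresolved dependencies.
    ready = deque(name for name, deps in pending.items() if not deps)
    ordered: List[Dict[str, Any]] = []
    while ready:
        name = ready.popleft()
        ordered.append(by_name[name])
        for child in dependents[name]:
            pending[child].discard(name)
            if not pending[child]:
                ready.append(child)

    # Anything left unresolved is part of, or depends on, a cycle.
    if len(ordered) != len(by_name):
        stuck = sorted(n for n, deps in pending.items() if deps)
        raise ValueError(f"Circular dependency among: {stuck}")
    return ordered
```

For example, specs named `b` (depending on `a`) and `a` (no dependencies) come back in the order `a`, `b`.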

Iterative Improvement

Failed implementations are automatically improved:

def _improve_implementations(self, instruction, specs, implementations, tests, results):
    # Analyzes failure patterns
    # Generates improved implementations
    # Recursively improves until success or limit reached
    ...  # body omitted in this excerpt
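
The "until success or limit reached" behavior can be pictured as a bounded retry loop. The sketch below uses a loop rather than recursion and takes the test runner and the improver as injected callables; the names `improve_until_passing`, `run_tests`, `improve`, and `max_attempts` are all hypothetical, and the real limit is not stated in this document.

```python
from typing import Callable, Dict, List, Tuple

Implementations = Dict[str, str]
Results = List[dict]


def improve_until_passing(
    implementations: Implementations,
    run_tests: Callable[[Implementations], Results],
    improve: Callable[[Implementations, Results], Implementations],
    max_attempts: int = 3,  # assumption: the actual limit may differ
) -> Tuple[Implementations, Results]:
    """Re-run tests and request fixes until all pass or attempts run out."""
    results = run_tests(implementations)
    for _ in range(max_attempts):
        failures = [r for r in results if not r["passed"]]
        if not failures:
            break
        # Only the failing results are handed to the improver.
        implementations = improve(implementations, failures)
        results = run_tests(implementations)
    return implementations, results
```

Capping the number of attempts keeps a persistently failing function from triggering unbounded API calls.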

Error Handling

Execution Safety

  • Timeout protection: Commands time out after a bounded period
  • Safe imports: Controlled module loading
  • Cleanup: Temporary files are automatically removed
  • Exception handling: Comprehensive error catching and reporting
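
As a rough illustration of timeout protection plus cleanup, the sketch below writes code to a temporary file, runs it in a separate Python process with a time limit, and always removes the file afterwards. The helper name `run_snippet`, the 5-second default, and the result dictionary are assumptions; a child process by itself is not a security sandbox.

```python
import os
import subprocess
import sys
import tempfile


def run_snippet(code: str, timeout_seconds: float = 5.0) -> dict:
    """Run code in a child Python process and report the outcome (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(code)
        completed = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # the child is killed when this expires
        )
        return {
            "ok": completed.returncode == 0,
            "stdout": completed.stdout,
            "stderr": completed.stderr,
            "timed_out": False,
        }
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": "timed out", "timed_out": True}
    finally:
        # Cleanup runs on success, failure, and timeout alike.
        os.remove(path)
```

Calling `run_snippet("while True: pass", timeout_seconds=1)` would return with `timed_out` set to `True` instead of hanging.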

Failure Recovery

  • Alternative approaches: Tries different implementation strategies
  • Gradual improvement: Makes incremental fixes to failing code
  • Detailed diagnostics: Provides specific error information for debugging

Configuration

Initialization Parameters

AICodeGenerator(
    api_key="your_api_key",      # AI service API key
    cache_dir=".code_cache"      # Directory for caching solutions
)

Environment Requirements

  • Python 3.7+
  • Write access for temporary files
  • Network access for AI API calls
  • Required packages: requests (third-party); hashlib and json ship with the Python standard library

API Integration

The agent is designed to work with various AI services. The current implementation includes:

  • Generic API format: Configurable endpoint and headers
  • Request/response handling: JSON-based communication
  • Error handling: Graceful degradation on API failures
  • Rate limiting: Built-in request management

API Configuration

# Configure headers for AI service
self.headers = {
    "Content-Type": "application/json",
    "x-api-key": api_key
}

# API call with a 30-second timeout (wrap in try/except requests.RequestException to handle failures)
response = requests.post(
    "https://api.example.com/v1/completion",
    headers=self.headers,
    json={"prompt": prompt, "max_tokens": 2000},
    timeout=30
)

Performance Considerations

Caching Strategy

  • Cache hits: Instant return for previously solved problems
  • Cache misses: Full generation pipeline with caching of result
  • Cache invalidation: Manual clearing when needed
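
The hit, miss, and invalidation paths above could be backed by a small file-based store such as the one sketched here. The class name `SolutionCache`, the one-JSON-file-per-instruction layout, and the method names are assumptions; only the `.code_cache` default and the MD5 keying come from this document.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Dict, Optional


class SolutionCache:
    """File-backed cache keyed by an MD5 digest of the instruction (hypothetical)."""

    def __init__(self, cache_dir: str = ".code_cache") -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, instruction: str) -> Path:
        digest = hashlib.md5(instruction.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.json"

    def load(self, instruction: str) -> Optional[Dict[str, Any]]:
        """Return the cached solution, or None on a cache miss."""
        path = self._path(instruction)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))

    def store(self, instruction: str, solution: Dict[str, Any]) -> None:
        """Persist a solution; it must be JSON-serializable."""
        self._path(instruction).write_text(json.dumps(solution), encoding="utf-8")

    def clear(self) -> None:
        """Manual invalidation: delete every cached entry."""
        for entry in self.cache_dir.glob("*.json"):
            entry.unlink()
```

A caller would try `load()` first, run the full pipeline only when it returns `None`, and then `store()` the result.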

Optimization Techniques

  • Batch processing: Handles multiple functions efficiently
  • Smart dependency ordering: Minimizes redundant work
  • Progressive complexity: Simple functions first, complex ones using dependencies

Integration with Backend

The Code Execution Agent integrates with the main backend system:

File Location: backend/agents/code_execution_agent.py

Integration Points:

  • Backend tool system can invoke the agent
  • Results can be formatted for chat responses
  • Caching works across different user sessions

Security Considerations

Code Execution Safety

  • Isolated execution: Code runs in controlled environment
  • Import restrictions: Limited to safe modules
  • Timeout enforcement: Prevents infinite loops
  • File system protection: Controlled file access

Input Validation

  • Instruction sanitization: Validates natural language inputs
  • Code validation: Checks generated code before execution
  • Test case validation: Ensures test cases are safe to run
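
One way to check generated code before execution is to parse it with the standard `ast` module and reject imports outside an allow-list. The sketch below is illustrative only: the function name `find_disallowed_imports` and the contents of `ALLOWED_MODULES` are assumptions, and a static check like this does not catch dynamic imports such as `__import__` or `exec`.

```python
import ast
from typing import List, Set

# Assumption: an example allow-list, not the agent's real configuration.
ALLOWED_MODULES: Set[str] = {"math", "itertools", "collections", "functools"}


def find_disallowed_imports(source: str, allowed: Set[str] = ALLOWED_MODULES) -> List[str]:
    """Return imported module names whose top-level package is not allowed."""
    tree = ast.parse(source)  # raises SyntaxError if the code does not parse
    disallowed: List[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have no module name and are reported as "".
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name.split(".")[0] not in allowed:
                disallowed.append(name)
    return disallowed
```

An empty result means every import resolved to an allowed top-level module; syntactically invalid code is rejected earlier by the `SyntaxError` from `ast.parse`.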

Future Enhancements

Planned Features

  • Multi-language support: Beyond Python to JavaScript, Java, etc.
  • Advanced caching: Semantic similarity-based cache lookup
  • Performance profiling: Code efficiency analysis
  • Security scanning: Automated vulnerability detection

Integration Opportunities

  • IDE integration: Real-time code assistance
  • Continuous integration: Automated test generation for CI/CD
  • Documentation generation: Automatic code documentation
  • Code review assistance: Suggestions for code improvements

Troubleshooting

Common Issues

API Connection Failures

# Check API key and endpoint configuration
# Verify network connectivity
# Review request/response format

Test Execution Errors

# Check Python environment
# Verify import statements
# Review temporary file permissions

Cache Issues

# Clear cache directory
# Check file permissions
# Verify disk space

Debug Mode

Enable detailed logging by modifying the agent:

# Add debug prints to trace execution
# Monitor API requests and responses
# Check intermediate results at each step
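
Instead of ad hoc prints, the standard `logging` module can trace each step. The sketch below is one possible setup; the logger name `code_execution_agent` and the helpers `enable_debug_logging` and `log_step` are hypothetical and would need to be wired into the agent by hand.

```python
import logging

# Assumption: the logger name is illustrative, not defined by the agent.
logger = logging.getLogger("code_execution_agent")


def enable_debug_logging() -> None:
    """Send DEBUG-level messages to the console with timestamps."""
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    logger.setLevel(logging.DEBUG)


def log_step(step: str, payload: object) -> None:
    """Record an intermediate result for one pipeline step."""
    logger.debug("step=%s payload=%r", step, payload)
```

Calling `enable_debug_logging()` once at startup and then `log_step("test_execution", results)` after each stage gives a timestamped trace of intermediate results.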