List GCS Bucket Tool

Overview

The List GCS Bucket tool provides file browsing capabilities for Google Cloud Storage buckets, enabling users to discover and navigate available files before processing them with other tools like extract_files.

Features

  • File Discovery: Browse and list files in GCS buckets
  • Folder Navigation: Navigate through folder hierarchies
  • File Filtering: Filter by file extensions and filename patterns
  • Metadata Retrieval: Get file size, content type, and modification times
  • Pagination Support: Handle large buckets with continuation tokens
  • URI Generation: Returns gs:// URIs ready for use with other tools

Usage

Basic Usage

List files in the default bucket:

result = await list_gcs_bucket(
    bucket_path="aitana-documents-bucket"
)
# Returns: List of gs:// URIs

With Filtering

Filter by file type and pattern:

result = await list_gcs_bucket(
    bucket_path="aitana-documents-bucket",
    prefix="Competitors/",
    file_extensions=[".pdf", ".docx"],
    filename_pattern="2024",
    max_files=50
)

Folder Navigation

List folders instead of files:

result = await list_gcs_bucket(
    bucket_path="aitana-documents-bucket",
    list_folders=True,
    prefix="cases/"
)
# Returns: List of folder prefixes

With Metadata

Include file metadata:

result = await list_gcs_bucket(
    bucket_path="aitana-documents-bucket",
    include_metadata=True,
    max_files=10
)
# Returns: Files with size, content_type, and updated time

Parameters

Parameter           Type        Default                      Description
------------------  ----------  ---------------------------  -------------------------------------
bucket_path         str         "aitana-documents-bucket"    GCS bucket name or gs:// path
prefix              str         None                         Folder prefix to search within
max_files           int         20                           Maximum files to return
file_extensions     List[str]   None                         Filter by extensions (e.g., [".pdf"])
include_metadata    bool        False                        Include file metadata
list_folders        bool        False                        List folders instead of files
filename_pattern    str         None                         Pattern to match in filenames
continuation_token  str         None                         Token for pagination

Response Format

Basic Response

{
    "file_uris": [
        "gs://aitana-documents-bucket/cases/case1.pdf",
        "gs://aitana-documents-bucket/cases/case2.docx"
    ],
    "count": 2,
    "has_more": false
}

With Metadata

{
    "file_uris": [
        "gs://aitana-documents-bucket/cases/case1.pdf"
    ],
    "files": [
        {
            "uri": "gs://aitana-documents-bucket/cases/case1.pdf",
            "name": "cases/case1.pdf",
            "size": 1048576,
            "content_type": "application/pdf",
            "updated": "2024-01-15T10:30:00Z"
        }
    ],
    "count": 1,
    "has_more": false
}

Folder Listing

{
    "folders": [
        "cases/energy/",
        "cases/finance/",
        "cases/healthcare/"
    ],
    "count": 3
}

With Pagination

{
    "file_uris": [...],
    "count": 100,
    "has_more": true,
    "continuation_token": "abc123...",
    "message": "Retrieved 100 files. Use continuation_token for next page."
}
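
To retrieve every page, pass the returned continuation_token back into the next call and repeat until has_more is false. A minimal sketch, assuming list_gcs_bucket is in scope as in the examples above:

# Page through a large bucket by feeding the continuation token back in
all_uris = []
token = None
while True:
    page = await list_gcs_bucket(
        bucket_path="aitana-documents-bucket",
        prefix="cases/",
        max_files=100,
        continuation_token=token
    )
    all_uris.extend(page["file_uris"])
    if not page.get("has_more"):
        break
    token = page["continuation_token"]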

Integration with Other Tools

With Extract Files Tool

  1. First, list the available files:

     files = await list_gcs_bucket(
         bucket_path="aitana-documents-bucket",
         prefix="Competitors/",
         file_extensions=[".pdf"]
     )

  2. Then extract content from the selected files:

     result = await extract_files(
         question="What are the key competitive advantages?",
         file_uris=files["file_uris"][:10]  # Process the first 10 files
     )

With AI Search

Combine with AI search for comprehensive analysis:

# 1. Search for relevant documents
search_results = await ai_search(
    question="renewable energy projects",
    datastore_id="aitana3"
)

# 2. List related files
files = await list_gcs_bucket(
    prefix="energy/renewable/",
    filename_pattern="project"
)

# 3. Extract from the listed files (merge in URIs from the search
#    step as well, if its response exposes them)
combined_analysis = await extract_files(
    question="Summarize all renewable energy projects",
    file_uris=files["file_uris"]
)

CLI Usage

Via Aitana CLI

# List files in default bucket
aitana tool list-gcs-bucket

# With specific bucket and filters
aitana tool list-gcs-bucket \
  --bucket "gs://my-bucket/data/" \
  --extensions ".pdf,.docx" \
  --max-files 50

# List folders for navigation
aitana tool list-gcs-bucket \
  --bucket "aitana-documents-bucket" \
  --list-folders \
  --prefix "cases/"

Via Direct API

curl -X POST http://localhost:1956/direct/tools/list-gcs-bucket \
  -H "Content-Type: application/json" \
  -d '{
    "bucket_path": "aitana-documents-bucket",
    "prefix": "Competitors/",
    "file_extensions": [".pdf"],
    "max_files": 20,
    "include_metadata": true
  }'

MCP Integration

The tool is available through MCP for Claude Desktop and Claude Code:

{
  "tool": "list-gcs-bucket",
  "parameters": {
    "bucket_path": "aitana-documents-bucket",
    "max_files": 100,
    "file_extensions": [".pdf", ".txt"],
    "include_metadata": false
  }
}

Performance Considerations

Optimal Settings

  • Default max_files: 20 for fast response
  • With metadata: Increases response time, use sparingly
  • Large buckets: Use pagination with continuation_token
  • Filtering: Apply filters to reduce result set

Caching

Results are not cached by default. For frequently accessed buckets:

  • Consider implementing client-side caching (see the sketch after this list)
  • Use prefixes to limit scope
  • Cache folder structure separately
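
One lightweight option is an in-process TTL cache keyed by bucket and prefix. A minimal sketch (cached_list_gcs_bucket, _LISTING_CACHE, and the 5-minute TTL are illustrative names and values, and list_gcs_bucket is assumed to be in scope):

import time

_LISTING_CACHE = {}       # (bucket_path, prefix) -> (timestamp, result)
CACHE_TTL_SECONDS = 300   # refresh listings every 5 minutes

async def cached_list_gcs_bucket(bucket_path, prefix=None):
    """Return a cached listing if it is younger than the TTL."""
    key = (bucket_path, prefix)
    cached = _LISTING_CACHE.get(key)
    if cached and time.monotonic() - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]  # still fresh
    result = await list_gcs_bucket(bucket_path=bucket_path, prefix=prefix)
    _LISTING_CACHE[key] = (time.monotonic(), result)
    return result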

Security

Permissions

  • Requires storage.objects.list permission on the bucket
  • Respects GCS IAM policies
  • User email validation handled by the backend

Access Control

  • Bucket access determined by service account
  • No direct credential exposure
  • Audit logging through Langfuse tracing

Common Use Cases

1. Document Discovery

# Find all contracts from 2024
contracts = await list_gcs_bucket(
    prefix="contracts/",
    filename_pattern="2024",
    file_extensions=[".pdf", ".docx"]
)

2. Competitive Analysis

# Browse competitor documents
competitors = await list_gcs_bucket(
    prefix="Competitors/",
    list_folders=True
)
# Then drill down into specific competitor
files = await list_gcs_bucket(
    prefix="Competitors/CompanyA/",
    max_files=50
)

3. Data Pipeline

# Get latest CSV files for processing
data_files = await list_gcs_bucket(
    prefix="data/daily/",
    file_extensions=[".csv"],
    max_files=100,
    include_metadata=True
)
# Sort by modification time and process latest
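
Completing the comment above: with include_metadata=True each entry in files carries an updated timestamp, so the sort could look like this (the batch size of 10 is illustrative):

# "updated" is an RFC 3339 timestamp, so lexicographic order matches
# chronological order; reverse=True puts the newest files first
latest_first = sorted(
    data_files["files"],
    key=lambda f: f["updated"],
    reverse=True
)
for f in latest_first[:10]:   # process the 10 most recent CSVs
    print(f["uri"], f["size"])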

Error Handling

Common Errors

  1. Bucket Not Found

     {
         "error": "Bucket 'invalid-bucket' not found",
         "code": "BUCKET_NOT_FOUND"
     }

  2. Permission Denied

     {
         "error": "Permission denied on bucket",
         "code": "PERMISSION_DENIED"
     }

  3. Invalid Continuation Token

     {
         "error": "Invalid or expired continuation token",
         "code": "INVALID_TOKEN"
     }
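
As the payloads above show, errors come back as structured responses with an error message and a code, so callers can branch on the code field. A minimal sketch (saved_token and the retry strategy are illustrative, not mandated by the tool):

result = await list_gcs_bucket(
    bucket_path="aitana-documents-bucket",
    continuation_token=saved_token  # a previously stored token
)

if "error" in result:
    if result.get("code") == "INVALID_TOKEN":
        # Assumed recovery: restart pagination from the first page
        result = await list_gcs_bucket(bucket_path="aitana-documents-bucket")
    else:
        # BUCKET_NOT_FOUND / PERMISSION_DENIED are not recoverable client-side
        raise RuntimeError(result["error"])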
    

Best Practices

  1. Start with a small max_files value for exploration
  2. Use specific prefixes to narrow the scope
  3. Apply file extension filters when possible
  4. Avoid include_metadata unless you need it
  5. Implement pagination for large result sets
  6. Cache the folder structure for repeated navigation
  7. Combine with other tools for comprehensive analysis