List GCS Bucket Tool

Overview

The List GCS Bucket tool provides file browsing capabilities for Google Cloud Storage buckets, enabling users to discover and navigate available files before processing them with other tools like extract_files.

Features

  • File Discovery: Browse and list files in GCS buckets
  • Folder Navigation: Navigate through folder hierarchies
  • File Filtering: Filter by file extensions and filename patterns
  • Metadata Retrieval: Get file size, content type, and modification times
  • Pagination Support: Handle large buckets with continuation tokens
  • URI Generation: Returns gs:// URIs ready for use with other tools

Usage

Basic Usage

List files in the default bucket:

result = await list_gcs_bucket(
    bucket_path="aitana-documents-bucket"
)
# Returns: List of gs:// URIs

With Filtering

Filter by file type and pattern:

result = await list_gcs_bucket(
    bucket_path="aitana-documents-bucket",
    prefix="Competitors/",
    file_extensions=[".pdf", ".docx"],
    filename_pattern="2024",
    max_files=50
)

Folder Navigation

List folders instead of files:

result = await list_gcs_bucket(
    bucket_path="aitana-documents-bucket",
    list_folders=True,
    prefix="cases/"
)
# Returns: List of folder prefixes

With Metadata

Include file metadata:

result = await list_gcs_bucket(
    bucket_path="aitana-documents-bucket",
    include_metadata=True,
    max_files=10
)
# Returns: Files with size, content_type, and updated time

Parameters

Parameter           Type        Default                      Description
------------------  ----------  ---------------------------  -------------------------------------
bucket_path         str         "aitana-documents-bucket"    GCS bucket name or gs:// path
prefix              str         None                         Folder prefix to search within
max_files           int         20                           Maximum files to return
file_extensions     List[str]   None                         Filter by extensions (e.g., [".pdf"])
include_metadata    bool        False                        Include file metadata
list_folders        bool        False                        List folders instead of files
filename_pattern    str         None                         Pattern to match in filenames
continuation_token  str         None                         Token for pagination

Response Format

Basic Response

{
    "file_uris": [
        "gs://aitana-documents-bucket/cases/case1.pdf",
        "gs://aitana-documents-bucket/cases/case2.docx"
    ],
    "count": 2,
    "has_more": false
}

With Metadata

{
    "file_uris": [
        "gs://aitana-documents-bucket/cases/case1.pdf"
    ],
    "files": [
        {
            "uri": "gs://aitana-documents-bucket/cases/case1.pdf",
            "name": "cases/case1.pdf",
            "size": 1048576,
            "content_type": "application/pdf",
            "updated": "2024-01-15T10:30:00Z"
        }
    ],
    "count": 1,
    "has_more": false
}

Folder Listing

{
    "folders": [
        "cases/energy/",
        "cases/finance/",
        "cases/healthcare/"
    ],
    "count": 3
}

With Pagination

{
    "file_uris": [...],
    "count": 100,
    "has_more": true,
    "continuation_token": "abc123...",
    "message": "Retrieved 100 files. Use continuation_token for next page."
}
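
To retrieve every page, pass the returned continuation_token back into the next call and repeat until has_more is false. A minimal sketch, assuming list_gcs_bucket is in scope as in the examples above:

# Page through a large bucket by feeding the continuation token back in
all_uris = []
token = None
while True:
    page = await list_gcs_bucket(
        bucket_path="aitana-documents-bucket",
        prefix="cases/",
        max_files=100,
        continuation_token=token
    )
    all_uris.extend(page["file_uris"])
    if not page.get("has_more"):
        break
    token = page["continuation_token"]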

Integration with Other Tools

With Extract Files Tool

  1. First, list the available files:

     files = await list_gcs_bucket(
         bucket_path="aitana-documents-bucket",
         prefix="Competitors/",
         file_extensions=[".pdf"]
     )

  2. Then extract content from the selected files:

     result = await extract_files(
         question="What are the key competitive advantages?",
         file_uris=files["file_uris"][:10]  # Process the first 10 files
     )

With AI Search

Combine with AI search for comprehensive analysis:

# 1. Search for relevant documents
search_results = await ai_search(
    question="renewable energy projects",
    datastore_id="aitana3"
)

# 2. List related files
files = await list_gcs_bucket(
    prefix="energy/renewable/",
    filename_pattern="project"
)

# 3. Extract from the listed files (merge in URIs from the search
#    step as well, if its response exposes them)
combined_analysis = await extract_files(
    question="Summarize all renewable energy projects",
    file_uris=files["file_uris"]
)

CLI Usage

Via Aitana CLI

# List files in default bucket
aitana tool list-gcs-bucket

# With specific bucket and filters
aitana tool list-gcs-bucket \
  --bucket "gs://my-bucket/data/" \
  --extensions ".pdf,.docx" \
  --max-files 50

# List folders for navigation
aitana tool list-gcs-bucket \
  --bucket "aitana-documents-bucket" \
  --list-folders \
  --prefix "cases/"

Via Direct API

curl -X POST http://localhost:1956/direct/tools/list-gcs-bucket \
  -H "Content-Type: application/json" \
  -d '{
    "bucket_path": "aitana-documents-bucket",
    "prefix": "Competitors/",
    "file_extensions": [".pdf"],
    "max_files": 20,
    "include_metadata": true
  }'

MCP Integration

The tool is available through MCP for Claude Desktop and Claude Code:

{
  "tool": "list-gcs-bucket",
  "parameters": {
    "bucket_path": "aitana-documents-bucket",
    "max_files": 100,
    "file_extensions": [".pdf", ".txt"],
    "include_metadata": false
  }
}

Performance Considerations

Optimal Settings

  • Default max_files: 20 for fast response
  • With metadata: Increases response time, use sparingly
  • Large buckets: Use pagination with continuation_token
  • Filtering: Apply filters to reduce result set

Caching

Results are not cached by default. For frequently accessed buckets:

  • Consider implementing client-side caching (see the sketch after this list)
  • Use prefixes to limit scope
  • Cache folder structure separately
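
One lightweight option is an in-process TTL cache keyed by bucket and prefix. A minimal sketch (cached_list_gcs_bucket, _LISTING_CACHE, and the 5-minute TTL are illustrative names and values, and list_gcs_bucket is assumed to be in scope):

import time

_LISTING_CACHE = {}       # (bucket_path, prefix) -> (timestamp, result)
CACHE_TTL_SECONDS = 300   # refresh listings every 5 minutes

async def cached_list_gcs_bucket(bucket_path, prefix=None):
    """Return a cached listing if it is younger than the TTL."""
    key = (bucket_path, prefix)
    cached = _LISTING_CACHE.get(key)
    if cached and time.monotonic() - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]  # still fresh
    result = await list_gcs_bucket(bucket_path=bucket_path, prefix=prefix)
    _LISTING_CACHE[key] = (time.monotonic(), result)
    return result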

Security

Permissions

  • Requires storage.objects.list permission on the bucket
  • Respects GCS IAM policies
  • User email validation handled by the backend

Access Control

  • Bucket access determined by service account
  • No direct credential exposure
  • Audit logging through Langfuse tracing

Common Use Cases

1. Document Discovery

# Find all contracts from 2024
contracts = await list_gcs_bucket(
    prefix="contracts/",
    filename_pattern="2024",
    file_extensions=[".pdf", ".docx"]
)

2. Competitive Analysis

# Browse competitor documents
competitors = await list_gcs_bucket(
    prefix="Competitors/",
    list_folders=True
)
# Then drill down into specific competitor
files = await list_gcs_bucket(
    prefix="Competitors/CompanyA/",
    max_files=50
)

3. Data Pipeline

# Get latest CSV files for processing
data_files = await list_gcs_bucket(
    prefix="data/daily/",
    file_extensions=[".csv"],
    max_files=100,
    include_metadata=True
)
# Sort by modification time and process latest
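
Completing the comment above: with include_metadata=True each entry in files carries an updated timestamp, so the sort could look like this (the batch size of 10 is illustrative):

# "updated" is an RFC 3339 timestamp, so lexicographic order matches
# chronological order; reverse=True puts the newest files first
latest_first = sorted(
    data_files["files"],
    key=lambda f: f["updated"],
    reverse=True
)
for f in latest_first[:10]:   # process the 10 most recent CSVs
    print(f["uri"], f["size"])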

Error Handling

Common Errors

  1. Bucket Not Found

     {
         "error": "Bucket 'invalid-bucket' not found",
         "code": "BUCKET_NOT_FOUND"
     }

  2. Permission Denied

     {
         "error": "Permission denied on bucket",
         "code": "PERMISSION_DENIED"
     }

  3. Invalid Continuation Token

     {
         "error": "Invalid or expired continuation token",
         "code": "INVALID_TOKEN"
     }
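
As the payloads above show, errors come back as structured responses with an error message and a code, so callers can branch on the code field. A minimal sketch (saved_token and the retry strategy are illustrative, not mandated by the tool):

result = await list_gcs_bucket(
    bucket_path="aitana-documents-bucket",
    continuation_token=saved_token  # a previously stored token
)

if "error" in result:
    if result.get("code") == "INVALID_TOKEN":
        # Assumed recovery: restart pagination from the first page
        result = await list_gcs_bucket(bucket_path="aitana-documents-bucket")
    else:
        # BUCKET_NOT_FOUND / PERMISSION_DENIED are not recoverable client-side
        raise RuntimeError(result["error"])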
    

Best Practices

  1. Start with a small max_files value for exploration
  2. Use specific prefixes to narrow the scope
  3. Apply file extension filters when possible
  4. Avoid include_metadata unless you need it
  5. Implement pagination for large result sets
  6. Cache the folder structure for repeated navigation
  7. Combine with other tools for comprehensive analysis