List GCS Bucket Tool
Overview
The List GCS Bucket tool provides file browsing capabilities for Google Cloud Storage buckets, enabling users to discover and navigate available files before processing them with other tools like extract_files.
Features
- File Discovery: Browse and list files in GCS buckets
- Folder Navigation: Navigate through folder hierarchies
- File Filtering: Filter by file extensions and filename patterns
- Metadata Retrieval: Get file size, content type, and modification times
- Pagination Support: Handle large buckets with continuation tokens
- URI Generation: Returns gs:// URIs ready for use with other tools
Usage
Basic Usage
List files in the default bucket:
result = await list_gcs_bucket(
    bucket_path="aitana-documents-bucket"
)
# Returns: List of gs:// URIs
With Filtering
Filter by file type and pattern:
result = await list_gcs_bucket(
    bucket_path="aitana-documents-bucket",
    prefix="Competitors/",
    file_extensions=[".pdf", ".docx"],
    filename_pattern="2024",
    max_files=50
)
Folder Navigation
List folders instead of files:
result = await list_gcs_bucket(
    bucket_path="aitana-documents-bucket",
    list_folders=True,
    prefix="cases/"
)
# Returns: List of folder prefixes
With Metadata
Include file metadata:
result = await list_gcs_bucket(
    bucket_path="aitana-documents-bucket",
    include_metadata=True,
    max_files=10
)
# Returns: Files with size, content_type, and updated time
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| bucket_path | str | "aitana-documents-bucket" | GCS bucket name or gs:// path |
| prefix | str | None | Folder prefix to search within |
| max_files | int | 20 | Maximum files to return |
| file_extensions | List[str] | None | Filter by extensions (e.g., [".pdf"]) |
| include_metadata | bool | False | Include file metadata |
| list_folders | bool | False | List folders instead of files |
| filename_pattern | str | None | Pattern to match in filenames |
| continuation_token | str | None | Token for pagination |
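Taken together, the parameters correspond to a signature roughly like the following (a sketch inferred from the table above, not the tool's actual source):
from typing import List, Optional

async def list_gcs_bucket(
    bucket_path: str = "aitana-documents-bucket",
    prefix: Optional[str] = None,
    max_files: int = 20,
    file_extensions: Optional[List[str]] = None,
    include_metadata: bool = False,
    list_folders: bool = False,
    filename_pattern: Optional[str] = None,
    continuation_token: Optional[str] = None,
) -> dict:
    ...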
Response Format
Basic Response
{
  "file_uris": [
    "gs://aitana-documents-bucket/cases/case1.pdf",
    "gs://aitana-documents-bucket/cases/case2.docx"
  ],
  "count": 2,
  "has_more": false
}
With Metadata
{
  "file_uris": [
    "gs://aitana-documents-bucket/cases/case1.pdf"
  ],
  "files": [
    {
      "uri": "gs://aitana-documents-bucket/cases/case1.pdf",
      "name": "cases/case1.pdf",
      "size": 1048576,
      "content_type": "application/pdf",
      "updated": "2024-01-15T10:30:00Z"
    }
  ],
  "count": 1,
  "has_more": false
}
Folder Listing
{
  "folders": [
    "cases/energy/",
    "cases/finance/",
    "cases/healthcare/"
  ],
  "count": 3
}
With Pagination
{
  "file_uris": [...],
  "count": 100,
  "has_more": true,
  "continuation_token": "abc123...",
  "message": "Retrieved 100 files. Use continuation_token for next page."
}
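A minimal pagination loop over a large bucket, assuming the response keys shown above (the prefix and page size are illustrative):
all_uris = []
token = None
while True:
    page = await list_gcs_bucket(
        bucket_path="aitana-documents-bucket",
        prefix="cases/",
        max_files=100,
        continuation_token=token
    )
    all_uris.extend(page["file_uris"])
    if not page.get("has_more"):
        break  # last page reached
    token = page["continuation_token"]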
Integration with Other Tools
With Extract Files Tool
- First, list available files:
files = await list_gcs_bucket(
    bucket_path="aitana-documents-bucket",
    prefix="Competitors/",
    file_extensions=[".pdf"]
)
- Then extract content from selected files:
result = await extract_files(
    question="What are the key competitive advantages?",
    file_uris=files["file_uris"][:10]  # Process first 10 files
)
With AI Search
Combine with AI search for comprehensive analysis:
# 1. Search for relevant documents
search_results = await ai_search(
    question="renewable energy projects",
    datastore_id="aitana3"
)

# 2. List related files
files = await list_gcs_bucket(
    prefix="energy/renewable/",
    filename_pattern="project"
)

# 3. Extract from both sources
combined_analysis = await extract_files(
    question="Summarize all renewable energy projects",
    file_uris=files["file_uris"]
)
CLI Usage
Via Aitana CLI
# List files in default bucket
aitana tool list-gcs-bucket

# With specific bucket and filters
aitana tool list-gcs-bucket \
  --bucket "gs://my-bucket/data/" \
  --extensions ".pdf,.docx" \
  --max-files 50

# List folders for navigation
aitana tool list-gcs-bucket \
  --bucket "aitana-documents-bucket" \
  --list-folders \
  --prefix "cases/"
Via Direct API
curl -X POST http://localhost:1956/direct/tools/list-gcs-bucket \
  -H "Content-Type: application/json" \
  -d '{
    "bucket_path": "aitana-documents-bucket",
    "prefix": "Competitors/",
    "file_extensions": [".pdf"],
    "max_files": 20,
    "include_metadata": true
  }'
MCP Integration
The tool is available through MCP for Claude Desktop and Claude Code:
{
  "tool": "list-gcs-bucket",
  "parameters": {
    "bucket_path": "aitana-documents-bucket",
    "max_files": 100,
    "file_extensions": [".pdf", ".txt"],
    "include_metadata": false
  }
}
Performance Considerations
Optimal Settings
- Default max_files: 20 for fast response
- With metadata: Increases response time, use sparingly
- Large buckets: Use pagination with continuation_token
- Filtering: Apply filters to reduce result set
Caching
Results are not cached by default. For frequently accessed buckets:
- Consider implementing client-side caching (see the sketch after this list)
- Use prefixes to limit scope
- Cache folder structure separately
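A minimal client-side cache keyed on the listing parameters, assuming a 5-minute freshness window is acceptable (the TTL value and helper name are illustrative, not part of the tool):
import time
from typing import Optional

_listing_cache: dict = {}   # (bucket_path, prefix) -> (timestamp, result)
CACHE_TTL_SECONDS = 300     # assumed 5-minute freshness window

async def cached_list(bucket_path: str, prefix: Optional[str] = None) -> dict:
    key = (bucket_path, prefix)
    hit = _listing_cache.get(key)
    if hit and time.monotonic() - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]       # serve the cached listing
    result = await list_gcs_bucket(bucket_path=bucket_path, prefix=prefix)
    _listing_cache[key] = (time.monotonic(), result)
    return result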
Security
Permissions
- Requires storage.objects.list permission on the bucket
- Respects GCS IAM policies
- User email validation through backend
Access Control
- Bucket access determined by service account
- No direct credential exposure
- Audit logging through Langfuse tracing
Common Use Cases
1. Document Discovery
# Find all contracts from 2024
contracts = await list_gcs_bucket(
    prefix="contracts/",
    filename_pattern="2024",
    file_extensions=[".pdf", ".docx"]
)
2. Competitive Analysis
# Browse competitor documents
competitors = await list_gcs_bucket(
    prefix="Competitors/",
    list_folders=True
)

# Then drill down into specific competitor
files = await list_gcs_bucket(
    prefix="Competitors/CompanyA/",
    max_files=50
)
3. Data Pipeline
# Get latest CSV files for processing
data_files = await list_gcs_bucket(
    prefix="data/daily/",
    file_extensions=[".csv"],
    max_files=100,
    include_metadata=True
)
# Sort by modification time and process latest
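That sorting step might look like this (a sketch assuming the metadata shape shown under Response Format, where each entry in files carries an ISO-8601 updated timestamp):
# ISO-8601 UTC timestamps sort correctly as plain strings.
latest_first = sorted(
    data_files["files"],
    key=lambda f: f["updated"],
    reverse=True
)
for f in latest_first[:5]:  # process the five most recent files
    print(f["uri"], f["size"], f["updated"])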
Error Handling
Common Errors
- Bucket Not Found
{
  "error": "Bucket 'invalid-bucket' not found",
  "code": "BUCKET_NOT_FOUND"
}
- Permission Denied
{
  "error": "Permission denied on bucket",
  "code": "PERMISSION_DENIED"
}
- Invalid Continuation Token
{
  "error": "Invalid or expired continuation token",
  "code": "INVALID_TOKEN"
}
Best Practices
- Start with small max_files for exploration
- Use specific prefixes to narrow scope
- Apply file extension filters when possible
- Avoid include_metadata unless needed
- Implement pagination for large result sets
- Cache folder structure for repeated navigation
- Combine with other tools for comprehensive analysis
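A typical exploration session that follows these practices: map folders first, then take a small filtered page from one folder, and only then hand URIs downstream (bucket name and prefixes are illustrative):
# 1. Map the top-level structure cheaply (no file listing yet).
folders = await list_gcs_bucket(
    bucket_path="aitana-documents-bucket",
    list_folders=True
)

# 2. Take a small, filtered page from the first folder.
preview = await list_gcs_bucket(
    bucket_path="aitana-documents-bucket",
    prefix=folders["folders"][0],
    file_extensions=[".pdf"],
    max_files=10
)

# 3. Hand the URIs to a downstream tool.
summary = await extract_files(
    question="Summarize these documents",
    file_uris=preview["file_uris"]
)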
Related Documentation
- Extract Files Tool - Process files retrieved by this tool
- AI Search Tool - Complement file browsing with semantic search
- Backend MCP Integration - MCP tool architecture