Stub File Detection Script

Overview

The scripts/detect_stub_files.py script is a Python tool that detects documentation stub files that still need to be completed. It replaces the original bash-based detection and filters out markers that appear inside code blocks or documentation examples, which avoids false positives.

Purpose

This script is part of the automated documentation workflow that:

  • Identifies incomplete documentation files marked with stub indicators
  • Prioritizes stub files based on location and content markers
  • Generates detailed analysis for GitHub issue creation
  • Avoids flagging code examples and meta-documentation as stubs

Functions

is_in_code_block(content, marker_pos)

Checks if a position in markdown content is within a code block to avoid false positive detection.

Parameters:

  • content (str): The full content of the file
  • marker_pos (int): Position of the marker to check

Returns:

  • bool: True if the position is within any type of code block

Detects:

  • Fenced code blocks (``` or ~~~)
  • Inline code spans (`code`)
  • Indented code blocks (4+ spaces)

Example:

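content = "This is a STUB: marker outside code\n```\nSTUB: inside code\n```"
first = content.find("STUB:")              # marker in regular text
second = content.find("STUB:", first + 1)  # marker inside the fenced block
is_in_code_block(content, first)   # False
is_in_code_block(content, second)  # True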
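To make the three detection cases concrete, here is a minimal sketch of how such a check could work. It is not the script's actual implementation: the name is_in_code_block_sketch, the FENCE_RE pattern, and the simplified rules (any line starting with four spaces or a tab counts as indented code; an odd number of backticks before the position counts as an inline span) are assumptions made for illustration.

```python
import re

# Up to three spaces of indentation, then a run of 3+ backticks or tildes.
FENCE_RE = re.compile(r" {0,3}(`{3,}|~{3,})")


def is_in_code_block_sketch(content: str, marker_pos: int) -> bool:
    """Illustrative check only; the real script may apply different rules."""
    open_fence = None  # "`" or "~" while a fenced block is open
    offset = 0         # start offset of the line being examined
    line_start, current_line = 0, ""
    for line in content.splitlines(keepends=True):
        if offset > marker_pos:
            break
        line_start, current_line = offset, line
        match = FENCE_RE.match(line)
        if match:
            kind = match.group(1)[0]
            if open_fence is None:
                open_fence = kind   # opening fence
            elif kind == open_fence:
                open_fence = None   # matching closing fence
        offset += len(line)
    if open_fence is not None:
        return True  # inside a fenced block
    if current_line.startswith(("    ", "\t")):
        return True  # simplified indented-code rule
    # Inline span: an odd number of backticks precedes the position.
    prefix = current_line[: marker_pos - line_start]
    return prefix.count("`") % 2 == 1
```

Tracking the fence character separately means a ~~~ line inside a ``` block does not close it.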

is_documentation_about_stubs(file_path, content)

Determines if a file documents the stub/workflow system itself to exclude it from stub detection.

Parameters:

  • file_path (str): Path to the file being analyzed
  • content (str): File content

Returns:

  • bool: True if file is meta-documentation about stub systems

Detection criteria:

  • Filename contains: DOCUMENTATION_WORKFLOW, AUTO_DOCUMENTATION, WORKFLOW, STUB_SYSTEM
  • Content mentions multiple workflow indicators

Example:

is_meta = is_documentation_about_stubs(
    "docs/development/AUTO_DOCUMENTATION_WORKFLOWS.md",
    content_with_workflow_docs
)  # Returns True
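The two detection criteria could be combined roughly as follows. This is a sketch under stated assumptions: the filename tokens come from the list above, but the function name looks_like_meta_documentation, the CONTENT_INDICATORS phrases, and the min_hits threshold of 2 (standing in for "multiple") are hypothetical and not taken from the script.

```python
from pathlib import Path

# Filename tokens listed in the detection criteria above.
FILENAME_TOKENS = (
    "DOCUMENTATION_WORKFLOW",
    "AUTO_DOCUMENTATION",
    "WORKFLOW",
    "STUB_SYSTEM",
)
# Hypothetical content indicators; the script's real list is not documented here.
CONTENT_INDICATORS = ("stub detection", "stub marker", "documentation workflow")


def looks_like_meta_documentation(
    file_path: str, content: str, min_hits: int = 2
) -> bool:
    """Return True if the filename or the content suggests meta-documentation."""
    name = Path(file_path).name.upper()
    if any(token in name for token in FILENAME_TOKENS):
        return True
    lowered = content.lower()
    hits = sum(1 for phrase in CONTENT_INDICATORS if phrase in lowered)
    return hits >= min_hits  # "multiple" indicators, assumed to mean two or more
```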

find_stub_markers_outside_code_blocks(file_path)

Finds stub markers in a file that are not within code blocks or meta-documentation.

Parameters:

  • file_path (str): Path to the markdown file to analyze

Returns:

  • List[str]: List of stub markers found outside code blocks

Detected markers:

  • STUB:, TODO:, PLACEHOLDER, COMING SOON, TBD
  • NOT YET IMPLEMENTED, UNDER CONSTRUCTION, DRAFT, INCOMPLETE

Example:

markers = find_stub_markers_outside_code_blocks("docs/features/NEW_FEATURE.md")
# Returns ["STUB:", "TODO:"] if those markers exist outside code blocks
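The scan itself can be pictured as a loop over the marker list that skips any occurrence inside code. The sketch below works on a content string rather than a file path and uses a deliberately simple fence-only check; both helper names (markers_outside_code, fenced_only) and the case-sensitive matching are assumptions for illustration, not the script's confirmed behavior.

```python
from typing import Callable, List

# Markers listed above.
STUB_MARKERS = [
    "STUB:", "TODO:", "PLACEHOLDER", "COMING SOON", "TBD",
    "NOT YET IMPLEMENTED", "UNDER CONSTRUCTION", "DRAFT", "INCOMPLETE",
]


def fenced_only(content: str, pos: int) -> bool:
    """Simplified check: an odd number of fence lines before pos means 'in code'."""
    fences = sum(
        1
        for line in content[:pos].splitlines()
        if line.lstrip().startswith(("```", "~~~"))
    )
    return fences % 2 == 1


def markers_outside_code(
    content: str, in_code: Callable[[str, int], bool] = fenced_only
) -> List[str]:
    """Return each marker that occurs at least once outside code."""
    found = []
    for marker in STUB_MARKERS:
        pos = content.find(marker)
        while pos != -1:
            if not in_code(content, pos):
                found.append(marker)
                break  # one occurrence outside code is enough
            pos = content.find(marker, pos + 1)
    return found
```

Passing the code-block check as a parameter keeps the scanning loop independent of how code blocks are recognized.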

find_stub_files_by_name(docs_dir="docs")

Finds stub files using naming convention patterns.

Parameters:

  • docs_dir (str): Directory to search (defaults to "docs")

Returns:

  • Set[str]: Set of file paths matching stub naming patterns

Patterns detected:

  • *_STUB.md, STUB_*.md
  • TODO_*.md, *_TODO.md
  • *_DRAFT.md, DRAFT_*.md

Example:

stub_files = find_stub_files_by_name()
# Returns {"docs/features/STUB_NEW_FEATURE.md", "docs/TODO_API_DOCS.md"}
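Name-based detection maps naturally onto glob-style patterns. The following sketch assumes a recursive search and case-sensitive matching; the helper names matches_stub_name and stub_files_by_name_sketch are hypothetical, and the script may traverse directories differently.

```python
from fnmatch import fnmatchcase
from pathlib import Path
from typing import Set

# Naming patterns listed above.
STUB_NAME_PATTERNS = (
    "*_STUB.md", "STUB_*.md",
    "TODO_*.md", "*_TODO.md",
    "*_DRAFT.md", "DRAFT_*.md",
)


def matches_stub_name(filename: str) -> bool:
    """Case-sensitive match of a bare filename against the stub patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in STUB_NAME_PATTERNS)


def stub_files_by_name_sketch(docs_dir: str = "docs") -> Set[str]:
    """Collect markdown files under docs_dir whose names look like stubs."""
    root = Path(docs_dir)
    if not root.is_dir():
        return set()  # a missing directory yields no stubs instead of an error
    return {
        path.as_posix()
        for path in root.rglob("*.md")
        if matches_stub_name(path.name)
    }
```

fnmatchcase is used instead of fnmatch so that results do not depend on the operating system's case rules.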

find_stub_files_by_content(docs_dir="docs")

Finds stub files by analyzing content for stub markers outside code blocks.

Parameters:

  • docs_dir (str): Directory to search for markdown files (defaults to "docs")

Returns:

  • Dict[str, List[str]]: Dictionary mapping file paths to lists of found markers

Example:

content_stubs = find_stub_files_by_content()
# Returns {"docs/features/incomplete.md": ["STUB:", "TODO:"]}

Priority System

The script assigns priorities based on file location and content:

  • Critical: Files in docs/development/ or containing CRITICAL:/URGENT:/IMPORTANT: markers
  • Moderate: Files in docs/features/, docs/testing/, docs/troubleshooting/
  • Minor: Other documentation files
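The priority rules above can be sketched as a small function. The function name assign_priority_sketch and the lowercase return values (chosen to match the "priority" field in the JSON output) are assumptions; the order of the checks follows the list, with the critical rule evaluated first.

```python
from pathlib import PurePosixPath

CRITICAL_MARKERS = ("CRITICAL:", "URGENT:", "IMPORTANT:")
MODERATE_DIRS = ("docs/features", "docs/testing", "docs/troubleshooting")


def assign_priority_sketch(file_path: str, content: str) -> str:
    """Map a file to 'critical', 'moderate', or 'minor' per the rules above."""
    # Normalize Windows separators so the directory checks behave the same everywhere.
    path = PurePosixPath(file_path.replace("\\", "/"))

    def under(directory: str) -> bool:
        return PurePosixPath(directory) in path.parents

    if under("docs/development") or any(m in content for m in CRITICAL_MARKERS):
        return "critical"
    if any(under(directory) for directory in MODERATE_DIRS):
        return "moderate"
    return "minor"
```

Checking path.parents rather than a string prefix avoids treating a directory such as docs/features_old/ as docs/features/.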

Usage

Command Line

cd scripts
python3 detect_stub_files.py

Output Format

The script outputs JSON analysis to stdout and progress messages to stderr:

{
  "stub_files": [
    {
      "file": "docs/features/STUB_FEATURE.md",
      "priority": "moderate",
      "type": "feature",
      "effort": "low",
      "size_bytes": 245,
      "line_count": 12,
      "markers": "STUB:, TODO:"
    }
  ],
  "total_stubs": 1,
  "critical_stubs": 0,
  "moderate_stubs": 1,
  "minor_stubs": 0,
  "analysis_timestamp": "2025-06-09T16:23:00Z",
  "summary": {
    "message": "Found 1 stub files requiring completion",
    "breakdown": "Critical: 0, Moderate: 1, Minor: 0",
    "next_action": "Process stubs by priority"
  }
}
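Because the JSON goes to stdout and progress messages go to stderr, the report can be redirected to a file or piped into another tool without the messages mixing in. As an illustration, the sketch below groups the reported files by priority; the helper name stubs_by_priority is hypothetical, and it relies only on the "stub_files", "file", and "priority" fields shown in the sample output.

```python
import json
from typing import Dict, List


def stubs_by_priority(report_json: str) -> Dict[str, List[str]]:
    """Group the 'file' values from a report by their 'priority' field."""
    report = json.loads(report_json)
    grouped: Dict[str, List[str]] = {"critical": [], "moderate": [], "minor": []}
    for entry in report.get("stub_files", []):
        # setdefault tolerates a priority value not anticipated above.
        grouped.setdefault(entry["priority"], []).append(entry["file"])
    return grouped
```

For example, the report could be saved with python3 detect_stub_files.py > stub_report.json and the file's contents passed to stubs_by_priority.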

Integration with Workflows

This script is used by GitHub Actions workflows for:

  • Automated documentation gap detection
  • GitHub issue creation for incomplete documentation
  • Documentation quality monitoring
  • CI/CD documentation compliance checks

Advanced Features

Smart Code Block Detection

  • Handles multiple code block formats
  • Correctly identifies indented code blocks
  • Processes nested and complex markdown structures

False Positive Prevention

  • Excludes workflow documentation that contains example markers
  • Ignores markers in code examples
  • Filters meta-documentation about the stub system itself

Comprehensive Analysis

  • Estimates completion effort based on file size
  • Categorizes stubs by documentation type
  • Provides actionable prioritization
  • Generates workflow-ready JSON output

Error Handling

The script is designed to keep running when individual files cause problems:

  • Gracefully handles file read errors
  • Continues processing when individual files fail
  • Provides meaningful error messages
  • Handles file encodings safely across file types
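One common way to get this behavior in Python is to read each file defensively and report failures on stderr. The sketch below is an assumption about the general approach, not the script's actual code; read_text_safely is a hypothetical helper, and decoding as UTF-8 with errors="replace" is one possible encoding strategy.

```python
import sys
from pathlib import Path
from typing import Optional


def read_text_safely(path: str) -> Optional[str]:
    """Return the file's text, or None if it cannot be read."""
    try:
        # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.
        return Path(path).read_text(encoding="utf-8", errors="replace")
    except OSError as exc:
        # Report on stderr so the JSON on stdout stays clean.
        print(f"Warning: could not read {path}: {exc}", file=sys.stderr)
        return None
```

A caller can skip any file for which this returns None and continue with the rest.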

Dependencies

  • Python 3.6+
  • Standard library only (no external dependencies), including:
  • pathlib for cross-platform file handling
  • Regular expressions (re) for pattern matching

Quick Start for Developers

This script is part of the automated documentation system. Most developers don't need to run it directly, because GitHub workflows execute it automatically.

When You Need This Information

  • Understanding how stub files are detected
  • Debugging stub file detection issues
  • Contributing to the detection logic
  • Creating custom stub file patterns

Common Developer Workflow

  1. Create stub files with markers such as STUB:, TODO:, or PLACEHOLDER
  2. Push to the repository: automation detects stubs within 24 hours
  3. Review generated issues: the system creates detailed completion tasks
  4. Complete the documentation: the system automatically closes issues when done

For complete workflow information, see the Documentation Workflows Hub.

Core Documentation System

Implementation Guides