Documentation
Good documentation is essential for maintainable code. This includes comments, docstrings, and following established documentation standards.
Google Python Style Guide for Documentation
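Following the Comments and Docstrings section of Google's Python Style Guide keeps documentation consistent across a codebase and makes clear which details (arguments, return values, raised exceptions) each docstring should cover.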
Docstring Format
Use the Google docstring format for consistency:
def fetch_smalltable_rows(
    table_handle: bigtable.Table,
    keys: Sequence[bytes | str],
    require_all_keys: bool = False,
) -> Mapping[bytes, tuple[str, ...]]:
    """Fetches rows from a Smalltable.

    Retrieves rows pertaining to the given keys from the Table instance
    represented by table_handle. String keys will be UTF-8 encoded.

    Args:
        table_handle: An open bigtable.Table instance.
        keys: A sequence of strings representing the key of each table
            row to fetch. String keys will be UTF-8 encoded.
        require_all_keys: If True only rows with values set for all keys will be
            returned.

    Returns:
        A dict mapping keys to the corresponding table row data
        fetched. Each row is represented as a tuple of strings. For
        example:

        {b'Serak': ('Rigel VII', 'Preparer'),
         b'Zim': ('Irk', 'Invader'),
         b'Lrrr': ('Omicron Persei 8', 'Emperor')}

        Returned keys are always bytes. If a key from the keys argument is
        missing from the dictionary, then that row was not found in the
        table (and require_all_keys must have been False).

    Raises:
        IOError: An error occurred accessing the smalltable.
    """
Docstring Conventions (PEP 257)
PEP 257 defines docstring conventions:
Basic Rules
- Use triple double quotes for docstrings
- Start with a one-line summary
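- Use imperative mood for the summary ("Return", not "Returns"); the Google guide also accepts the descriptive form ("Fetches rows...") as long as one form is used consistently within a file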
- End the summary line with a period
- Leave a blank line after the summary if there's more content
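The rules above can be sketched on a small function. `normalize_name` is a hypothetical helper invented for this illustration; `inspect.getdoc` returns the docstring with its indentation cleaned, which makes the summary line and the blank line after it easy to inspect.

```python
import inspect


def normalize_name(name: str) -> str:
    """Return the name with whitespace collapsed and words title-cased.

    Runs of whitespace are reduced to a single space so that names
    entered with inconsistent spacing compare equal.
    """
    return " ".join(name.split()).title()


# inspect.getdoc() strips the docstring's common leading indentation.
doc = inspect.getdoc(normalize_name)

# First line is the summary, second line is the blank separator,
# everything after that is the extended description.
summary, blank, *details = doc.splitlines()
print(summary)
```

The summary is a single imperative sentence ending in a period, and the extended description starts only after one blank line.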
Module Docstrings
"""A one-line summary of the module or program.

This module provides utilities for data processing and analysis.
It includes functions for cleaning, transforming, and validating
research data from various sources.

Example:
    Basic usage of this module:

        from data_utils import clean_data, validate_input

        cleaned = clean_data(raw_data)
        if validate_input(cleaned):
            process_data(cleaned)

Attributes:
    DEFAULT_ENCODING (str): The default character encoding.
    MAX_RETRIES (int): Maximum number of retry attempts.
"""

DEFAULT_ENCODING = "utf-8"
MAX_RETRIES = 3
Class Docstrings
class DataProcessor:
    """A class for processing research data.

    This class provides methods for cleaning, transforming, and validating
    data from various research sources. It maintains state about the
    processing pipeline and can handle multiple data formats.

    Attributes:
        config: A dictionary containing processing configuration.
        processed_count: The number of records processed.

    Example:
        processor = DataProcessor(config={'format': 'csv'})
        result = processor.process_file('data.csv')
    """

    def __init__(self, config: dict[str, str]) -> None:
        """Initialize the DataProcessor.

        Args:
            config: Configuration dictionary with processing options.
        """
        self.config = config
        self.processed_count = 0
Function Docstrings
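import statistics


def calculate_statistics(data: list[float],
                         include_median: bool = True) -> dict[str, float]:
    """Calculate basic statistics for a dataset.

    Computes mean, standard deviation, and optionally median
    for the provided numerical data.

    Args:
        data: A list of numerical values to analyze.
        include_median: Whether to include median in the results.

    Returns:
        A dictionary containing statistical measures:
        - 'mean': The arithmetic mean
        - 'std': The standard deviation
        - 'median': The median (if include_median is True)

    Raises:
        ValueError: If the data list is empty.
        TypeError: If data contains non-numerical values.

    Example:
        >>> data = [1, 2, 3, 4, 5]
        >>> stats = calculate_statistics(data)
        >>> print(stats['mean'])
        3.0
    """
    if not data:
        raise ValueError("Data list cannot be empty")
    # bool is a subclass of int, so it is rejected explicitly.
    if any(isinstance(x, bool) or not isinstance(x, (int, float)) for x in data):
        raise TypeError("Data must contain only numerical values")

    stats = {
        "mean": statistics.fmean(data),
        # Assumption: population standard deviation. Use statistics.stdev
        # instead if the data is a sample.
        "std": statistics.pstdev(data),
    }
    if include_median:
        stats["median"] = statistics.median(data)
    return stats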
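Examples written in `>>>` form double as tests: the standard-library `doctest` module runs them and compares the real output with the text in the docstring. A minimal sketch, using a small hypothetical `mean` function so the block stays self-contained:

```python
import doctest


def mean(values: list[float]) -> float:
    """Return the arithmetic mean of values.

    Example:
        >>> mean([1, 2, 3, 4, 5])
        3.0
    """
    return sum(values) / len(values)


# Collect the ">>>" examples from the docstring and execute them,
# comparing each actual result with the expected text that follows it.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(mean, name="mean", globs={"mean": mean}):
    runner.run(test)

results = runner.summarize(verbose=False)
print(f"attempted={results.attempted}, failed={results.failed}")
```

For whole files, `python -m doctest -v mymodule.py` does the same from the command line, so a stale example fails visibly instead of quietly misleading readers.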
Comment Guidelines
When to Comment
Good reasons to comment:
- Explain complex algorithms or business logic
- Clarify non-obvious code behavior
- Provide context for decisions
- Explain workarounds or temporary solutions
- Document external dependencies or assumptions
# Use binary search for O(log n) performance on sorted data
def find_insertion_point(sorted_list: list[int], value: int) -> int:
    """Find the index where value should be inserted to maintain sort order."""
    left, right = 0, len(sorted_list)
    while left < right:
        mid = (left + right) // 2
        if sorted_list[mid] < value:
            left = mid + 1
        else:
            right = mid
    return left

import time

import requests


# Workaround for API rate limiting - retry with exponential backoff
def api_request_with_retry(url: str, max_retries: int = 3) -> dict:
    """Make API request with retry logic for rate limiting."""
    for attempt in range(max_retries):
        try:
            response = requests.get(url)
            # requests.get() does not raise on 4xx/5xx responses by itself;
            # raise_for_status() turns them into HTTPError.
            response.raise_for_status()
            return response.json()
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 429:  # Rate limited
                wait_time = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s, ...
                time.sleep(wait_time)
            else:
                raise
    raise RuntimeError(f"Failed after {max_retries} attempts")
When NOT to Comment
Avoid these types of comments:
- Stating the obvious
- Repeating what the code clearly shows
- Outdated or incorrect information
# Bad - states the obvious
x = x + 1 # Increment x by 1
user_count = len(users) # Get the length of users list
# Good - explains why
x = x + 1 # Compensate for zero-based indexing
user_count = len(users) # Cache count to avoid repeated calculations
Documentation Tools
Sphinx
Generate documentation from docstrings:
# pip
pip install sphinx
sphinx-quickstart
sphinx-build -b html source build

# conda
conda install -c conda-forge sphinx
# or
mamba install -c conda-forge sphinx
sphinx-quickstart
sphinx-build -b html source build

# uv
uv add sphinx
sphinx-quickstart
sphinx-build -b html source build

# Poetry
poetry add sphinx --group dev
sphinx-quickstart
sphinx-build -b html source build
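Sphinx does not read docstrings until the relevant extensions are enabled. The fragment below is a minimal sketch of the part of `conf.py` that does this. The project name and the directory layout (code in a `src` folder next to the Sphinx `source` folder) are placeholders to adjust. `sphinx.ext.autodoc` imports the code and extracts its docstrings, and `sphinx.ext.napoleon` parses Google- and NumPy-style sections.

```python
# conf.py (fragment)
import sys
from pathlib import Path

# Assumed layout: <repo>/source/conf.py and <repo>/src/<your package>.
# autodoc imports the code, so the package must be importable.
sys.path.insert(0, str(Path("..", "src").resolve()))

project = "my-project"  # placeholder name

extensions = [
    "sphinx.ext.autodoc",   # build reference pages from docstrings
    "sphinx.ext.napoleon",  # parse Google/NumPy docstring sections
]

# Napoleon accepts both styles by default; restricting it to one
# keeps the project on a single docstring convention.
napoleon_google_docstring = True
napoleon_numpy_docstring = False
```

With this in place, an `.. automodule:: mymodule` directive with the `:members:` option in one of the `.rst` files renders that module's docstrings into the built HTML.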
MkDocs
Create documentation websites:
# pip
pip install mkdocs
mkdocs new my-project
mkdocs serve

# conda
conda install -c conda-forge mkdocs
# or
mamba install -c conda-forge mkdocs
mkdocs new my-project
mkdocs serve

# uv
uv add mkdocs
mkdocs new my-project
mkdocs serve

# Poetry
poetry add mkdocs --group dev
mkdocs new my-project
mkdocs serve
Pydoc
Built-in Python documentation generator:
python -m pydoc -w mymodule
python -m pydoc -p 8080 # Start web server
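The same generator is available from Python code. A small sketch, with a hypothetical `greet` function, that renders the text `help()` would show:

```python
import pydoc


def greet(name: str) -> str:
    """Return a greeting for name."""
    return f"Hello, {name}!"


# render_doc() builds the help text as a string; the plaintext renderer
# avoids the overstrike characters used for bold output in terminals.
text = pydoc.render_doc(greet, renderer=pydoc.plaintext)
print(text)
```

The rendered text contains the signature and the docstring, so a missing or vague docstring shows up immediately here.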
Type Hints as Documentation
Type hints serve as inline documentation:
from typing import Optional, Union
from pathlib import Path


def load_config(
    config_path: Path,
    default_values: Optional[dict[str, str]] = None,
    format_type: Union[str, None] = None
) -> dict[str, str]:
    """Load configuration from file.

    The type hints clearly show:
    - config_path must be a Path object
    - default_values is optional and should be a string dict
    - format_type can be a string or None
    - Returns a string dictionary
    """
    pass
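Type hints are stored as metadata on the function and are not checked when the code runs. The sketch below uses a simplified stand-in for `load_config` (its body is an assumption made only so the example runs) to show both points: the hints can be read back with `typing.get_type_hints`, and a call with the wrong type still succeeds unless a static checker is run.

```python
from pathlib import Path
from typing import Optional, get_type_hints


def load_config(
    config_path: Path,
    default_values: Optional[dict[str, str]] = None,
) -> dict[str, str]:
    """Return a copy of default_values (stand-in body for illustration)."""
    return dict(default_values or {})


# The annotations are available at runtime as ordinary objects.
hints = get_type_hints(load_config)
print(hints["config_path"])
print(hints["return"])

# Nothing enforces them: a str is accepted where a Path is annotated.
# A static type checker would report this call; Python itself does not.
result = load_config("settings.toml")  # type: ignore[arg-type]
```

This is why hints complement a docstring rather than replace it: they state the expected types, while the docstring explains meaning, units, and constraints.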
Documentation Best Practices
1. Keep Documentation Current
def process_data(data: list[dict]) -> list[dict]:
    """Process research data records.

    Note: Update this docstring when changing the algorithm!

    Args:
        data: List of data records as dictionaries.

    Returns:
        Processed data records.
    """
    # When you modify this function, update the docstring too
    pass
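Part of this can be checked automatically. The sketch below is a deliberately simple check, not a full docstring parser: it assumes Google-style docstrings with four-space indentation, and the helper names `documented_args` and `undocumented_params` are invented for this example. It reports parameters that appear in the signature but not in the `Args:` section.

```python
import inspect
import re


def documented_args(func) -> set[str]:
    """Return the parameter names listed under Args: in func's docstring."""
    doc = inspect.getdoc(func) or ""
    names: set[str] = set()
    in_args = False
    for line in doc.splitlines():
        if line.strip() == "Args:":
            in_args = True
            continue
        if in_args:
            # An unindented, non-empty line starts the next section.
            if line and not line.startswith(" "):
                break
            # Entries sit at exactly four spaces; continuation lines are
            # indented deeper and therefore do not match.
            match = re.match(r" {4}(\w+)(?: \([^)]*\))?:", line)
            if match:
                names.add(match.group(1))
    return names


def undocumented_params(func) -> set[str]:
    """Return signature parameters that the docstring does not mention."""
    params = set(inspect.signature(func).parameters) - {"self", "cls"}
    return params - documented_args(func)


def process_data(data: list[dict], strict: bool = False) -> list[dict]:
    """Process research data records.

    Args:
        data: List of data records as dictionaries.

    Returns:
        Processed data records.
    """
    return data


# "strict" was added to the signature but never documented.
print(undocumented_params(process_data))
```

Running a check like this in the test suite turns a forgotten docstring update into a failing test.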
2. Use Examples
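def format_citation(authors: list[str], title: str, year: int) -> str:
    """Format a research paper citation.

    Args:
        authors: List of author names.
        title: Paper title.
        year: Publication year.

    Returns:
        Formatted citation string.

    Example:
        >>> authors = ["Smith, J.", "Doe, A."]
        >>> citation = format_citation(authors, "Research Methods", 2023)
        >>> print(citation)
        Smith, J., & Doe, A. (2023). Research Methods.
    """
    # Join all but the last author with commas, then add ", & " before the last.
    if len(authors) > 1:
        author_text = ", ".join(authors[:-1]) + ", & " + authors[-1]
    else:
        author_text = "".join(authors)
    return f"{author_text} ({year}). {title}."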
3. Document Edge Cases
def calculate_percentage(part: float, whole: float) -> float:
    """Calculate what percentage part is of whole.

    Args:
        part: The part value. Must be non-negative.
        whole: The whole value. Must be non-negative.

    Returns:
        Percentage as a float. The result lies in 0-100 when
        part <= whole and exceeds 100 otherwise.

    Raises:
        ValueError: If part or whole is negative.
        ZeroDivisionError: If whole is zero and part is not.

    Note:
        Returns 0.0 if both part and whole are zero.
    """
    # Validate first so a negative part with whole == 0 raises ValueError.
    if part < 0 or whole < 0:
        raise ValueError("Values must be non-negative")
    if whole == 0 and part == 0:
        return 0.0
    if whole == 0:
        raise ZeroDivisionError("Cannot divide by zero")
    return (part / whole) * 100
4. Use Consistent Style
Choose one docstring style and stick to it throughout your project:
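- Google Style: Indented sections introduced by a header line (Args:, Returns:, Raises:)
- NumPy Style: The same kinds of sections as Google style, but with underlined headers (Parameters, Returns) and each type on its own line; common in scientific Python projects
- Sphinx Style: Uses reStructuredText field lists (:param name:, :returns:, :raises:)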
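For comparison, here is one small function documented in each of the three styles. The `scale_*` functions are hypothetical and identical apart from their docstrings.

```python
def scale_google(values: list[float], factor: float) -> list[float]:
    """Return values multiplied by factor.

    Args:
        values: Numbers to scale.
        factor: Multiplier applied to each number.

    Returns:
        A new list with each value scaled.
    """
    return [v * factor for v in values]


def scale_numpy(values: list[float], factor: float) -> list[float]:
    """Return values multiplied by factor.

    Parameters
    ----------
    values : list of float
        Numbers to scale.
    factor : float
        Multiplier applied to each number.

    Returns
    -------
    list of float
        A new list with each value scaled.
    """
    return [v * factor for v in values]


def scale_sphinx(values: list[float], factor: float) -> list[float]:
    """Return values multiplied by factor.

    :param values: Numbers to scale.
    :param factor: Multiplier applied to each number.
    :returns: A new list with each value scaled.
    """
    return [v * factor for v in values]
```

All three carry the same information. The choice mostly affects readability in source form and which tooling can parse the docstrings.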
5. Document Public APIs
Focus documentation efforts on public interfaces:
class DataAnalyzer:
    """Public class for data analysis."""

    def analyze(self, data: list) -> dict:
        """Public method - needs full documentation."""
        return self._internal_process(data)

    def _internal_process(self, data: list) -> dict:
        """Private method - minimal documentation is fine."""
        # Implementation details
        pass
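A quick way to find gaps in a public interface is to list the public methods that have no docstring. The sketch below assumes the usual convention that names starting with an underscore are private. The `missing_docstrings` helper and the `export` method are invented for illustration.

```python
import inspect


class DataAnalyzer:
    """Public class for data analysis."""

    def analyze(self, data: list) -> dict:
        """Summarize data and return the result as a dictionary."""
        return self._internal_process(data)

    def export(self, path: str) -> None:
        pass  # deliberately left undocumented so the report finds it

    def _internal_process(self, data: list) -> dict:
        return {"count": len(data)}


def missing_docstrings(cls: type) -> list[str]:
    """Return names of public methods defined on cls that lack a docstring."""
    return sorted(
        name
        for name, member in vars(cls).items()
        if inspect.isfunction(member)
        and not name.startswith("_")
        and not member.__doc__
    )


print(missing_docstrings(DataAnalyzer))
```

Private helpers such as `_internal_process` are skipped on purpose, so the report only covers what users of the class can see.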
README Files
Every project should have a comprehensive README:
# Project Name

Brief description of what this project does.

## Installation

```bash
pip install -r requirements.txt
```

## Usage

    from myproject import MyClass

    analyzer = MyClass()
    result = analyzer.process(data)

## API Reference

Link to detailed API documentation.

## Contributing

Guidelines for contributors.

## License

License information.

Good documentation makes your code accessible to others (including your future self). Invest time in writing clear, helpful documentation that explains not just what your code does, but why it does it.