Project Structure
This document describes the standard project structure for research projects hosted on GitHub Classroom. This structure promotes clean code practices, reproducibility, and maintainability across all student assignments and research work.
Overview
Our standardized project structure follows Python best practices and includes modern tooling for code quality, testing, and documentation. Each project should be self-contained, reproducible, and ready for collaborative development.
Standard Directory Structure
my-research-project/
├── src/                        # Source code directory
│   ├── __init__.py             # Makes src a Python package
│   ├── problem_1.py            # Main problem solution
│   ├── utils.py                # Utility functions
│   └── data_processing.py      # Data processing modules
├── notebooks/                  # Jupyter notebooks for exploration
│   ├── exploration.ipynb       # Data exploration
│   ├── analysis.ipynb          # Analysis and visualization
│   └── results.ipynb           # Final results presentation
├── tests/                      # Test files using pytest
│   ├── __init__.py             # Makes tests a Python package
│   ├── test_problem_1.py       # Tests for problem_1.py
│   ├── test_utils.py           # Tests for utility functions
│   └── conftest.py             # Pytest configuration and fixtures
├── data/                       # Data files (if small and non-sensitive)
│   ├── raw/                    # Raw, unprocessed data
│   ├── processed/              # Cleaned and processed data
│   └── external/               # External datasets
├── docs/                       # Documentation
│   ├── README.md               # Project documentation
│   └── api.md                  # API documentation
├── .github/                    # GitHub-specific files
│   └── workflows/              # GitHub Actions workflows
│       └── tests.yml           # Automated testing workflow
├── Justfile                    # Task runner for common commands
├── pyproject.toml              # Python project configuration
├── env.yaml                    # Conda environment specification
├── .gitignore                  # Git ignore patterns
├── .pre-commit-config.yaml     # Pre-commit hooks configuration
└── README.md                   # Main project documentation
Core Files and Directories
Source Code (src/)
The src/ directory contains all your Python source code. This structure follows the "src layout" pattern, which keeps importable code separate from tests, notebooks, and configuration files.
Key files:
- src/problem_1.py - Main implementation for your assignment
- src/utils.py - Reusable utility functions
- src/__init__.py - Package initialization
Benefits of src layout:
- Prevents accidental imports of development code
- Clearer separation between source and test code
- Better support for packaging and distribution
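As an illustration, here is a minimal sketch of what src/problem_1.py might contain. The function names solve_problem and validate_input come from the test examples later in this document; their bodies (summing a list, rejecting non-list input) are assumptions inferred from those tests, not the actual assignment solution.

```python
"""Hypothetical contents of src/problem_1.py, inferred from the example tests."""


def validate_input(values: object) -> bool:
    """Return True for a list of numbers; raise ValueError otherwise."""
    if not isinstance(values, list):
        raise ValueError(f"expected a list, got {type(values).__name__}")
    if not all(isinstance(v, (int, float)) for v in values):
        raise ValueError("all items must be numbers")
    return True


def solve_problem(values: list[float]) -> float:
    """Return the sum of the values; an empty list sums to 0."""
    validate_input(values)
    return sum(values)


if __name__ == "__main__":
    # Lets the module run as a script, e.g. `python -m src.problem_1`.
    print(solve_problem([1, 2, 3]))
```

Keeping the logic in importable functions and the script entry point behind the `__main__` guard means the same file can be imported by tests and run directly.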
Notebooks (notebooks/)
The notebooks/ directory contains Jupyter notebooks for data exploration, analysis, and result presentation. Notebooks are excellent for:
- Exploratory data analysis
- Prototyping algorithms
- Creating visualizations
- Documenting analysis workflows
Organization tips:
- Use descriptive names with numbers for ordering: 01_data_exploration.ipynb
- Keep notebooks focused on specific tasks
- Export important functions to src/ modules
- Clear outputs before committing to version control
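Clearing outputs can be automated. The sketch below uses only the standard library and relies on the fact that .ipynb files are JSON documents with a top-level "cells" list. The helper names clear_outputs and clear_file are hypothetical; dedicated tools such as nbstripout cover more cases.

```python
import json
from pathlib import Path


def clear_outputs(notebook: dict) -> dict:
    """Strip outputs and execution counts from code cells, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_file(path: Path) -> None:
    """Rewrite one .ipynb file without its outputs."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    cleaned = clear_outputs(notebook)
    path.write_text(json.dumps(cleaned, indent=1) + "\n", encoding="utf-8")
```

For example, `clear_file(Path("notebooks/analysis.ipynb"))` would rewrite that notebook before a commit.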
Tests (tests/)
The tests/ directory contains all test files using pytest. Testing is crucial for:
- Verifying code correctness
- Preventing regressions
- Documenting expected behavior
- Building confidence in your implementation
Test file naming:
- test_*.py or *_test.py patterns
- Mirror the structure of your src/ directory
- One test file per source module
See the Testing with Pytest section for detailed information.
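The mirroring convention can be checked mechanically. The following sketch, which assumes the test_*.py naming pattern and uses hypothetical helper names, lists source modules that have no matching test file.

```python
from pathlib import Path


def expected_test_path(module: Path, src_dir: Path, tests_dir: Path) -> Path:
    """Map src/pkg/mod.py to tests/pkg/test_mod.py."""
    relative = module.relative_to(src_dir)
    return tests_dir / relative.with_name(f"test_{relative.name}")


def modules_without_tests(src_dir: Path, tests_dir: Path) -> list[Path]:
    """List source modules that have no mirrored test file."""
    missing = []
    for module in sorted(src_dir.rglob("*.py")):
        if module.name == "__init__.py":
            continue  # package markers do not need their own tests
        if not expected_test_path(module, src_dir, tests_dir).exists():
            missing.append(module)
    return missing
```

Running `modules_without_tests(Path("src"), Path("tests"))` from the project root gives a quick list of modules that still need a test file.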
Configuration Files
pyproject.toml
Central configuration file for Python projects. Contains:
- Project metadata and dependencies
- Tool configurations (Ruff, pytest, mypy)
- Build system specifications
For detailed configuration options, see our Linting & Formatting and Workflow Integration guides.
env.yaml
Conda environment specification for reproducible environments:
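name: my-research-project
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - numpy
  - pandas
  - matplotlib
  - scipy
  - jupyter
  - pytest
  - pytest-cov   # provides the --cov options used with pytest
  - pre-commit   # provides the pre-commit command
  - pip
  - pip:
      - ruff
      - mypy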
For more details, see our Python Environment Management guide.
.gitignore
Specifies files and directories that Git should ignore:
- Python bytecode (__pycache__/, *.pyc)
- Virtual environments (.venv/, env/)
- IDE files (.vscode/, .idea/)
- Data files (if large or sensitive)
- Build artifacts (build/, dist/)
.pre-commit-config.yaml
Configuration for pre-commit hooks that run automatically before each commit. See Workflow Integration for setup details.
Justfile - Task Automation
The Justfile provides a simple way to run common development tasks. Just is a modern alternative to Make with a cleaner syntax.
Current Justfile Commands
Our current Justfile includes:
# Run pre-commit hooks on all files
pc:
    pre-commit run --all-files

# Serve the MkDocs documentation
serve:
    poetry run mkdocs serve
Recommended Justfile for Student Projects
For student research projects, we recommend expanding the Justfile with these common tasks:
# Show available commands
default:
    @just --list

# Install dependencies and setup development environment
setup:
    conda env create -f env.yaml
    # Each recipe line runs in its own shell, so `conda activate` would not
    # carry over to the next line; `conda run` targets the environment directly.
    conda run -n my-research-project pre-commit install

# Run all tests
test:
    pytest tests/ -v

# Run tests with coverage report
test-cov:
    pytest tests/ --cov=src --cov-report=html --cov-report=term

# Run linting and formatting
lint:
    ruff check src/ tests/
    ruff format src/ tests/

# Run type checking
typecheck:
    mypy src/

# Run all quality checks
check: lint typecheck test

# Clean up generated files
clean:
    rm -rf .pytest_cache/
    rm -rf htmlcov/
    rm -rf .coverage
    # `find -delete` fails on non-empty directories, so remove them with rm -rf
    find . -type d -name __pycache__ -prune -exec rm -rf {} +
    find . -type f -name "*.pyc" -delete

# Start Jupyter notebook server
notebook:
    jupyter notebook notebooks/

# Run the main problem solution
run:
    python -m src.problem_1
Installing Just
Install Just using one of these methods:
# Using cargo (Rust package manager)
cargo install just
# Using conda
conda install -c conda-forge just
# Using homebrew (macOS)
brew install just
# Using pip (via the rust-just package on PyPI)
pip install rust-just
Testing with Pytest
Pytest is the recommended testing framework for Python projects. It provides powerful features with minimal boilerplate.
Basic Test Structure
# tests/test_problem_1.py
import pytest

from src.problem_1 import solve_problem, validate_input


def test_solve_problem_basic():
    """Test basic functionality of solve_problem."""
    result = solve_problem([1, 2, 3])
    assert result == 6


def test_solve_problem_empty_input():
    """Test solve_problem with empty input."""
    result = solve_problem([])
    assert result == 0


def test_validate_input_valid():
    """Test validate_input with valid data."""
    assert validate_input([1, 2, 3]) is True


def test_validate_input_invalid():
    """Test validate_input with invalid data."""
    with pytest.raises(ValueError):
        validate_input("not a list")


@pytest.fixture
def sample_data():
    """Provide sample data for tests."""
    return [1, 2, 3, 4, 5]


def test_with_fixture(sample_data):
    """Test using a fixture."""
    result = solve_problem(sample_data)
    assert result == 15
Pytest Configuration
Configure pytest in pyproject.toml:
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py", "*_test.py"]
python_functions = ["test_*"]
addopts = [
    "--strict-markers",
    "--strict-config",
    "--verbose",
    "--cov=src",
    "--cov-report=term-missing",
    "--cov-report=html",
]
markers = [
    "slow: marks tests as slow (deselect with '-m \"not slow\"')",
    "integration: marks tests as integration tests",
]
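To show how the registered markers are used, here is a small sketch of tests tagged with the slow and integration markers from the configuration above. The test names and bodies are hypothetical. With --strict-markers enabled, using a marker that is not registered in pyproject.toml is reported as an error.

```python
import pytest


@pytest.mark.slow
def test_large_input_sum():
    """Marked slow: deselected by `pytest -m "not slow"`."""
    values = list(range(1_000_000))
    assert sum(values) == 499_999_500_000


@pytest.mark.integration
def test_round_trip_through_file(tmp_path):
    """Uses pytest's built-in tmp_path fixture for a temporary directory."""
    target = tmp_path / "values.txt"
    target.write_text("1 2 3")
    assert sum(int(v) for v in target.read_text().split()) == 6
```

Running `pytest -m integration` selects only the second test, while `pytest -m "not slow"` skips the first.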
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=src
# Run specific test file
pytest tests/test_problem_1.py
# Run tests matching a pattern
pytest -k "test_solve"
# Run tests with verbose output
pytest -v
# Run tests and stop on first failure
pytest -x
Test Organization Best Practices
- Mirror source structure: Test files should mirror your src/ directory structure
- Descriptive names: Use clear, descriptive test function names
- One concept per test: Each test should verify one specific behavior
- Use fixtures: Share common test data and setup using pytest fixtures
- Test edge cases: Include tests for boundary conditions and error cases
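One way to cover several edge cases without repeating test bodies is pytest.mark.parametrize. The sketch below defines a stand-in solve_problem so that it runs on its own; in a real project the function would be imported from src.problem_1, and its behavior here (summing a list, rejecting non-lists) is an assumption.

```python
import pytest


def solve_problem(values):
    """Stand-in for src.problem_1.solve_problem (assumed to sum a list)."""
    if not isinstance(values, list):
        raise ValueError("expected a list")
    return sum(values)


@pytest.mark.parametrize(
    ("values", "expected"),
    [
        ([], 0),         # empty input
        ([5], 5),        # single element
        ([-1, 1], 0),    # values that cancel out
        ([1, 2, 3], 6),  # typical case
    ],
)
def test_solve_problem_cases(values, expected):
    """Each tuple above becomes its own test case."""
    assert solve_problem(values) == expected


def test_solve_problem_rejects_non_list():
    """Error cases deserve their own clearly named test."""
    with pytest.raises(ValueError):
        solve_problem("not a list")
```

Each parametrized case is reported separately, so a failure points directly at the input that broke.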
Development Workflow
Initial Setup
- Clone the repository:
  git clone <repository-url>
  cd my-research-project
- Set up environment:
  # Using conda (recommended)
  conda env create -f env.yaml
  conda activate my-research-project
  # Or using pip + venv
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  pip install -r requirements.txt
- Install development tools:
  pre-commit install
- Verify setup:
  just test  # or pytest
  just lint  # or ruff check .
Daily Development
- Start working:
  conda activate my-research-project  # or source .venv/bin/activate
- Run tests frequently:
  just test  # or pytest
- Check code quality:
  just check  # or just lint && just typecheck
- Before committing:
  just check  # Ensure all checks pass
  git add .
  git commit -m "Descriptive commit message"
Code Quality Tools
Our projects use modern Python tooling for maintaining code quality:
- Ruff: Fast linting and formatting (replaces Black, isort, Flake8)
- mypy: Static type checking
- Pytest: Testing framework
- Pre-commit: Git hooks for automated checks
For detailed information about these tools, see:
- Linting & Formatting
- Type Hints
- Workflow Integration
GitHub Classroom Integration
Repository Setup
When you accept a GitHub Classroom assignment:
- Repository is automatically created with the standard structure
- GitHub Actions workflows are pre-configured for automated testing
- Branch protection rules may be enabled to require passing tests
Automated Testing
GitHub Actions automatically run tests on every push and pull request:
# .github/workflows/tests.yml
name: Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest pytest-cov ruff mypy
      - name: Run linting
        run: ruff check .
      - name: Run type checking
        run: mypy src/
      - name: Run tests
        run: pytest --cov=src
Submission Guidelines
- Complete all required functions in src/problem_1.py
- Write comprehensive tests in tests/
- Ensure all tests pass locally and in GitHub Actions
- Follow code quality standards (linting, formatting, type hints)
- Update documentation as needed
- Commit frequently with descriptive messages
Best Practices
Code Organization
- Keep functions small and focused - Each function should do one thing well
- Use meaningful names - Variables, functions, and classes should have descriptive names
- Add type hints - Help with code clarity and catch errors early
- Write docstrings - Document your functions and classes
- Separate concerns - Keep data processing, analysis, and visualization separate
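A short sketch of these points together: three small functions, each typed and documented, with parsing, analysis, and presentation kept apart. The function names and the one-number-per-line input format are hypothetical.

```python
from statistics import mean


def load_values(lines: list[str]) -> list[float]:
    """Parse one number per line, skipping blank lines (data processing)."""
    return [float(line) for line in lines if line.strip()]


def summarize(values: list[float]) -> dict[str, float]:
    """Compute summary statistics (analysis)."""
    if not values:
        raise ValueError("values must not be empty")
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def format_summary(summary: dict[str, float]) -> str:
    """Render the summary as a single line (presentation)."""
    return ", ".join(f"{name}={value:.2f}" for name, value in summary.items())
```

Because each step has a single job, each can be tested on its own, and the formatting can change without touching the analysis.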
Version Control
- Commit frequently - Small, focused commits are easier to review and debug
- Write good commit messages - Explain why the change was made, not just what changed
- Use branches - Create feature branches for significant changes
- Don't commit generated files - Keep your repository clean
Testing
- Write tests first - Test-driven development helps design better code
- Test edge cases - Don't just test the happy path
- Keep tests simple - Tests should be easy to understand and maintain
- Use descriptive test names - Test names should explain what is being tested
Documentation
- Keep README up to date - Explain how to set up and run your project
- Document your analysis - Use notebooks to explain your thought process
- Comment complex code - Help future you understand what you were thinking
- Include examples - Show how to use your functions
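One lightweight way to include examples is to put them in docstrings, where the standard library's doctest module can check them. The scale function below is a hypothetical illustration.

```python
import doctest


def scale(values: list[float], factor: float) -> list[float]:
    """Multiply every value by factor.

    >>> scale([1, 2, 3], 2)
    [2, 4, 6]
    >>> scale([], 10)
    []
    """
    return [value * factor for value in values]


if __name__ == "__main__":
    # Runs the examples in the docstrings and reports any mismatch.
    doctest.testmod()
```

Examples written this way stay in sync with the code, since a changed return value makes the doctest fail.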
Troubleshooting
Common Issues
Environment activation fails:
# Make sure conda is initialized
conda init
# Restart your terminal
# Try activating again
conda activate my-research-project
Tests fail with import errors:
# Make sure you're in the project root directory
# Install the project in development mode
pip install -e .
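If the import error persists, a quick check like the following can confirm whether Python can locate the package at all. is_importable is a hypothetical helper built on the standard library's importlib.util.find_spec, and it is meant for top-level names such as src.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the top-level module or package."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # sys.path shows the directories Python searches, in order.
    print("src importable:", is_importable("src"))
    print("search path starts at:", sys.path[0])
```

If the result is False, the project root is probably not on the search path, which usually means the command was run from another directory or the project is not installed.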
Pre-commit hooks fail:
# Run hooks manually to see detailed errors
pre-commit run --all-files
# Fix the issues and try again
Ruff formatting conflicts:
# Let Ruff fix what it can automatically
ruff check --fix .
ruff format .
Getting Help
- Check the error message - Most errors have helpful information
- Read the documentation - Links to tool documentation are provided throughout
- Ask for help - Use GitHub Issues or class discussion forums
- Search online - Stack Overflow and GitHub Issues are great resources
Related Documentation
- Getting Started Guide - Initial setup and installation
- Python Environment Management - Detailed environment setup
- Clean Code Practices - Code quality guidelines
- Linting & Formatting - Tool configuration and usage
- Type Hints - Static typing in Python
- Workflow Integration - Automation and CI/CD
- Git & GitHub Cheat Sheet - Version control reference
- Conda Cheat Sheet - Environment management reference
This structure ensures that your research projects are reproducible, maintainable, and follow modern Python development best practices. By following these guidelines, you'll develop good habits that will serve you well in both academic and professional software development.