Diffstat (limited to '.github/copilot-instructions.md')
| -rw-r--r-- | .github/copilot-instructions.md | 82 |
1 file changed, 31 insertions, 51 deletions
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 06ea30dd..6039850f 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -7,7 +7,7 @@
 ### Tech Stack
 - **Language**: Python 3.10+ (tested on 3.10, 3.11, 3.12)
 - **Size**: ~75 Python source files (~13.5k LOC), ~36 test files
-- **Package Management**: pip-tools with pinned requirements in `requirements/*.txt`
+- **Package Management**: uv with pinned requirements in `uv.lock`
 - **Key Dependencies**: BeautifulSoup4, pandas, spacy, pdftotext (requires Poppler), pikepdf, pytesseract, scikit-learn, matplotlib, networkx, pydantic
 - **Build System**: setuptools with setuptools-scm for versioning
 - **Testing**: pytest with custom markers (`slow`, `remote`)
@@ -49,27 +49,21 @@ sudo apt-get install -y \
 
 **The version file `src/sec_certs/_version.py` is auto-generated by setuptools-scm and must NOT be committed.** If missing during development, create a temporary version: `echo '__version__ = "dev"' > src/sec_certs/_version.py`
 
-**Standard install (for testing and development):**
+**Development install:**
 ```bash
-# Install test dependencies (includes pytest, coverage, etc.)
-pip install -r requirements/test_requirements.txt
+# Create a virtual environment in .venv (optional: `uv sync` also creates it if missing)
+uv venv
 
-# Install sec-certs in editable mode
-pip install -e .
+# Install all dependencies (including the dev group) and the project in editable mode
+uv sync --dev
 
 # ALWAYS download the spacy language model after install
-python -m spacy download en_core_web_sm
-```
+uv run spacy download en_core_web_sm
 
-**For full development (linting, docs):**
-```bash
-pip install -r requirements/dev_requirements.txt
-pip install -e .
-python -m spacy download en_core_web_sm
+# Optionally, activate the virtual environment so later commands do not need the "uv run" prefix
+source .venv/bin/activate
 ```
 
-**Note on pip-sync**: Do NOT use `pip-sync requirements/all_requirements.txt` in environments with system packages (like GitHub Actions runners). It tries to uninstall system packages and will fail. Use `pip install -r` instead.
-
 Verify the installation (sec-certs and spacy language model) by importing the package:
 ```python
 import sec_certs._version
@@ -85,12 +79,12 @@ print(spacy.load("en_core_web_sm"))
 
 **Basic test run (excludes remote/flaky tests):**
 ```bash
-PYTHONPATH=src:$PYTHONPATH pytest tests -m "not remote" -v
+uv run pytest tests -m "not remote" -v
 ```
 
 **Test with coverage (as in CI):**
 ```bash
-pytest --cov=sec_certs -m "not remote" --junitxml=junit.xml tests
+uv run pytest --cov=sec_certs -m "not remote" --junitxml=junit.xml tests
 ```
 
 **Test markers:**
@@ -106,27 +100,26 @@ pytest --cov=sec_certs -m "not remote" --junitxml=junit.xml tests
 
 **Using pre-commit (recommended):**
 ```bash
-pip install -r requirements/dev_requirements.txt
-pre-commit install
-pre-commit run --all-files
+uv run pre-commit install
+uv run pre-commit run --all-files
 ```
 
 **Manual linting:**
 ```bash
 # Ruff linting (checks code style, imports, complexity)
-ruff check .
+uv run ruff check .
 
 # Ruff with auto-fix
-ruff check . --fix
+uv run ruff check . --fix
 
 # Ruff formatting check
-ruff format --check .
+uv run ruff format --check .
 
 # Ruff auto-format
-ruff format .
+uv run ruff format .
 
 # MyPy type checking
-mypy .
+uv run mypy .
 ```
 
 **Linting configuration**: See `pyproject.toml` for Ruff and MyPy settings. Target Python 3.10. Line length: 120. Notebooks (*.ipynb) are excluded from linting.
@@ -135,7 +128,7 @@ mypy .
 
 ```bash
 cd docs
-make html
+uv run make html
 ```
 
 Output goes to `docs/_build/html/`. Documentation uses Sphinx with myst-nb for Markdown and Jupyter notebooks.
@@ -143,8 +136,7 @@ Output goes to `docs/_build/html/`. Documentation uses Sphinx with myst-nb for M
 
 ### Building for Distribution
 ```bash
-python -m pip install build
-python -m build
+uv build
 ```
 
 This creates source and wheel distributions in `dist/`.
@@ -174,22 +166,15 @@ sec-certs/
 │   └── conftest.py             # Pytest configuration and fixtures
 ├── docs/                       # Sphinx documentation source
 ├── notebooks/                  # Jupyter notebooks (examples, analysis)
-├── requirements/               # Pinned requirements files
-│   ├── requirements.txt        # Core dependencies
-│   ├── dev_requirements.txt    # Dev tools (ruff, mypy, sphinx)
-│   ├── test_requirements.txt   # Test dependencies
-│   ├── nlp_requirements.txt    # Optional NLP dependencies
-│   ├── all_requirements.txt    # All of the above combined
-│   └── compile.sh              # Script to regenerate requirements
 ├── pyproject.toml              # Package metadata, build config, tool settings
 ├── .pre-commit-config.yaml     # Pre-commit hooks configuration
-└── Dockerfile                  # Docker image for reproducible environment
+├── Dockerfile                  # Docker image for reproducible environment
+└── uv.lock                     # uv lockfile with pinned dependencies
 ```
 
 ### Key Files and Configurations
 
 - **pyproject.toml**: Package definition, dependencies, Ruff/MyPy/pytest config. Single source of truth for dependencies (unpinned).
-- **requirements/*.txt**: Pinned versions generated by `compile.sh`. CI uses these for reproducible builds.
 - **src/sec_certs/rules.yaml**: Regular expressions for extracting data from certificates. Add patterns here.
 - **src/sec_certs/configuration.py**: Runtime configuration using pydantic-settings. Reads from env vars with `SECCERTS_` prefix.
 - **.pre-commit-config.yaml**: Defines pre-commit hooks (ruff, mypy). Versions should match pyproject.toml.
@@ -247,8 +232,8 @@ sec-certs/
 1. Create branch from `main` (only stable branch for PRs)
 2. Make minimal code changes
 3. Add tests in appropriate `tests/` subdirectory
-4. Run linters: `pre-commit run --all-files` or `ruff check . && mypy .`
-5. Run tests: `pytest tests -m "not remote" -v`
+4. Run linters: `uv run pre-commit run --all-files` or `uv run ruff check . && uv run mypy .`
+5. Run tests: `uv run pytest tests -m "not remote" -v`
 6. Update docs if public API changed
 7. Commit and push (CI will validate)
 
@@ -257,8 +242,7 @@ sec-certs/
 ```bash
 # Edit pyproject.toml to add/update dependency
 # Regenerate pinned requirements
-cd requirements
-./compile.sh
+uv lock
 # Commit both pyproject.toml and requirements/*.txt changes
 ```
 
@@ -272,7 +256,7 @@ dset = CCDataset.from_web() # Downloads from sec-certs.org
 
 **Processing from scratch (requires full setup, takes hours, DO NOT DO THIS):**
 ```bash
-sec-certs cc all -o ./dataset
+uv run sec-certs cc all -o ./dataset
 ```
 
 ## Common Pitfalls and Gotchas
@@ -285,17 +269,13 @@ sec-certs cc all -o ./dataset
 
 4. **Java in PATH**: Required for FIPS table parsing. Verify with `java -version`.
 
-5. **pip-sync on GitHub Actions**: Don't use it with system packages. Use `pip install -r requirements/*.txt` instead.
-
-6. **Test markers**: Exclude flaky remote tests with `-m "not remote"` for stable local testing.
-
-7. **Import from src**: When running without install, set `PYTHONPATH=src:$PYTHONPATH` to import sec_certs modules.
+5. **Test markers**: Exclude flaky remote tests with `-m "not remote"` for stable local testing.
 
-8. **Default dataset location**: CLI creates `./dataset` by default. Add to .gitignore if working locally.
+6. **Default dataset location**: CLI creates `./dataset` by default. Add to .gitignore if working locally.
 
-9. **Pre-commit hook behavior**: Pre-commit hooks warn about issues but don't auto-fix. Run `ruff check . --fix` to apply fixes.
+7. **Pre-commit hook behavior**: Pre-commit hooks warn about issues but don't auto-fix. Run `uv run ruff check . --fix` to apply fixes.
 
-10. **Long-running commands**: Full dataset processing (`sec-certs cc all`) takes hours. Use pre-processed datasets from web for analysis.
+8. **Long-running commands**: Full dataset processing (`sec-certs cc all`) takes hours. Use pre-processed datasets from web for analysis.
 
 ## Additional Resources
 
@@ -313,7 +293,7 @@ sec-certs cc all -o ./dataset
 These instructions have been validated by examining repository structure, workflows, documentation, and testing commands. When working on this repository:
 
 1. **Trust these build/test commands** - they are verified to work
-2. **Follow the setup order** (system deps → pip deps → spacy model → install)
+2. **Follow the setup order** (system deps → Python deps and project install via `uv sync` → spacy model)
 3. **Only search/explore if** these instructions are incomplete or incorrect
 4. **Refer to these instructions first** before trying alternative approaches
 
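The diff above replaces every pip-tools command with a uv equivalent. Most of them get a `uv run` prefix, a few become dedicated uv subcommands (`uv build`, `uv lock`), and some disappear because `uv sync --dev` now covers them. The minimal Python sketch below condenses that mapping. `to_uv` and its lookup tables are hypothetical: they are not part of sec-certs or uv, and they encode only the before/after pairs visible in this diff. They also assume the project has already been installed in editable mode with `uv sync --dev`.

```python
from typing import Optional

# Hypothetical lookup tables (not part of sec-certs or uv), built only from
# the before/after command pairs that appear in this diff.
DIRECT_REPLACEMENTS = {
    "python -m build": "uv build",
    "./compile.sh": "uv lock",
    "python -m spacy download en_core_web_sm": "uv run spacy download en_core_web_sm",
}

# Legacy steps that the diff removes outright, because `uv sync --dev`,
# `uv build`, and `uv lock` take over their job.
DROPPED = {
    "pip install -e .",
    "python -m pip install build",
    "cd requirements",
}

# Tools that the diff now invokes through `uv run`.
ENV_TOOLS = {"pytest", "ruff", "mypy", "pre-commit", "make", "sec-certs"}

# The diff drops the "Import from src" workaround, since the project is
# installed in editable mode by `uv sync`.
LEGACY_PYTHONPATH = "PYTHONPATH=src:$PYTHONPATH "


def to_uv(command: str) -> Optional[str]:
    """Return the uv-era form of a legacy command, or None if it is no longer needed."""
    command = command.strip()
    if command in DIRECT_REPLACEMENTS:
        return DIRECT_REPLACEMENTS[command]
    if command in DROPPED or command.startswith("pip install -r requirements/"):
        return None
    command = command.removeprefix(LEGACY_PYTHONPATH)
    if command.split(" ", 1)[0] in ENV_TOOLS:
        return f"uv run {command}"
    # Anything else (e.g. `cd docs`) is left untouched.
    return command


if __name__ == "__main__":
    legacy = [
        "pip install -r requirements/test_requirements.txt",
        "pip install -e .",
        'PYTHONPATH=src:$PYTHONPATH pytest tests -m "not remote" -v',
        "ruff check . --fix",
        "cd docs",
        "make html",
        "./compile.sh",
    ]
    for cmd in legacy:
        print(f"{cmd!r:60} -> {to_uv(cmd)!r}")
```

For example, `to_uv('PYTHONPATH=src:$PYTHONPATH pytest tests -m "not remote" -v')` returns `'uv run pytest tests -m "not remote" -v'`, which matches the updated "Basic test run" command. Keeping exact-match tables separate from the generic `uv run` prefix rule keeps the special cases (`uv build`, `uv lock`) explicit.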
