Diffstat (limited to '.github/copilot-instructions.md')
-rw-r--r-- .github/copilot-instructions.md | 82
1 files changed, 31 insertions, 51 deletions
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 06ea30dd..6039850f 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -7,7 +7,7 @@
### Tech Stack
- **Language**: Python 3.10+ (tested on 3.10, 3.11, 3.12)
- **Size**: ~75 Python source files (~13.5k LOC), ~36 test files
-- **Package Management**: pip-tools with pinned requirements in `requirements/*.txt`
+- **Package Management**: uv with pinned requirements in `uv.lock`
- **Key Dependencies**: BeautifulSoup4, pandas, spacy, pdftotext (requires Poppler), pikepdf, pytesseract, scikit-learn, matplotlib, networkx, pydantic
- **Build System**: setuptools with setuptools-scm for versioning
- **Testing**: pytest with custom markers (`slow`, `remote`)
@@ -49,27 +49,21 @@ sudo apt-get install -y \
**The version file `src/sec_certs/_version.py` is auto-generated by setuptools-scm and must NOT be committed.**
If missing during development, create a temporary version: `echo '__version__ = "dev"' > src/sec_certs/_version.py`
-**Standard install (for testing and development):**
+**Development install (for testing, linting, and docs):**
```bash
-# Install test dependencies (includes pytest, coverage, etc.)
-pip install -r requirements/test_requirements.txt
+# Create a virtual environment
+uv venv
-# Install sec-certs in editable mode
-pip install -e .
+# Install all dependencies (including dev ones) and the project in editable mode
+uv sync --dev
# ALWAYS download the spacy language model after install
-python -m spacy download en_core_web_sm
-```
+uv run spacy download en_core_web_sm
-**For full development (linting, docs):**
-```bash
-pip install -r requirements/dev_requirements.txt
-pip install -e .
-python -m spacy download en_core_web_sm
+# Optionally, activate the virtual environment so the "uv run" prefix can be omitted
+source .venv/bin/activate
```
-**Note on pip-sync**: Do NOT use `pip-sync requirements/all_requirements.txt` in environments with system packages (like GitHub Actions runners). It tries to uninstall system packages and will fail. Use `pip install -r` instead.
-
Verify the installation (sec-certs and spacy language model) by importing the package:
```python
import sec_certs._version
@@ -85,12 +79,12 @@ print(spacy.load("en_core_web_sm"))
**Basic test run (excludes remote/flaky tests):**
```bash
-PYTHONPATH=src:$PYTHONPATH pytest tests -m "not remote" -v
+uv run pytest tests -m "not remote" -v
```
**Test with coverage (as in CI):**
```bash
-pytest --cov=sec_certs -m "not remote" --junitxml=junit.xml tests
+uv run pytest --cov=sec_certs -m "not remote" --junitxml=junit.xml tests
```
**Test markers:**
@@ -106,27 +100,26 @@ pytest --cov=sec_certs -m "not remote" --junitxml=junit.xml tests
**Using pre-commit (recommended):**
```bash
-pip install -r requirements/dev_requirements.txt
-pre-commit install
-pre-commit run --all-files
+uv run pre-commit install
+uv run pre-commit run --all-files
```
**Manual linting:**
```bash
# Ruff linting (checks code style, imports, complexity)
-ruff check .
+uv run ruff check .
# Ruff with auto-fix
-ruff check . --fix
+uv run ruff check . --fix
# Ruff formatting check
-ruff format --check .
+uv run ruff format --check .
# Ruff auto-format
-ruff format .
+uv run ruff format .
# MyPy type checking
-mypy .
+uv run mypy .
```
**Linting configuration**: See `pyproject.toml` for Ruff and MyPy settings. Target Python 3.10. Line length: 120. Notebooks (*.ipynb) are excluded from linting.
@@ -135,7 +128,7 @@ mypy .
```bash
cd docs
-make html
+uv run make html
```
Output goes to `docs/_build/html/`. Documentation uses Sphinx with myst-nb for Markdown and Jupyter notebooks.
@@ -143,8 +136,7 @@ Output goes to `docs/_build/html/`. Documentation uses Sphinx with myst-nb for M
### Building for Distribution
```bash
-python -m pip install build
-python -m build
+uv build
```
This creates source and wheel distributions in `dist/`.
@@ -174,22 +166,15 @@ sec-certs/
│ └── conftest.py # Pytest configuration and fixtures
├── docs/ # Sphinx documentation source
├── notebooks/ # Jupyter notebooks (examples, analysis)
-├── requirements/ # Pinned requirements files
-│ ├── requirements.txt # Core dependencies
-│ ├── dev_requirements.txt # Dev tools (ruff, mypy, sphinx)
-│ ├── test_requirements.txt # Test dependencies
-│ ├── nlp_requirements.txt # Optional NLP dependencies
-│ ├── all_requirements.txt # All of the above combined
-│ └── compile.sh # Script to regenerate requirements
├── pyproject.toml # Package metadata, build config, tool settings
├── .pre-commit-config.yaml # Pre-commit hooks configuration
-└── Dockerfile # Docker image for reproducible environment
+├── Dockerfile # Docker image for reproducible environment
+└── uv.lock # uv lockfile with pinned dependencies
```
### Key Files and Configurations
- **pyproject.toml**: Package definition, dependencies, Ruff/MyPy/pytest config. Single source of truth for dependencies (unpinned).
-- **requirements/*.txt**: Pinned versions generated by `compile.sh`. CI uses these for reproducible builds.
- **src/sec_certs/rules.yaml**: Regular expressions for extracting data from certificates. Add patterns here.
- **src/sec_certs/configuration.py**: Runtime configuration using pydantic-settings. Reads from env vars with `SECCERTS_` prefix.
- **.pre-commit-config.yaml**: Defines pre-commit hooks (ruff, mypy). Versions should match pyproject.toml.
@@ -247,8 +232,8 @@ sec-certs/
1. Create branch from `main` (only stable branch for PRs)
2. Make minimal code changes
3. Add tests in appropriate `tests/` subdirectory
-4. Run linters: `pre-commit run --all-files` or `ruff check . && mypy .`
-5. Run tests: `pytest tests -m "not remote" -v`
+4. Run linters: `uv run pre-commit run --all-files` or `uv run ruff check . && uv run mypy .`
+5. Run tests: `uv run pytest tests -m "not remote" -v`
6. Update docs if public API changed
7. Commit and push (CI will validate)
@@ -257,8 +242,7 @@ sec-certs/
```bash
# Edit pyproject.toml to add/update dependency
# Regenerate pinned requirements
-cd requirements
-./compile.sh
+uv lock
# Commit both pyproject.toml and requirements/*.txt changes
```
@@ -272,7 +256,7 @@ dset = CCDataset.from_web() # Downloads from sec-certs.org
**Processing from scratch (requires full setup, takes hours, DO NOT DO THIS):**
```bash
-sec-certs cc all -o ./dataset
+uv run sec-certs cc all -o ./dataset
```
## Common Pitfalls and Gotchas
@@ -285,17 +269,13 @@ sec-certs cc all -o ./dataset
4. **Java in PATH**: Required for FIPS table parsing. Verify with `java -version`.
-5. **pip-sync on GitHub Actions**: Don't use it with system packages. Use `pip install -r requirements/*.txt` instead.
-
-6. **Test markers**: Exclude flaky remote tests with `-m "not remote"` for stable local testing.
-
-7. **Import from src**: When running without install, set `PYTHONPATH=src:$PYTHONPATH` to import sec_certs modules.
+5. **Test markers**: Exclude flaky remote tests with `-m "not remote"` for stable local testing.
-8. **Default dataset location**: CLI creates `./dataset` by default. Add to .gitignore if working locally.
+6. **Default dataset location**: CLI creates `./dataset` by default. Add to .gitignore if working locally.
-9. **Pre-commit hook behavior**: Pre-commit hooks warn about issues but don't auto-fix. Run `ruff check . --fix` to apply fixes.
+7. **Pre-commit hook behavior**: Pre-commit hooks warn about issues but don't auto-fix. Run `ruff check . --fix` to apply fixes.
-10. **Long-running commands**: Full dataset processing (`sec-certs cc all`) takes hours. Use pre-processed datasets from web for analysis.
+8. **Long-running commands**: Full dataset processing (`sec-certs cc all`) takes hours. Use pre-processed datasets from web for analysis.
## Additional Resources
@@ -313,7 +293,7 @@ sec-certs cc all -o ./dataset
These instructions have been validated by examining repository structure, workflows, documentation, and testing commands. When working on this repository:
1. **Trust these build/test commands** - they are verified to work
-2. **Follow the setup order** (system deps → pip deps → spacy model → install)
+2. **Follow the setup order** (system deps → Python deps and project install via `uv sync` → spacy model)
3. **Only search/explore if** these instructions are incomplete or incorrect
4. **Refer to these instructions first** before trying alternative approaches
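The setup order in the updated instructions can be scripted. This is a minimal sketch, assuming a POSIX environment with `uv` on PATH and assuming that `pdftotext`, `tesseract`, and `java` are the binaries provided by the system dependencies (Poppler, Tesseract, Java). The helpers `missing_system_tools` and `run_setup` are hypothetical and not part of sec-certs. By default the script only prints the commands, which are the ones used in the development install section.

```python
"""Hypothetical sketch of the documented setup order (dry run by default).

Assumption: system packages were installed beforehand with apt-get; this
script only checks that their binaries are on PATH.
"""
import shlex
import shutil
import subprocess

# Assumed binary names for the system dependencies (Poppler, Tesseract, Java).
SYSTEM_TOOLS = ["pdftotext", "tesseract", "java"]

# Python-side steps, in the order the instructions prescribe.
SETUP_STEPS = [
    ["uv", "venv"],
    ["uv", "sync", "--dev"],  # all dependencies + editable install of sec-certs
    ["uv", "run", "spacy", "download", "en_core_web_sm"],
]


def missing_system_tools() -> list[str]:
    """Return the system binaries that cannot be found on PATH."""
    return [tool for tool in SYSTEM_TOOLS if shutil.which(tool) is None]


def run_setup(dry_run: bool = True) -> list[str]:
    """Print each setup step in order; execute the steps only if dry_run is False."""
    missing = missing_system_tools()
    if missing and not dry_run:
        raise SystemExit(f"Install system dependencies first: {', '.join(missing)}")
    commands = []
    for cmd in SETUP_STEPS:
        commands.append(shlex.join(cmd))
        print(commands[-1])
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands


if __name__ == "__main__":
    run_setup(dry_run=True)
```

Passing `dry_run=False` runs the same commands in sequence. Because `check=True` raises `CalledProcessError` on a non-zero exit code, a failed `uv sync` stops the script before the spacy model download is attempted.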