| | Commit message | Author | Age | Files | Lines |
|---|---|---|---|---|---|
| * | Move away from pip-tools and use uv. | J08nY | 2025-11-17 | 1 | -1/+1 |
| | uv is all the rage. No, but really. The pip-compile approach does not work reliably cross-platform or across Python versions (sometimes it does, but not always). In comparison, a uv lockfile is universal and cross-platform (see https://docs.astral.sh/uv/concepts/projects/layout/#the-lockfile). This should help make our install easier and more robust. | | | | |
| * | Add annotations import everywhere. | J08nY | 2025-02-28 | 3 | -0/+6 |
| * | Fix ReferenceSegmentExtractor. | J08nY | 2025-02-27 | 1 | -2/+14 |
| * | Improve CC scheme extraction and matching. | J08nY | 2024-11-08 | 2 | -14/+78 |
| | This significantly improves the CC scheme extraction by: fixing the extraction of several schemes that were mixing certified and archived entries by accident; improving the extraction of cert_ids from scheme sites; and improving the matching heuristic to consider more attributes that are usually present in the site data. Also adds an evaluation notebook to see how this performs. | | | | |
| * | Improve scheme parsing. | J08nY | 2024-11-04 | 1 | -0/+23 |
| | Only match if category matches. Disregard unwanted warnings. Add progress bars everywhere. | | | | |
| * | Fix typing issue. | J08nY | 2024-06-20 | 1 | -1/+1 |
| * | Move CCDocumentState to cert class. | J08nY | 2024-02-13 | 1 | -4/+4 |
| * | Refactor document state in CC. | J08nY | 2024-02-13 | 1 | -2/+2 |
| * | refactoring here and there | adamjanovsky | 2023-11-24 | 3 | -60/+78 |
| * | revert hyperparams on segment extraction | adamjanovsky | 2023-11-23 | 1 | -1/+1 |
| * | hardcode hyperparams for all stages | adamjanovsky | 2023-11-23 | 3 | -25/+83 |
| * | hardcode optimal hyperparams for embeddings | adamjanovsky | 2023-11-23 | 1 | -4/+4 |
| * | continue refactoring the notebook | Adam Janovsky | 2023-11-14 | 2 | -2/+4 |
| * | Merge branch 'bump-req-python-to-3-10' into reference-analysis | Adam Janovsky | 2023-11-14 | 7 | -18/+23 |
| |\ | |||||
| | * | bump required python to 3.8 | Adam Janovsky | 2023-11-14 | 7 | -18/+23 |
| * | | fix some ruff errors | Adam Janovsky | 2023-11-14 | 1 | -1/+1 |
| * | | merge fresh main | Adam Janovsky | 2023-11-14 | 1 | -1/+3 |
| |\| | |||||
| | * | fix new ruff errors | Adam Janovsky | 2023-11-10 | 1 | -1/+1 |
| | * | Fix CC scheme certificate matching. | J08nY | 2023-08-24 | 1 | -0/+2 |
| * | | bump references | adamjanovsky | 2023-11-14 | 9 | -91/+749 |
| * | | recertification -> reevaluation in code | Adam Janovsky | 2023-10-20 | 1 | -4/+4 |
| * | | finalize annotation labels | Adam Janovsky | 2023-10-19 | 1 | -1/+1 |
| * | | ditch lang, fix groupby | adamjanovsky | 2023-09-29 | 1 | -6/+11 |
| * | | minor refactoring segment extractor | Adam Janovsky | 2023-09-22 | 1 | -13/+10 |
| * | | fix sentence extraction | Adam Janovsky | 2023-09-22 | 1 | -22/+25 |
| * | | clean labels when loading dataframes | adamjanovsky | 2023-09-21 | 1 | -0/+2 |
| * | | fixes and bump reqs | Adam Janovsky | 2023-09-21 | 1 | -4/+12 |
| * | | multiple fixes segment extractor | adamjanovsky | 2023-09-17 | 3 | -64/+133 |
| * | | WiP: introduce actually extracted cert_id_keywords instead of canonical | Adam Janovsky | 2023-09-17 | 1 | -16/+63 |
| * | | reference annotater improvements and hyperparam search poc | adamjanovsky | 2023-08-24 | 2 | -11/+70 |
| * | | further advances on model | adamjanovsky | 2023-07-28 | 2 | -62/+117 |
| * | | implement tf-idf baseline for reference annotations | adamjanovsky | 2023-07-27 | 1 | -35/+5 |
| * | | add function to turn dataframe from ReferenceSegmentExtractor to LabelStudio input | Adam Janovsky | 2023-07-19 | 1 | -0/+16 |
| * | | adjust ReferenceSegmentExtractor to work with OCR-segmented jsons | Adam Janovsky | 2023-07-19 | 1 | -30/+81 |
| * | | merge main | Adam Janovsky | 2023-06-07 | 5 | -47/+256 |
| |\| | |||||
| | * | Fix Norway cert_id parsing in schemes. | J08nY | 2023-04-25 | 1 | -1/+3 |
| | * | Fix black issue. | J08nY | 2023-04-21 | 1 | -1/+0 |
| | * | Merge branch 'fix/dup-dedup' into issue/324-Switch-from-NVD-data-feeds-to-API | J08nY | 2023-04-21 | 5 | -44/+259 |
| | |\ | |||||
| | | * | More comments in matching. | J08nY | 2023-04-18 | 2 | -0/+12 |
| | | * | Add match filtering based on validation date. | J08nY | 2023-04-18 | 1 | -4/+12 |
| | | * | Fix annotations import. | J08nY | 2023-04-18 | 1 | -3/+4 |
| | | * | Make CCSchemeDataset an actual dataset. | J08nY | 2023-04-18 | 3 | -9/+12 |
| | | * | Abstract out matching scores. | J08nY | 2023-04-18 | 3 | -30/+29 |
| | | * | Fix FIPSmatcher iterable. | J08nY | 2023-04-17 | 1 | -1/+2 |
| | | * | Revert "Ditch the __init__ package imports." | J08nY | 2023-04-17 | 1 | -0/+16 |
| | | | This reverts commit 89b3d880088b5c30fa10036f280e73b1c1aee05e. | | | | |
| | | * | Change matcher API to not ref dataset. | J08nY | 2023-04-17 | 3 | -18/+15 |
| | | * | Share code between FIPS and CC matching. | J08nY | 2023-04-14 | 3 | -64/+68 |
| | | * | Ditch the __init__ package imports. | J08nY | 2023-04-14 | 1 | -16/+0 |
| | | * | Remove indirect imports from __init__ from our code. | J08nY | 2023-04-14 | 1 | -3/+3 |
| | | * | Add CC scheme matching for datasets. | J08nY | 2023-04-14 | 1 | -17/+55 |
