path: root/src/sec_certs/model
Commit message [Author, Age, Files, Lines]
* Move away from pip-tools and use uv. [J08nY, 2025-11-17, 1, -1/+1]
|   uv is all the rage. No, but really.
|   The pip-compile approach does not work cross-platform or across Python
|   versions (sometimes it does, but not always). In comparison, a uv lockfile
|   is universal and cross-platform:
|   https://docs.astral.sh/uv/concepts/projects/layout/#the-lockfile
|   This should help make our install easier and more robust.
* Add annotations import everywhere. [J08nY, 2025-02-28, 3, -0/+6]
|
* Fix ReferenceSegmentExtractor. [J08nY, 2025-02-27, 1, -2/+14]
|
* Improve CC scheme extraction and matching. [J08nY, 2024-11-08, 2, -14/+78]
|   This significantly improves the CC scheme extraction by:
|   - Fixing the extraction of several schemes that were mixing certified
|     and archived entries by accident.
|   - Improving the extraction of cert_ids from scheme sites.
|   - Improving the matching heuristic to consider more attributes that
|     are usually present in the site data.
|   Also adds an evaluation notebook to see how this performs.
* Improve scheme parsing. [J08nY, 2024-11-04, 1, -0/+23]
|   Only match if category matches.
|   Disregard unwanted warnings.
|   Add progress bars everywhere.
* Fix typing issue. [J08nY, 2024-06-20, 1, -1/+1]
|
* Move CCDocumentState to cert class. [J08nY, 2024-02-13, 1, -4/+4]
|
* Refactor document state in CC. [J08nY, 2024-02-13, 1, -2/+2]
|
* refactoring here and there [adamjanovsky, 2023-11-24, 3, -60/+78]
|
* revert hyperparams on segment extraction [adamjanovsky, 2023-11-23, 1, -1/+1]
|
* hardcode hyperparams for all stages [adamjanovsky, 2023-11-23, 3, -25/+83]
|
* hardcode optimal hyperparams for embeddings [adamjanovsky, 2023-11-23, 1, -4/+4]
|
* continue refactoring the notebook [Adam Janovsky, 2023-11-14, 2, -2/+4]
|
* Merge branch 'bump-req-python-to-3-10' into reference-analysis [Adam Janovsky, 2023-11-14, 7, -18/+23]
|\
| * bump required python to 3.8 [Adam Janovsky, 2023-11-14, 7, -18/+23]
| |
* | fix some ruff errors [Adam Janovsky, 2023-11-14, 1, -1/+1]
| |
* | merge fresh main [Adam Janovsky, 2023-11-14, 1, -1/+3]
|\|
| * fix new ruff errors [Adam Janovsky, 2023-11-10, 1, -1/+1]
| |
| * Fix CC scheme certificate matching. [J08nY, 2023-08-24, 1, -0/+2]
| |
* | bump references [adamjanovsky, 2023-11-14, 9, -91/+749]
| |
* | recertification -> reevaluation in code [Adam Janovsky, 2023-10-20, 1, -4/+4]
| |
* | finalize annotation labels [Adam Janovsky, 2023-10-19, 1, -1/+1]
| |
* | ditch lang, fix groupby [adamjanovsky, 2023-09-29, 1, -6/+11]
| |
* | minor refactoring segment extractor [Adam Janovsky, 2023-09-22, 1, -13/+10]
| |
* | fix sentence extraction [Adam Janovsky, 2023-09-22, 1, -22/+25]
| |
* | clean labels when loading dataframes [adamjanovsky, 2023-09-21, 1, -0/+2]
| |
* | fixes and bump reqs [Adam Janovsky, 2023-09-21, 1, -4/+12]
| |
* | multiple fixes segment extractor [adamjanovsky, 2023-09-17, 3, -64/+133]
| |
* | WiP: introduce actually extracted cert_id_keywords instead of canonical [Adam Janovsky, 2023-09-17, 1, -16/+63]
| |
* | reference annotater improvements and hyperparam search poc [adamjanovsky, 2023-08-24, 2, -11/+70]
| |
* | further advances on model [adamjanovsky, 2023-07-28, 2, -62/+117]
| |
* | implement tf-idf baseline for reference annotations [adamjanovsky, 2023-07-27, 1, -35/+5]
| |
* | add function to turn dataframe from ReferenceSegmentExtractor to LabelStudio input [Adam Janovsky, 2023-07-19, 1, -0/+16]
| |
* | adjust ReferenceSegmentExtractor to work with OCR-segmented jsons [Adam Janovsky, 2023-07-19, 1, -30/+81]
| |
* | merge main [Adam Janovsky, 2023-06-07, 5, -47/+256]
|\|
| * Fix Norway cert_id parsing in schemes. [J08nY, 2023-04-25, 1, -1/+3]
| |
| * Fix black issue. [J08nY, 2023-04-21, 1, -1/+0]
| |
| * Merge branch 'fix/dup-dedup' into issue/324-Switch-from-NVD-data-feeds-to-API [J08nY, 2023-04-21, 5, -44/+259]
| |\
| | * More comments in matching. [J08nY, 2023-04-18, 2, -0/+12]
| | |
| | * Add match filtering based on validation date. [J08nY, 2023-04-18, 1, -4/+12]
| | |
| | * Fix annotations import. [J08nY, 2023-04-18, 1, -3/+4]
| | |
| | * Make CCSchemeDataset an actual dataset. [J08nY, 2023-04-18, 3, -9/+12]
| | |
| | * Abstract out matching scores. [J08nY, 2023-04-18, 3, -30/+29]
| | |
| | * Fix FIPSmatcher iterable. [J08nY, 2023-04-17, 1, -1/+2]
| | |
| | * Revert "Ditch the __init__ package imports." [J08nY, 2023-04-17, 1, -0/+16]
| | |   This reverts commit 89b3d880088b5c30fa10036f280e73b1c1aee05e.
| | * Change matcher API to not ref dataset. [J08nY, 2023-04-17, 3, -18/+15]
| | |
| | * Share code between FIPS and CC matching. [J08nY, 2023-04-14, 3, -64/+68]
| | |
| | * Ditch the __init__ package imports. [J08nY, 2023-04-14, 1, -16/+0]
| | |
| | * Remove indirect imports from __init__ from our code. [J08nY, 2023-04-14, 1, -3/+3]
| | |
| | * Add CC scheme matching for datasets. [J08nY, 2023-04-14, 1, -17/+55]
| | |