notebooks/examples/protection_profiles.ipynb


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Protection profiles example\n",
    "\n",
    "This notebook illustrated basic functionality of the `ProtectionProfileDataset` class that holds protection profiles bound to Common Criteria certified products. The object that holds a single profile is called `ProtectionProfile`. \n",
    "\n",
    "Note that there exists a front end to this functionality at [sec-certs.org/pp](https://sec-certs.org/pp/). Before reinventing the wheel, it's good idea to check our web. Maybe you don't even need to run the code, but just use our web instead. \n",
    "\n",
    "For full API documentation of the `ProtectionProfileDataset` class go to the [dataset](../../api/dataset) docs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sec_certs.dataset import ProtectionProfileDataset\n",
    "from sec_certs.sample import ProtectionProfile"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get fresh dataset snapshot from mirror\n",
    "\n",
    "There's no need to do full processing of the dataset by yourself, unless you modified `sec-certs` code. You can simply fetch the processed version from the web. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dset = ProtectionProfileDataset.from_web()\n",
    "print(len(dset)) # Print number of protection profiles in the dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Do some basic dataset serialization\n",
    "\n",
    "The dataset can be saved/loaded into/from `json`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dset.to_json(\"./pp.json\")\n",
    "new_dset: ProtectionProfileDataset = ProtectionProfileDataset.from_json(\"./pp.json\")\n",
    "assert dset == new_dset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Simple dataset manipulation\n",
    "\n",
    "The samples of the dataset are stored in a dictionary that maps sample's primary key (we call it `dgst`) to the `ProtectionProfile` object. The primary key of the protection profile is simply a hash of the attributes that make the sample unique.\n",
    "\n",
    "You can iterate over the dataset which is handy when selecting some subset of protection profiles."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for pp in dset:\n",
    "    pass\n",
    "\n",
    "# Get only collaborative protection profiles\n",
    "collaborative_pps = [x for x in dset if x.web_data.is_collaborative]\n",
    "\n",
    "# Get protection_profiles from 2015 and newer\n",
    "from datetime import date\n",
    "newer_than_2015 = [x for x in dset if x.web_data.not_valid_before > date(2014, 12, 31)]\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dissect a single protection profiles\n",
    "\n",
    "The `ProtectionProfile` is basically a data structure that holds all the data we keep about a protection profile. Other classes (`ProtectionProfile` or `model` package members) are used to transform and process the samples. You can see all its attributes at [API docs](https://seccerts.org/docs/api/sample.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select a protection profile and print some attributes\n",
    "pp: ProtectionProfile = dset[\"b02ed76d2545326a\"]\n",
    "print(f\"{pp.name=}\")\n",
    "print(f\"{pp.web_data.not_valid_before=}\")\n",
    "print(f\"{pp.pdf_data=}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Serialize a single protection profile\n",
    "\n",
    "Again, a protection profile can be (de)serialized into/from json. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pp.to_json(\"./pp.json\")\n",
    "new_pp = pp.from_json(\"./pp.json\")\n",
    "assert pp == new_pp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create new dataset and fully process it\n",
    "\n",
    "The following piece of code roughly corresponds to `$ sec-certs pp all` CLI command -- it fully processes the PP pipeline. This will create a folder in current working directory where the outputs will be stored. \n",
    "\n",
    "```{warning}\n",
    "It's not good idea to run this from notebook. It may take several hours to finish. We recommend using `from_web()` or turning this into a Python script.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dset = ProtectionProfileDataset()\n",
    "dset.get_certs_from_web()\n",
    "dset.process_auxiliary_datasets()\n",
    "dset.download_all_artifacts()\n",
    "dset.convert_all_pdfs()\n",
    "dset.analyze_certificates()"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}