notebooks/examples/cc.ipynb


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Common Criteria example\n",
    "\n",
    "This notebook illustrates basic functionality with the `CCDataset` class that holds Common Criteria dataset and of its sample `CCCertificate`.\n",
    "\n",
    "Note that there exists a front end to this functionality at [sec-certs.org/cc](https://sec-certs.org/cc/). Before reinventing the wheel, it's good idea to check our web. Maybe you don't even need to run the code, but just use our web instead. \n",
    "\n",
    "For full API documentation of the `CCDataset` class go to the [dataset](../../api/dataset) docs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sec_certs.dataset import CCDataset\n",
    "from sec_certs.sample import CCCertificate\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get fresh dataset snapshot from mirror\n",
    "\n",
    "There's no need to do full processing of the dataset by yourself, unless you modified `sec-certs` code. You can simply fetch the processed version from the web. \n",
    "\n",
    "Note, however, that you won't be able to access the `pdf` and `txt` files of the certificates. You can only get the data that we extracted from it. \n",
    "\n",
    "Running the whole pipeline can get you the `pdf` and `txt` data. You can see how to do that in the last cell of this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dset = CCDataset.from_web()\n",
    "print(len(dset)) # Print number of certificates in the dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Do some basic dataset serialization\n",
    "\n",
    "The dataset can be saved/loaded into/from `json`. Also, the dataset can be converted into a [pandas](https://pandas.pydata.org/) DataFrame. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Dump dataset into json and load it back\n",
    "dset.to_json(\"./cc_dset.json\")\n",
    "new_dset: CCDataset = CCDataset.from_json(\"./cc_dset.json\")\n",
    "assert dset == new_dset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Turn dataset into Pandas DataFrame\n",
    "df = dset.to_pandas()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Simple dataset manipulation\n",
    "\n",
    "The certificates of the dataset are stored in a dictionary that maps certificate's primary key (we call it `dgst`) to the `CCCertificate` object. The primary key of the certificate is simply a hash of the attributes that make the certificate unique.\n",
    "\n",
    "You can iterate over the dataset which is handy when selecting some subset of certificates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Iterate over certificates in dataset\n",
    "for cert in dset:\n",
    "    pass\n",
    "\n",
    "# Get certificates produced by Infineon manufacturer\n",
    "infineon_certs = [x for x in dset if \"Infineon\" in x.manufacturer]\n",
    "df_infineon = df.loc[df.manufacturer.str.contains(\"Infineon\", case=False)]\n",
    "\n",
    "# Get certificates with some CVE\n",
    "vulnerable_certs = [x for x in dset if x.heuristics.related_cves]\n",
    "df_vulnerable = df.loc[~df.related_cves.isna()]\n",
    "\n",
    "# Show CVE ids of some vulnerable certificate\n",
    "print(f\"{vulnerable_certs[0].heuristics.related_cves=}\")\n",
    "\n",
    "# Get certificates from 2015 and newer\n",
    "df_2015_and_newer = df.loc[df.year_from > 2014]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot distribution of years of certification\n",
    "df.year_from.value_counts().sort_index().plot.line()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dissect a single certificate\n",
    "\n",
    "The `CCCertificate` is basically a data structure that holds all the data we keep about a certificate. Other classes (`CCDataset` or `model` package members) are used to transform and process the certificates. You can see all its attributes at [API docs](https://seccerts.org/docs/api/sample.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select a certificate and print some attributes\n",
    "cert: CCCertificate = dset[\"bad93fb821395db2\"]\n",
    "print(f\"{cert.name=}\")\n",
    "print(f\"{cert.heuristics.cpe_matches=}\")\n",
    "print(f\"{cert.heuristics.report_references.directly_referencing=}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select all certificates from a dataset for which we detect at least one vulnerability.\n",
    "vulnerable_certs = [x for x in dset if x.heuristics.related_cves]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Serialize single certificate\n",
    "\n",
    "Again, a certificate can be (de)serialized into/from json. It's also possible to construct pandas `Series` from a certificate as shown below"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "cert.to_json(\"./cert.json\")\n",
    "new_cert = cert.from_json(\"./cert.json\")\n",
    "assert cert == new_cert\n",
    "\n",
    "# Serialize as Pandas series\n",
    "ser = pd.Series(cert.pandas_tuple, index=cert.pandas_columns)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Assign dataset with CPE records and compute vulnerabilities\n",
    "\n",
    "*Note*: The data is already computed on dataset obtained with `from_web()`, this is just for illustration. \n",
    "*Note*: This may likely not run in Binder, as the corresponding `CVEDataset` and `CPEDataset` instances take a lot of memory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Automatically match CPEs and CVEs\n",
    "dset.compute_cpe_heuristics()\n",
    "dset.compute_related_cves()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create new dataset and fully process it\n",
    "\n",
    "The following piece of code roughly corresponds to `$ sec-certs cc all` CLI command -- it fully processes the CC pipeline. This will create a folder in current working directory where the outputs will be stored. \n",
    "\n",
    "```{warning}\n",
    "It's not good idea to run this from notebook. It may take several hours to finish. We recommend using `from_web()` or turning this into a Python script.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dset = CCDataset()\n",
    "dset.get_certs_from_web()\n",
    "dset.process_auxiliary_datasets()\n",
    "dset.download_all_artifacts()\n",
    "dset.convert_all_pdfs()\n",
    "dset.analyze_certificates()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Advanced usage\n",
    "There are more notebooks available showcasing more advanced usage of the tool.\n",
    "\n",
    "```{toctree}\n",
    ":caption: Other\n",
    ":hidden: True\n",
    ":maxdepth: 1\n",
    "Temporal trends <../cc/temporal_trends.ipynb>\n",
    "Vulnerabilities <../cc/vulnerabilities.ipynb>\n",
    "References <../cc/references.ipynb>\n",
    "Chain of Trust paper <../cc/chain_of_trust_plots.ipynb>\n",
    "```\n",
    "\n",
    "  - Examine [temporal trends](../cc/temporal_trends.ipynb) in the CC ecosystem.\n",
    "  - Analyze [vulnerabilities](../cc/vulnerabilities.ipynb) of CC certified items.\n",
    "  - Study [references](../cc/references.ipynb) between CC certificates.\n",
    "  - Reproduce the plots from our [Chain of Trust](../cc/chain_of_trust_plots.ipynb) paper."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8.13 ('venv': venv)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.13 (default, Jul 27 2022, 12:09:23) \n[Clang 13.1.6 (clang-1316.0.21.2.3)]"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "a5b8c5b127d2cfe5bc3a1c933e197485eb9eba25154c3661362401503b4ef9d4"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}