{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Performance Testing Analysis\n",
"\n",
"This notebook describes the methodology for analyzing WinFsp performance."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Collection\n",
"\n",
"Performance data is collected by running the script `run-all-perf-tests.bat`. This script runs a variety of performance tests against the NTFS, MEMFS and NTPTFS file systems. The tests are run a number of times (default: 3) and the results are saved in CSV files with names `ntfs-N.csv`, `memfs-N.csv` and `ntptfs-N.csv` (where `N` represents the results of test run `N`)."
]
},
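{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next cell is an optional aid, not part of the original workflow: it writes synthetic CSV files so that the data loading code below can be exercised without running the actual benchmark. The timing values are random placeholders; only the file naming scheme and the `test,iter,time` row layout follow the description above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: write synthetic ntfs-N.csv, memfs-N.csv, ntptfs-N.csv files.\n",
"# The times are random placeholders, not real measurements.\n",
"import random\n",
"\n",
"for fsname in [\"ntfs\", \"memfs\", \"ntptfs\"]:\n",
"    for run in range(1, 4):\n",
"        with open(f\"{fsname}-{run}.csv\", \"w\") as f:\n",
"            for test in [\"file_create_test\", \"file_open_test\"]:\n",
"                for it in [1000, 2000, 3000, 4000, 5000]:\n",
"                    f.write(f\"{test},{it},{random.uniform(0.05, 0.30):.2f}\\n\")"
]
},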
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Loading\n",
"\n",
"Data is loaded from all CSV files into a single pandas `DataFrame`. The resulting `DataFrame` has columns `test`, `iter`, `ntfs`, `memfs`, `ntptfs`. With multiple test runs there will be multiple time values for a `test`, `iter`, file system triple; in this case the smallest time value is entered into the `DataFrame`. The assumption is that even in a seemingly idle system there is some activity that affects the results; the smallest value is the preferred one to use because it reflects the time when there is less or no other system activity.\n",
"\n",
"The resulting `DataFrame` will contain data similar to the following:\n",
"\n",
"| test | iter | ntfs | memfs | ntptfs |\n",
"|:------------------|------:|-------:|-------:|-------:|\n",
"| file_create_test | 1000 | 0.20 | 0.06 | 0.28 |\n",
"| file_open_test | 1000 | 0.09 | 0.05 | 0.22 |\n",
"| ... | ... | ... | ... | ... |"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import glob, os\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"nameord = [\"ntfs\", \"memfs\", \"ntptfs\"]\n",
"\n",
"datamap = {}\n",
"for f in sorted(glob.iglob(\"*.csv\")):\n",
" datamap.setdefault(f.rsplit(\"-\", maxsplit=1)[0], []).append(f)\n",
"\n",
"df = None\n",
"for n in nameord:\n",
" ndf = None\n",
" for f in datamap[n]:\n",
" df0 = pd.read_csv(f, header=None, names=[\"test\", \"iter\", n])\n",
" if ndf is None:\n",
" ndf = df0\n",
" else:\n",
" ndf = ndf.combine(df0, np.minimum)\n",
" if df is None:\n",
" df = ndf\n",
" else:\n",
" df = df.merge(ndf, how=\"left\")\n",
"#df"
]
},
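{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the element-wise minimum used above, with made-up numbers: `combine` with `np.minimum` keeps the smaller time for each (`test`, `iter`) cell across two runs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Two hypothetical runs of the same test; the combined result keeps 0.20.\n",
"run1 = pd.DataFrame({\"test\": [\"file_create_test\"], \"iter\": [1000], \"ntfs\": [0.22]})\n",
"run2 = pd.DataFrame({\"test\": [\"file_create_test\"], \"iter\": [1000], \"ntfs\": [0.20]})\n",
"run1.combine(run2, np.minimum)"
]
},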
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data Analysis\n",
"\n",
"For each test a plot is drawn that shows how each file system performs in the particular test. This allows for easy comparisons between file systems for a particular test."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"markermap = { \"ntfs\": \"$\\mathtt{N}$\", \"memfs\": \"$\\mathtt{M}$\", \"ntptfs\": \"$\\mathtt{P}$\"}\n",
"for t, tdf in df.groupby(\"test\", sort=False):\n",
" plt.figure(figsize=(10,8), dpi=100, facecolor=\"white\")\n",
" plt.title(t)\n",
" xlabel = \"iter\"\n",
" if t.startswith(\"file_\"):\n",
" xlabel = \"files\"\n",
" for n in nameord:\n",
" tdf.plot(ax=plt.gca(), x=\"iter\", xlabel=xlabel, y=n, ylabel=\"time\", marker=markermap[n], ms=8)\n",
" plt.legend(nameord)\n",
" plt.savefig(t + \".png\")\n",
" #plt.show()\n",
" plt.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](file_create_test.png)\n",
"![](file_open_test.png)\n",
"![](file_overwrite_test.png)\n",
"![](file_attr_test.png)\n",
"![](file_list_test.png)\n",
"![](file_list_single_test.png)\n",
"![](file_list_none_test.png)\n",
"![](file_delete_test.png)\n",
"![](file_mkdir_test.png)\n",
"![](file_rmdir_test.png)\n",
"\n",
"![](iter.file_open_test.png)\n",
"![](iter.file_attr_test.png)\n",
"![](iter.file_list_single_test.png)\n",
"![](iter.file_list_none_test.png)\n",
"\n",
"![](rdwr_cc_read_large_test.png)\n",
"![](rdwr_cc_read_page_test.png)\n",
"![](rdwr_cc_write_large_test.png)\n",
"![](rdwr_cc_write_page_test.png)\n",
"![](rdwr_nc_read_large_test.png)\n",
"![](rdwr_nc_read_page_test.png)\n",
"![](rdwr_nc_write_large_test.png)\n",
"![](rdwr_nc_write_page_test.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### File tests\n",
"\n",
"File tests are tests that are performed against the hierarchical path namespace of a file system. Such tests include `file_create_test`, `file_open_test`, etc. Measured times for these tests are normalized against the `ntfs` time (so that the `ntfs` time value becomes 1) and a single aggregate plot is produced.\n",
"\n",
"This allows for easy comparison between file systems across all file tests."
]
},
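{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of the normalization (using the sample `file_create_test` numbers from the Data Loading section, not real measurements), dividing each column by `ntfs` makes the `ntfs` value exactly 1:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Normalization sketch: ntfs becomes 1; the others become multiples of ntfs.\n",
"sample = pd.DataFrame({\"ntfs\": [0.20], \"memfs\": [0.06], \"ntptfs\": [0.28]})\n",
"sample.memfs /= sample.ntfs; sample.ntptfs /= sample.ntfs; sample.ntfs = 1\n",
"sample  # ntfs=1.00, memfs=0.30, ntptfs=1.40"
]
},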
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fileord = [\"create\", \"open\", \"iter.open\", \"overwrite\", \"list\", \"list_single\", \"delete\"]\n",
"fdf = pd.concat([df[df.iter == 5000], df[df.iter == 50]])\n",
"fdf.test = fdf.test.map(lambda x: x.replace(\"file_\", \"\").replace(\"_test\", \"\"))\n",
"fdf = fdf.set_index(\"test\").loc[fileord]\n",
"fdf.memfs /= fdf.ntfs; fdf.ntptfs /= fdf.ntfs; fdf.ntfs = 1\n",
"plt.figure(figsize=(10,8), dpi=100, facecolor=\"white\")\n",
"plt.suptitle(\"File Tests\", fontweight=\"light\", fontsize=20, y=0.95)\n",
"plt.title(\"(Shorter bars are better)\")\n",
"fdf.plot.barh(ax=plt.gca(), y=nameord).invert_yaxis()\n",
"plt.gca().set(ylabel=None)\n",
"for container in plt.gca().containers:\n",
" plt.gca().bar_label(container, fmt=\"%0.2f\", padding=4.0, fontsize=\"xx-small\")\n",
"plt.savefig(\"file_tests.png\")\n",
"#plt.show()\n",
"plt.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](file_tests.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Read/write tests\n",
"\n",
"Read/write tests are file I/O tests. Such tests include `rdwr_cc_write_page_test`, `rdwr_cc_read_page_test`, etc. As before measured times for these tests are normalized against the `ntfs` time (so that the `ntfs` time value becomes 1) and a single aggregate plot is produced.\n",
"\n",
"This allows for easy comparison between file systems across all read/write tests."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"rdwrord = [\"cc_read_page\", \"cc_write_page\", \"nc_read_page\", \"nc_write_page\", \"mmap_read\", \"mmap_write\"]\n",
"sdf = df[df.iter == 500].copy()\n",
"sdf.test = sdf.test.map(lambda x: x.replace(\"rdwr_\", \"\").replace(\"_test\", \"\"))\n",
"sdf = sdf.set_index(\"test\").loc[rdwrord]\n",
"sdf.memfs /= sdf.ntfs; sdf.ntptfs /= sdf.ntfs; sdf.ntfs = 1\n",
"plt.figure(figsize=(10,8), dpi=100, facecolor=\"white\")\n",
"plt.suptitle(\"Read/Write Tests\", fontweight=\"light\", fontsize=20, y=0.95)\n",
"plt.title(\"(Shorter bars are better)\")\n",
"sdf.plot.barh(ax=plt.gca(), y=nameord).invert_yaxis()\n",
"plt.gca().set(ylabel=None)\n",
"for container in plt.gca().containers:\n",
" plt.gca().bar_label(container, fmt=\"%0.2f\", padding=4.0, fontsize=\"xx-small\")\n",
"plt.savefig(\"rdwr_tests.png\")\n",
"#plt.show()\n",
"plt.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![](rdwr_tests.png)"
]
}
],
"metadata": {
"interpreter": {
"hash": "78f203ba605732dcd419e55e4a2fc56c1449fc8b262db510a48272adb5557637"
},
"kernelspec": {
"display_name": "Python 3.9.7 64-bit ('base': conda)",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}