{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Performance Testing Analysis\n",
    "\n",
    "This notebook describes the methodology for analyzing WinFsp performance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Collection\n",
    "\n",
    "Performance data is collected by running the script `run-all-perf-tests.bat`. This script runs a variety of performance tests against the NTFS, MEMFS and NTPTFS file systems. The tests are run a number of times (default: 3) and the results are saved in CSV files named `ntfs-N.csv`, `memfs-N.csv` and `ntptfs-N.csv` (where `N` is the test run number).\n",
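    "\n",
    "Each CSV file is expected to contain one row per test/iteration pair, with three unlabeled columns (test name, iteration count, measured time); this is the layout that the loading code below relies on. For example, based on the sample values shown in the Data Loading section, `ntfs-1.csv` might begin:\n",
    "\n",
    "```\n",
    "file_create_test,1000,0.20\n",
    "file_open_test,1000,0.09\n",
    "```"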
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Loading\n",
    "\n",
    "Data is loaded from all CSV files into a single pandas `DataFrame`. The resulting `DataFrame` has columns `test`, `iter`, `ntfs`, `memfs`, `ntptfs`. With multiple test runs there will be multiple time values for each `test`, `iter`, file system triple; in that case the smallest time value is entered into the `DataFrame`. The assumption is that even in a seemingly idle system there is some activity that affects the results; the smallest value is preferred because it reflects the run with the least interference from other system activity.\n",
    "\n",
    "The resulting `DataFrame` will contain data similar to the following:\n",
    "\n",
    "| test               |  iter |   ntfs |  memfs | ntptfs |\n",
    "|:-------------------|------:|-------:|-------:|-------:|\n",
    "| file_create_test   |  1000 |   0.20 |   0.06 |   0.28 |\n",
    "| file_open_test     |  1000 |   0.09 |   0.05 |   0.22 |\n",
    "| ...                |   ... |    ... |    ... |    ... |"
   ]
  },
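  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A minimal sketch of the min-across-runs rule described above, using the\n",
    "# sample file_create_test/ntfs value from the table (the second-run value\n",
    "# of 0.22 is a hypothetical figure, assumed purely for illustration).\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "run1 = pd.DataFrame({\"test\": [\"file_create_test\"], \"iter\": [1000], \"ntfs\": [0.22]})\n",
    "run2 = pd.DataFrame({\"test\": [\"file_create_test\"], \"iter\": [1000], \"ntfs\": [0.20]})\n",
    "run1.combine(run2, np.minimum)  # keeps the smaller time: 0.20"
   ]
  },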
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import glob\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "nameord = [\"ntfs\", \"memfs\", \"ntptfs\"]\n",
    "\n",
    "# Group the CSV files by file system name: e.g. ntfs-1.csv, ntfs-2.csv -> \"ntfs\".\n",
    "datamap = {}\n",
    "for f in sorted(glob.iglob(\"*.csv\")):\n",
    "    datamap.setdefault(f.rsplit(\"-\", maxsplit=1)[0], []).append(f)\n",
    "\n",
    "# Build a single DataFrame with columns test, iter, ntfs, memfs, ntptfs.\n",
    "# For each file system keep the element-wise minimum time across test runs,\n",
    "# then merge the per-file-system frames on the shared test/iter columns.\n",
    "df = None\n",
    "for n in nameord:\n",
    "    ndf = None\n",
    "    for f in datamap[n]:\n",
    "        df0 = pd.read_csv(f, header=None, names=[\"test\", \"iter\", n])\n",
    "        if ndf is None:\n",
    "            ndf = df0\n",
    "        else:\n",
    "            ndf = ndf.combine(df0, np.minimum)\n",
    "    if df is None:\n",
    "        df = ndf\n",
    "    else:\n",
    "        df = df.merge(ndf, how=\"left\")\n",
    "#df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Analysis\n",
    "\n",
    "For each test a plot is drawn that shows how each file system performs in that test. This allows for easy comparison between file systems on a per-test basis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Letter markers identify the file systems: N=ntfs, M=memfs, P=ntptfs.\n",
    "# Raw strings avoid invalid escape sequence warnings for the LaTeX markers.\n",
    "markermap = { \"ntfs\": r\"$\\mathtt{N}$\", \"memfs\": r\"$\\mathtt{M}$\", \"ntptfs\": r\"$\\mathtt{P}$\"}\n",
    "for t, tdf in df.groupby(\"test\", sort=False):\n",
    "    plt.figure(figsize=(10,8), dpi=100, facecolor=\"white\")\n",
    "    plt.title(t)\n",
    "    # For file tests the iteration count is a file count; label the x-axis accordingly.\n",
    "    xlabel = \"iter\"\n",
    "    if t.startswith(\"file_\"):\n",
    "        xlabel = \"files\"\n",
    "    for n in nameord:\n",
    "        tdf.plot(ax=plt.gca(), x=\"iter\", xlabel=xlabel, y=n, ylabel=\"time\", marker=markermap[n], ms=8)\n",
    "    plt.legend(nameord)\n",
    "    plt.savefig(t + \".png\")\n",
    "    #plt.show()\n",
    "    plt.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### File tests\n",
    "\n",
    "File tests are tests performed against the hierarchical path namespace of a file system. Such tests include `file_create_test`, `file_open_test`, etc. Measured times for these tests are normalized against the `ntfs` time (so that the `ntfs` time value becomes 1) and a single aggregate plot is produced.\n",
    "\n",
    "This allows for easy comparison between file systems across all file tests.\n",
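    "\n",
    "For example, normalizing the sample `file_create_test` values from the Data Loading section gives 0.28 / 0.20 = 1.4 for `ntptfs`; i.e. `ntptfs` takes 1.4 times as long as `ntfs` for that test."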
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fileord = [\"create\", \"open\", \"iter.open\", \"overwrite\", \"list\", \"list_single\", \"delete\"]\n",
    "# Select the file test rows and strip the file_/_test parts of the test names.\n",
    "fdf = pd.concat([df[df.iter == 5000], df[df.iter == 50]])\n",
    "fdf.test = fdf.test.map(lambda x: x.replace(\"file_\", \"\").replace(\"_test\", \"\"))\n",
    "fdf = fdf.set_index(\"test\").loc[fileord]\n",
    "# Normalize times against ntfs; the ntfs column must be overwritten last.\n",
    "fdf.memfs /= fdf.ntfs; fdf.ntptfs /= fdf.ntfs; fdf.ntfs = 1\n",
    "plt.figure(figsize=(10,8), dpi=100, facecolor=\"white\")\n",
    "plt.suptitle(\"File Tests\", fontweight=\"light\", fontsize=20, y=0.95)\n",
    "plt.title(\"(Shorter bars are better)\")\n",
    "fdf.plot.barh(ax=plt.gca(), y=nameord).invert_yaxis()\n",
    "plt.gca().set(ylabel=None)\n",
    "# Annotate each bar with its normalized time value.\n",
    "for container in plt.gca().containers:\n",
    "    plt.gca().bar_label(container, fmt=\"%0.2f\", padding=4.0, fontsize=\"xx-small\")\n",
    "plt.savefig(\"file_tests.png\")\n",
    "#plt.show()\n",
    "plt.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Read/write tests\n",
    "\n",
    "Read/write tests are file I/O tests. Such tests include `rdwr_cc_write_page_test`, `rdwr_cc_read_page_test`, etc. As before, measured times for these tests are normalized against the `ntfs` time (so that the `ntfs` time value becomes 1) and a single aggregate plot is produced.\n",
    "\n",
    "This allows for easy comparison between file systems across all read/write tests."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rdwrord = [\"cc_read_page\", \"cc_write_page\", \"nc_read_page\", \"nc_write_page\", \"mmap_read\", \"mmap_write\"]\n",
    "# Select the read/write test rows and strip the rdwr_/_test parts of the test names.\n",
    "sdf = df[df.iter == 500].copy()\n",
    "sdf.test = sdf.test.map(lambda x: x.replace(\"rdwr_\", \"\").replace(\"_test\", \"\"))\n",
    "sdf = sdf.set_index(\"test\").loc[rdwrord]\n",
    "# Normalize times against ntfs; the ntfs column must be overwritten last.\n",
    "sdf.memfs /= sdf.ntfs; sdf.ntptfs /= sdf.ntfs; sdf.ntfs = 1\n",
    "plt.figure(figsize=(10,8), dpi=100, facecolor=\"white\")\n",
    "plt.suptitle(\"Read/Write Tests\", fontweight=\"light\", fontsize=20, y=0.95)\n",
    "plt.title(\"(Shorter bars are better)\")\n",
    "sdf.plot.barh(ax=plt.gca(), y=nameord).invert_yaxis()\n",
    "plt.gca().set(ylabel=None)\n",
    "# Annotate each bar with its normalized time value.\n",
    "for container in plt.gca().containers:\n",
    "    plt.gca().bar_label(container, fmt=\"%0.2f\", padding=4.0, fontsize=\"xx-small\")\n",
    "plt.savefig(\"rdwr_tests.png\")\n",
    "#plt.show()\n",
    "plt.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ""
   ]
  }
 ],
"metadata": {
|
|
"interpreter": {
|
|
"hash": "78f203ba605732dcd419e55e4a2fc56c1449fc8b262db510a48272adb5557637"
|
|
},
|
|
"kernelspec": {
|
|
"display_name": "Python 3.9.7 64-bit ('base': conda)",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.8.12"
|
|
},
|
|
"orig_nbformat": 4
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 2
|
|
}
|