{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Performance Testing Analysis\n", "\n", "This notebook describes the methodology for analyzing WinFsp performance." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Collection\n", "\n", "Performance data is collected by running the script `run-all-perf-tests.bat`. This script runs a variety of performance tests against the NTFS, MEMFS and NTPTFS file systems. The tests are run a number of times (default: 3) and the results are saved in CSV files with names `ntfs-N.csv`, `memfs-N.csv` and `ntptfs-N.csv` (where `N` is the number of the test run)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Loading\n", "\n", "Data is loaded from all CSV files into a single pandas `DataFrame`. The resulting `DataFrame` has columns `test`, `iter`, `ntfs`, `memfs`, `ntptfs`. With multiple test runs there will be multiple time values for each (`test`, `iter`, file system) triple; in this case the smallest time value is entered into the `DataFrame`. The assumption is that even in a seemingly idle system there is some background activity that affects the results; the smallest value is preferred because it reflects the run with the least interference from other system activity.\n", "\n", "The resulting `DataFrame` will contain data similar to the following:\n", "\n", "| test | iter | ntfs | memfs | ntptfs |\n", "|:------------------|------:|-------:|-------:|-------:|\n", "| file_create_test | 1000 | 0.20 | 0.06 | 0.28 |\n", "| file_open_test | 1000 | 0.09 | 0.05 | 0.22 |\n", "| ... | ... | ... | ... | ... 
|" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import glob\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "\n", "# file systems under test, in plotting order\n", "nameord = [\"ntfs\", \"memfs\", \"ntptfs\"]\n", "\n", "# group the CSV files by file system name, e.g. \"ntfs-1.csv\" -> \"ntfs\"\n", "datamap = {}\n", "for f in sorted(glob.iglob(\"*.csv\")):\n", "    datamap.setdefault(f.rsplit(\"-\", maxsplit=1)[0], []).append(f)\n", "\n", "df = None\n", "for n in nameord:\n", "    # keep the element-wise minimum time across all runs of this file system\n", "    ndf = None\n", "    for f in datamap[n]:\n", "        df0 = pd.read_csv(f, header=None, names=[\"test\", \"iter\", n])\n", "        if ndf is None:\n", "            ndf = df0\n", "        else:\n", "            ndf = ndf.combine(df0, np.minimum)\n", "    # add this file system's times as a new column, matched on test and iter\n", "    if df is None:\n", "        df = ndf\n", "    else:\n", "        df = df.merge(ndf, how=\"left\")\n", "#df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Analysis\n", "\n", "For each test a plot is drawn that shows how each file system performs in that test. This allows for easy comparison between file systems for a particular test." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# raw strings keep the backslash in the mathtext markers intact\n", "markermap = { \"ntfs\": r\"$\\mathtt{N}$\", \"memfs\": r\"$\\mathtt{M}$\", \"ntptfs\": r\"$\\mathtt{P}$\"}\n", "for t, tdf in df.groupby(\"test\", sort=False):\n", "    plt.figure(figsize=(10,8), dpi=100, facecolor=\"white\")\n", "    plt.title(t)\n", "    xlabel = \"iter\"\n", "    if t.startswith(\"file_\"):\n", "        xlabel = \"files\"\n", "    # one line per file system, all drawn on the same axes\n", "    for n in nameord:\n", "        tdf.plot(ax=plt.gca(), x=\"iter\", xlabel=xlabel, y=n, ylabel=\"time\", marker=markermap[n], ms=8)\n", "    plt.legend(nameord)\n", "    plt.savefig(t + \".png\")\n", "    #plt.show()\n", "    plt.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](file_create_test.png)\n", "![](file_open_test.png)\n", "![](file_overwrite_test.png)\n", "![](file_attr_test.png)\n", "![](file_list_test.png)\n", "![](file_list_single_test.png)\n", "![](file_list_none_test.png)\n", "![](file_delete_test.png)\n", "![](file_mkdir_test.png)\n", "![](file_rmdir_test.png)\n", "\n", "![](iter.file_open_test.png)\n", "![](iter.file_attr_test.png)\n", "![](iter.file_list_single_test.png)\n", "![](iter.file_list_none_test.png)\n", "\n", "![](rdwr_cc_read_large_test.png)\n", "![](rdwr_cc_read_page_test.png)\n", "![](rdwr_cc_write_large_test.png)\n", "![](rdwr_cc_write_page_test.png)\n", "![](rdwr_nc_read_large_test.png)\n", "![](rdwr_nc_read_page_test.png)\n", "![](rdwr_nc_write_large_test.png)\n", "![](rdwr_nc_write_page_test.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### File tests\n", "\n", "File tests are tests that are performed against the hierarchical path namespace of a file system. Such tests include `file_create_test`, `file_open_test`, etc. Measured times for these tests are normalized against the `ntfs` time (so that the `ntfs` time value becomes 1) and a single aggregate plot is produced.\n", "\n", "This allows for easy comparison between file systems across all file tests." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fileord = [\"create\", \"open\", \"iter.open\", \"overwrite\", \"list\", \"list_single\", \"delete\"]\n", "fdf = pd.concat([df[df.iter == 5000], df[df.iter == 50]])\n", "fdf.test = fdf.test.map(lambda x: x.replace(\"file_\", \"\").replace(\"_test\", \"\"))\n", "fdf = fdf.set_index(\"test\").loc[fileord]\n", "# normalize against the ntfs time so that ntfs becomes 1\n", "fdf.memfs /= fdf.ntfs; fdf.ntptfs /= fdf.ntfs; fdf.ntfs = 1\n", "plt.figure(figsize=(10,8), dpi=100, facecolor=\"white\")\n", "plt.suptitle(\"File Tests\", fontweight=\"light\", fontsize=20, y=0.95)\n", "plt.title(\"(Shorter bars are better)\")\n", "fdf.plot.barh(ax=plt.gca(), y=nameord).invert_yaxis()\n", "plt.gca().set(ylabel=None)\n", "# label each bar with its normalized value\n", "for container in plt.gca().containers:\n", "    plt.gca().bar_label(container, fmt=\"%0.2f\", padding=4.0, fontsize=\"xx-small\")\n", "plt.savefig(\"file_tests.png\")\n", "#plt.show()\n", "plt.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](file_tests.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read/write tests\n", "\n", "Read/write tests are file I/O tests. Such tests include `rdwr_cc_write_page_test`, `rdwr_cc_read_page_test`, etc. As before, measured times for these tests are normalized against the `ntfs` time (so that the `ntfs` time value becomes 1) and a single aggregate plot is produced.\n", "\n", "This allows for easy comparison between file systems across all read/write tests." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rdwrord = [\"cc_read_page\", \"cc_write_page\", \"nc_read_page\", \"nc_write_page\", \"mmap_read\", \"mmap_write\"]\n", "sdf = df[df.iter == 500].copy()\n", "sdf.test = sdf.test.map(lambda x: x.replace(\"rdwr_\", \"\").replace(\"_test\", \"\"))\n", "sdf = sdf.set_index(\"test\").loc[rdwrord]\n", "sdf.memfs /= sdf.ntfs; sdf.ntptfs /= sdf.ntfs; sdf.ntfs = 1\n", "plt.figure(figsize=(10,8), dpi=100, facecolor=\"white\")\n", "plt.suptitle(\"Read/Write Tests\", fontweight=\"light\", fontsize=20, y=0.95)\n", "plt.title(\"(Shorter bars are better)\")\n", "sdf.plot.barh(ax=plt.gca(), y=nameord).invert_yaxis()\n", "plt.gca().set(ylabel=None)\n", "for container in plt.gca().containers:\n", "    plt.gca().bar_label(container, fmt=\"%0.2f\", padding=4.0, fontsize=\"xx-small\")\n", "plt.savefig(\"rdwr_tests.png\")\n", "#plt.show()\n", "plt.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](rdwr_tests.png)" ] } ], "metadata": { "interpreter": { "hash": "78f203ba605732dcd419e55e4a2fc56c1449fc8b262db510a48272adb5557637" }, "kernelspec": { "display_name": "Python 3.9.7 64-bit ('base': conda)", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }