winfsp/doc/WinFsp-Performance-Testing/WinFsp-Performance-Testing-Analysis.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Performance Testing Analysis\n",
    "\n",
    "This notebook describes the methodology for analyzing WinFsp performance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Collection\n",
    "\n",
    "Performance data is collected by running the script `run-all-perf-tests.bat`. This script runs a variety of performance tests against the NTFS, MEMFS and NTPTFS file systems. The tests are run a number of times (default: 3) and the results are saved in CSV files with names `ntfs-N.csv`, `memfs-N.csv` and `ntptfs-N.csv` (where `N` represents the results of test run `N`)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Loading\n",
    "\n",
    "Data is loaded from all CSV files into a single pandas `DataFrame`. The resulting `DataFrame` has columns `test`, `iter`, `ntfs`, `memfs`, `ntptfs`. With multiple test runs there will be multiple time values for a `test`, `iter`, file system triple; in this case the smallest time value is entered into the `DataFrame`. The assumption is that even in a seemingly idle system there is some activity that affects the results; the smallest value is the preferred one to use because it reflects the time when there is less or no other system activity.\n",
    "\n",
    "The resulting `DataFrame` will contain data similar to the following:\n",
    "\n",
    "| test              | iter  |  ntfs  | memfs  | ntptfs |\n",
    "|:------------------|------:|-------:|-------:|-------:|\n",
    "| file_create_test  | 1000  |  0.20  |  0.06  |  0.28  |\n",
    "| file_open_test    | 1000  |  0.09  |  0.05  |  0.22  |\n",
    "| ...               |  ...  |   ...  |   ...  |   ...  |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import glob, os\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "nameord = [\"ntfs\", \"memfs\", \"ntptfs\"]\n",
    "\n",
    "datamap = {}\n",
    "for f in sorted(glob.iglob(\"*.csv\")):\n",
    "    datamap.setdefault(f.rsplit(\"-\", maxsplit=1)[0], []).append(f)\n",
    "\n",
    "df = None\n",
    "for n in nameord:\n",
    "    ndf = None\n",
    "    for f in datamap[n]:\n",
    "        df0 = pd.read_csv(f, header=None, names=[\"test\", \"iter\", n])\n",
    "        if ndf is None:\n",
    "            ndf = df0\n",
    "        else:\n",
    "            ndf = ndf.combine(df0, np.minimum)\n",
    "    if df is None:\n",
    "        df = ndf\n",
    "    else:\n",
    "        df = df.merge(ndf, how=\"left\")\n",
    "#df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Analysis\n",
    "\n",
    "For each test a plot is drawn that shows how each file system performs in the particular test. This allows for easy comparisons between file systems for a particular test."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "markermap = { \"ntfs\": \"$\\mathtt{N}$\", \"memfs\": \"$\\mathtt{M}$\", \"ntptfs\": \"$\\mathtt{P}$\"}\n",
    "for t, tdf in df.groupby(\"test\", sort=False):\n",
    "    plt.figure(figsize=(10,8), dpi=100, facecolor=\"white\")\n",
    "    plt.title(t)\n",
    "    xlabel = \"iter\"\n",
    "    if t.startswith(\"file_\"):\n",
    "        xlabel = \"files\"\n",
    "    for n in nameord:\n",
    "        tdf.plot(ax=plt.gca(), x=\"iter\", xlabel=xlabel, y=n, ylabel=\"time\", marker=markermap[n], ms=8)\n",
    "    plt.legend(nameord)\n",
    "    plt.savefig(t + \".png\")\n",
    "    #plt.show()\n",
    "    plt.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](file_create_test.png)\n",
    "![](file_open_test.png)\n",
    "![](file_overwrite_test.png)\n",
    "![](file_attr_test.png)\n",
    "![](file_list_test.png)\n",
    "![](file_list_single_test.png)\n",
    "![](file_list_none_test.png)\n",
    "![](file_delete_test.png)\n",
    "![](file_mkdir_test.png)\n",
    "![](file_rmdir_test.png)\n",
    "\n",
    "![](iter.file_open_test.png)\n",
    "![](iter.file_attr_test.png)\n",
    "![](iter.file_list_single_test.png)\n",
    "![](iter.file_list_none_test.png)\n",
    "\n",
    "![](rdwr_cc_read_large_test.png)\n",
    "![](rdwr_cc_read_page_test.png)\n",
    "![](rdwr_cc_write_large_test.png)\n",
    "![](rdwr_cc_write_page_test.png)\n",
    "![](rdwr_nc_read_large_test.png)\n",
    "![](rdwr_nc_read_page_test.png)\n",
    "![](rdwr_nc_write_large_test.png)\n",
    "![](rdwr_nc_write_page_test.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### File tests\n",
    "\n",
    "File tests are tests that are performed against the hierarchical path namespace of a file system. Such tests include `file_create_test`, `file_open_test`, etc. Measured times for these tests are normalized against the `ntfs` time (so that the `ntfs` time value becomes 1) and a single aggregate plot is produced.\n",
    "\n",
    "This allows for easy comparison between file systems across all file tests."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fileord = [\"create\", \"open\", \"iter.open\", \"overwrite\", \"list\", \"list_single\", \"delete\"]\n",
    "fdf = pd.concat([df[df.iter == 5000], df[df.iter == 50]])\n",
    "fdf.test = fdf.test.map(lambda x: x.replace(\"file_\", \"\").replace(\"_test\", \"\"))\n",
    "fdf = fdf.set_index(\"test\").loc[fileord]\n",
    "fdf.memfs /= fdf.ntfs; fdf.ntptfs /= fdf.ntfs; fdf.ntfs = 1\n",
    "plt.figure(figsize=(10,8), dpi=100, facecolor=\"white\")\n",
    "plt.suptitle(\"File Tests\", fontweight=\"light\", fontsize=20, y=0.95)\n",
    "plt.title(\"(Shorter bars are better)\")\n",
    "fdf.plot.barh(ax=plt.gca(), y=nameord).invert_yaxis()\n",
    "plt.gca().set(ylabel=None)\n",
    "for container in plt.gca().containers:\n",
    "    plt.gca().bar_label(container, fmt=\"%0.2f\", padding=4.0, fontsize=\"xx-small\")\n",
    "plt.savefig(\"file_tests.png\")\n",
    "#plt.show()\n",
    "plt.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](file_tests.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Read/write tests\n",
    "\n",
    "Read/write tests are file I/O tests. Such tests include `rdwr_cc_write_page_test`, `rdwr_cc_read_page_test`, etc. As before measured times for these tests are normalized against the `ntfs` time (so that the `ntfs` time value becomes 1) and a single aggregate plot is produced.\n",
    "\n",
    "This allows for easy comparison between file systems across all read/write tests."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rdwrord = [\"cc_read_page\", \"cc_write_page\", \"nc_read_page\", \"nc_write_page\", \"mmap_read\", \"mmap_write\"]\n",
    "sdf = df[df.iter == 500].copy()\n",
    "sdf.test = sdf.test.map(lambda x: x.replace(\"rdwr_\", \"\").replace(\"_test\", \"\"))\n",
    "sdf = sdf.set_index(\"test\").loc[rdwrord]\n",
    "sdf.memfs /= sdf.ntfs; sdf.ntptfs /= sdf.ntfs; sdf.ntfs = 1\n",
    "plt.figure(figsize=(10,8), dpi=100, facecolor=\"white\")\n",
    "plt.suptitle(\"Read/Write Tests\", fontweight=\"light\", fontsize=20, y=0.95)\n",
    "plt.title(\"(Shorter bars are better)\")\n",
    "sdf.plot.barh(ax=plt.gca(), y=nameord).invert_yaxis()\n",
    "plt.gca().set(ylabel=None)\n",
    "for container in plt.gca().containers:\n",
    "    plt.gca().bar_label(container, fmt=\"%0.2f\", padding=4.0, fontsize=\"xx-small\")\n",
    "plt.savefig(\"rdwr_tests.png\")\n",
    "#plt.show()\n",
    "plt.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](rdwr_tests.png)"
   ]
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "78f203ba605732dcd419e55e4a2fc56c1449fc8b262db510a48272adb5557637"
  },
  "kernelspec": {
   "display_name": "Python 3.9.7 64-bit ('base': conda)",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}