Bill Zissimopoulos 6023efa7e6 doc: WinFsp Performance Testing
Update with new tests and analysis for 2022.
2022-06-06 00:32:15 +01:00


Performance Testing Analysis

This notebook describes the methodology for analyzing WinFsp performance.

Data Collection

Performance data is collected by running the script run-all-perf-tests.bat, which runs a variety of performance tests against the NTFS, MEMFS and NTPTFS file systems. The tests are run a number of times (default: 3) and the results of each run are saved in CSV files named ntfs-N.csv, memfs-N.csv and ntptfs-N.csv, where N is the number of the test run.
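The naming scheme above can be illustrated with a small sketch that splits a result file name into its file system and run number. The helper names `parse_result_name` and `group_by_filesystem` are hypothetical and are not part of the WinFsp scripts; the example file list is made up.

```python
import re

# Hypothetical helper: split "<fs>-<N>.csv" into (file system, run number).
_NAME_RE = re.compile(r"^(?P<fs>[a-z]+)-(?P<run>\d+)\.csv$")

def parse_result_name(filename):
    """Return (fs, run) for names like 'ntfs-2.csv', or None if no match."""
    m = _NAME_RE.match(filename)
    if m is None:
        return None
    return m.group("fs"), int(m.group("run"))

def group_by_filesystem(filenames):
    """Map each file system name to the list of run numbers found for it."""
    groups = {}
    for name in sorted(filenames):
        parsed = parse_result_name(name)
        if parsed is None:
            continue  # skip files that do not follow the naming scheme
        fs, run = parsed
        groups.setdefault(fs, []).append(run)
    return groups

# Made-up directory listing, including one unrelated file.
example = ["ntfs-1.csv", "memfs-2.csv", "ntfs-2.csv", "ntptfs-1.csv", "notes.txt"]
groups = group_by_filesystem(example)
```

Here `groups` maps each file system to the run numbers present, and files that do not match the pattern are ignored.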

Data Loading

Data is loaded from all CSV files into a single pandas DataFrame. The resulting DataFrame has columns test, iter, ntfs, memfs, ntptfs. With multiple test runs there will be multiple time values for each (test, iter, file system) triple; in this case the smallest time value is entered into the DataFrame. The assumption is that even a seemingly idle system has some background activity that affects the results; the smallest value is preferred because it reflects the run with little or no other system activity.

The resulting DataFrame will contain data similar to the following:

test              iter  ntfs  memfs  ntptfs
file_create_test  1000  0.20  0.06   0.28
file_open_test    1000  0.09  0.05   0.22
...               ...   ...   ...    ...

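The smallest-value rule can be illustrated on its own with a minimal sketch that uses made-up timings for two hypothetical runs of one file system. It stacks the runs and takes a grouped minimum, which is an alternative formulation rather than the notebook's own method; unlike an element-wise comparison, it does not assume that rows appear in the same order in every run.

```python
import pandas as pd

# Made-up timings for two hypothetical runs against the same file system.
run1 = pd.DataFrame({"test": ["file_create_test", "file_open_test"],
                     "iter": [1000, 1000],
                     "ntfs": [0.25, 0.09]})
run2 = pd.DataFrame({"test": ["file_create_test", "file_open_test"],
                     "iter": [1000, 1000],
                     "ntfs": [0.20, 0.11]})

# Stack the runs, then keep the smallest time for each (test, iter) pair.
best = (pd.concat([run1, run2])
          .groupby(["test", "iter"], sort=False)["ntfs"]
          .min()
          .reset_index())
```

In this example the first test keeps its time from the second run and the second test keeps its time from the first run.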
import glob, os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

nameord = ["ntfs", "memfs", "ntptfs"]

datamap = {}
for f in sorted(glob.iglob("*.csv")):
    datamap.setdefault(f.rsplit("-", maxsplit=1)[0], []).append(f)

df = None
for n in nameord:
    ndf = None
    for f in datamap[n]:
        df0 = pd.read_csv(f, header=None, names=["test", "iter", n])
        if ndf is None:
            ndf = df0
        else:
            ndf = ndf.combine(df0, np.minimum)
    if df is None:
        df = ndf
    else:
        df = df.merge(ndf, how="left")
#df

Data Analysis

For each test, a plot is drawn that shows how each file system performs in that test. This allows for easy comparison between file systems on a per-test basis.

# Raw strings keep the backslash in the mathtext markers from being read as an escape.
markermap = {"ntfs": r"$\mathtt{N}$", "memfs": r"$\mathtt{M}$", "ntptfs": r"$\mathtt{P}$"}
for t, tdf in df.groupby("test", sort=False):
    plt.figure(figsize=(10,8), dpi=100, facecolor="white")
    plt.title(t)
    xlabel = "iter"
    if t.startswith("file_"):
        xlabel = "files"
    for n in nameord:
        tdf.plot(ax=plt.gca(), x="iter", xlabel=xlabel, y=n, ylabel="time", marker=markermap[n], ms=8)
    plt.legend(nameord)
    plt.savefig(t + ".png")
    #plt.show()
    plt.close()

File tests

File tests are performed against the hierarchical path namespace of a file system. Such tests include file_create_test, file_open_test, etc. Measured times for these tests are normalized against the ntfs time (so that the ntfs time value becomes 1) and a single aggregate plot is produced.

This allows for easy comparison between file systems across all file tests.
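The normalization step can be sketched in isolation as follows. The timings are made up, and the use of `DataFrame.div` is one possible way to express the division, not necessarily the one the notebook uses.

```python
import pandas as pd

# Made-up timings (seconds) for two tests; not real measurements.
times = pd.DataFrame(
    {"ntfs": [0.5, 0.25], "memfs": [0.125, 0.125], "ntptfs": [0.75, 0.5]},
    index=["create", "open"])

# Divide every column by the ntfs column, row by row, so that ntfs becomes 1
# and the other columns express time as a multiple of the ntfs time.
relative = times.div(times["ntfs"], axis=0)
```

A value below 1 means the file system was faster than NTFS in that test; a value above 1 means it was slower.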

fileord = ["create", "open", "iter.open", "overwrite", "list", "list_single", "delete"]
fdf = pd.concat([df[df.iter == 5000], df[df.iter == 50]])
fdf.test = fdf.test.map(lambda x: x.replace("file_", "").replace("_test", ""))
fdf = fdf.set_index("test").loc[fileord]
# Normalize against the ntfs time; ntfs itself must be set last.
fdf.memfs /= fdf.ntfs
fdf.ntptfs /= fdf.ntfs
fdf.ntfs = 1
plt.figure(figsize=(10,8), dpi=100, facecolor="white")
plt.suptitle("File Tests", fontweight="light", fontsize=20, y=0.95)
plt.title("(Shorter bars are better)")
fdf.plot.barh(ax=plt.gca(), y=nameord).invert_yaxis()
plt.gca().set(ylabel=None)
for container in plt.gca().containers:
    plt.gca().bar_label(container, fmt="%0.2f", padding=4.0, fontsize="xx-small")
plt.savefig("file_tests.png")
#plt.show()
plt.close()

Read/write tests

Read/write tests are file I/O tests. Such tests include rdwr_cc_write_page_test, rdwr_cc_read_page_test, etc. As before, measured times for these tests are normalized against the ntfs time (so that the ntfs time value becomes 1) and a single aggregate plot is produced.

This allows for easy comparison between file systems across all read/write tests.
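For the aggregate plot, test names such as rdwr_cc_read_page_test are shortened to labels and arranged in a fixed order. A minimal sketch of that labelling step is shown here, using a hypothetical helper `short_name` that strips the prefix and suffix only at the ends of the name (whereas `str.replace` would also change matches in the middle).

```python
def short_name(test, prefix="rdwr_", suffix="_test"):
    """Strip a leading prefix and a trailing suffix from a test name."""
    if test.startswith(prefix):
        test = test[len(prefix):]
    if test.endswith(suffix):
        test = test[:-len(suffix)]
    return test

# Desired display order and raw test names as they might appear in the data.
order = ["cc_read_page", "cc_write_page"]
raw = ["rdwr_cc_write_page_test", "rdwr_cc_read_page_test"]

# Shorten each name, then sort the labels by their position in `order`.
labels = sorted((short_name(t) for t in raw), key=order.index)
```

The same helper could be applied to the file tests by passing `prefix="file_"`.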

rdwrord = ["cc_read_page", "cc_write_page", "nc_read_page", "nc_write_page", "mmap_read", "mmap_write"]
sdf = df[df.iter == 500].copy()
sdf.test = sdf.test.map(lambda x: x.replace("rdwr_", "").replace("_test", ""))
sdf = sdf.set_index("test").loc[rdwrord]
# Normalize against the ntfs time; ntfs itself must be set last.
sdf.memfs /= sdf.ntfs
sdf.ntptfs /= sdf.ntfs
sdf.ntfs = 1
plt.figure(figsize=(10,8), dpi=100, facecolor="white")
plt.suptitle("Read/Write Tests", fontweight="light", fontsize=20, y=0.95)
plt.title("(Shorter bars are better)")
sdf.plot.barh(ax=plt.gca(), y=nameord).invert_yaxis()
plt.gca().set(ylabel=None)
for container in plt.gca().containers:
    plt.gca().bar_label(container, fmt="%0.2f", padding=4.0, fontsize="xx-small")
plt.savefig("rdwr_tests.png")
#plt.show()
plt.close()