Bill Zissimopoulos 6023efa7e6 doc: WinFsp Performance Testing
Update with new tests and analysis for 2022.
2022-06-06 00:32:15 +01:00

# Performance Testing Analysis
This notebook describes the methodology for analyzing WinFsp performance.
## Data Collection
Performance data is collected by running the script `run-all-perf-tests.bat`. This script runs a variety of performance tests against the NTFS, MEMFS and NTPTFS file systems. The tests are run a number of times (default: 3) and the results are saved in CSV files named `ntfs-N.csv`, `memfs-N.csv` and `ntptfs-N.csv`, where `N` is the number of the test run. Each CSV file has no header row; every line holds a test name, an iteration count and a measured time.
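Before loading, it can be worth checking that every file system produced the same number of runs. The following is a minimal sketch, assuming the `<fs>-<N>.csv` naming described above and the default of 3 runs; the helpers `group_runs` and `check_run_counts` are hypothetical (not part of the WinFsp scripts) and the file list is illustrative.

```python
from collections import defaultdict


def group_runs(filenames):
    """Group CSV file names of the form <fs>-<N>.csv by file system name.

    Like the loading code, everything before the last hyphen is treated
    as the file system name; the part after it is the run number.
    """
    groups = defaultdict(list)
    for name in sorted(filenames):
        if not name.endswith(".csv"):
            continue
        fs, _, run = name[:-len(".csv")].rpartition("-")
        if not fs or not run.isdigit():
            continue  # skip files that do not follow the <fs>-<N>.csv pattern
        groups[fs].append(int(run))
    return dict(groups)


def check_run_counts(groups, names, runs=3):
    """Return {file system: run count} for file systems with an unexpected count."""
    return {n: len(groups.get(n, [])) for n in names if len(groups.get(n, [])) != runs}


# Illustrative directory listing in which one NTPTFS run is missing.
files = ["ntfs-1.csv", "ntfs-2.csv", "ntfs-3.csv",
         "memfs-1.csv", "memfs-2.csv", "memfs-3.csv",
         "ntptfs-1.csv", "ntptfs-2.csv"]
groups = group_runs(files)
print(check_run_counts(groups, ["ntfs", "memfs", "ntptfs"]))  # {'ntptfs': 2}
```

In the notebook itself the list would come from `glob.glob("*.csv")` instead of a literal list.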
## Data Loading
Data is loaded from all CSV files into a single pandas `DataFrame` with columns `test`, `iter`, `ntfs`, `memfs` and `ntptfs`. With multiple test runs there are multiple time values for each (`test`, `iter`, file system) triple; in this case only the smallest time value is entered into the `DataFrame`. The assumption is that even a seemingly idle system has some background activity that inflates the measurements; the smallest value is preferred because it most likely comes from a run with little or no interfering activity.
The resulting `DataFrame` will contain data similar to the following:
| test | iter | ntfs | memfs | ntptfs |
|:------------------|------:|-------:|-------:|-------:|
| file_create_test | 1000 | 0.20 | 0.06 | 0.28 |
| file_open_test | 1000 | 0.09 | 0.05 | 0.22 |
| ... | ... | ... | ... | ... |
```python
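import glob

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# File systems in the order they appear in the DataFrame and in the plots.
nameord = ["ntfs", "memfs", "ntptfs"]

# Group CSV files by file system name: "ntfs-1.csv" -> "ntfs", etc.
datamap = {}
for f in sorted(glob.iglob("*.csv")):
    datamap.setdefault(f.rsplit("-", maxsplit=1)[0], []).append(f)

df = None
for n in nameord:
    # Combine all runs of file system n, keeping the smallest value per row.
    # combine() aligns rows by index, so every run must list the tests in
    # the same order.
    ndf = None
    for f in datamap[n]:
        df0 = pd.read_csv(f, header=None, names=["test", "iter", n])
        if ndf is None:
            ndf = df0
        else:
            ndf = ndf.combine(df0, np.minimum)
    # Join the per-file-system columns on the shared "test" and "iter" columns.
    if df is None:
        df = ndf
    else:
        df = df.merge(ndf, how="left")
#df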
```
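The min-of-runs reduction can also be written by stacking all runs and grouping on (`test`, `iter`). This is a sketch of an alternative to `combine(np.minimum)`, not the notebook's own code; the helper `load_min` and the in-memory CSV contents are hypothetical, and the timings are made-up placeholders.

```python
import io

import pandas as pd

# Hypothetical contents of two runs for one file system, in the same
# headerless "test,iter,time" layout the loading code reads.
run1 = "file_create_test,1000,0.21\nfile_open_test,1000,0.09\n"
run2 = "file_create_test,1000,0.20\nfile_open_test,1000,0.11\n"


def load_min(texts, name):
    """Load several runs and keep the smallest time per (test, iter) pair."""
    frames = [pd.read_csv(io.StringIO(t), header=None, names=["test", "iter", name])
              for t in texts]
    return (pd.concat(frames)
              .groupby(["test", "iter"], sort=False, as_index=False)[name]
              .min())


ntfs_min = load_min([run1, run2], "ntfs")
print(ntfs_min)
```

Because `groupby` matches rows by key rather than by position, this variant does not depend on every run listing the tests in the same order, which the row-aligned `combine` call does.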
## Data Analysis
For each test a plot is drawn that shows how each file system performs in the particular test. This allows for easy comparisons between file systems for a particular test.
```python
# Letter markers rendered with mathtext; raw strings keep "\m" from being
# parsed as an (invalid) string escape sequence.
markermap = {"ntfs": r"$\mathtt{N}$", "memfs": r"$\mathtt{M}$", "ntptfs": r"$\mathtt{P}$"}
for t, tdf in df.groupby("test", sort=False):
    plt.figure(figsize=(10,8), dpi=100, facecolor="white")
    plt.title(t)
    # Label the x axis "files" for file_* tests and "iter" otherwise.
    xlabel = "iter"
    if t.startswith("file_"):
        xlabel = "files"
    for n in nameord:
        tdf.plot(ax=plt.gca(), x="iter", xlabel=xlabel, y=n, ylabel="time", marker=markermap[n], ms=8)
    plt.legend(nameord)
    plt.savefig(t + ".png")
    #plt.show()
    plt.close()
```
![](file_create_test.png)
![](file_open_test.png)
![](file_overwrite_test.png)
![](file_attr_test.png)
![](file_list_test.png)
![](file_list_single_test.png)
![](file_list_none_test.png)
![](file_delete_test.png)
![](file_mkdir_test.png)
![](file_rmdir_test.png)
![](iter.file_open_test.png)
![](iter.file_attr_test.png)
![](iter.file_list_single_test.png)
![](iter.file_list_none_test.png)
![](rdwr_cc_read_large_test.png)
![](rdwr_cc_read_page_test.png)
![](rdwr_cc_write_large_test.png)
![](rdwr_cc_write_page_test.png)
![](rdwr_nc_read_large_test.png)
![](rdwr_nc_read_page_test.png)
![](rdwr_nc_write_large_test.png)
![](rdwr_nc_write_page_test.png)
### File tests
File tests are tests that are performed against the hierarchical path namespace of a file system. Such tests include `file_create_test`, `file_open_test`, etc. For the tests listed in `fileord`, measured times are normalized against the `ntfs` time (so that the `ntfs` time value becomes 1) and a single aggregate plot is produced.
This allows for easy comparison between file systems across the file tests.
```python
fileord = ["create", "open", "iter.open", "overwrite", "list", "list_single", "delete"]
fdf = pd.concat([df[df.iter == 5000], df[df.iter == 50]])
fdf.test = fdf.test.map(lambda x: x.replace("file_", "").replace("_test", ""))
fdf = fdf.set_index("test").loc[fileord]
fdf.memfs /= fdf.ntfs; fdf.ntptfs /= fdf.ntfs; fdf.ntfs = 1
plt.figure(figsize=(10,8), dpi=100, facecolor="white")
plt.suptitle("File Tests", fontweight="light", fontsize=20, y=0.95)
plt.title("(Shorter bars are better)")
fdf.plot.barh(ax=plt.gca(), y=nameord).invert_yaxis()
plt.gca().set(ylabel=None)
for container in plt.gca().containers:
    plt.gca().bar_label(container, fmt="%0.2f", padding=4.0, fontsize="xx-small")
plt.savefig("file_tests.png")
#plt.show()
plt.close()
```
![](file_tests.png)
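As a worked example of the normalization, take the sample values from the example table in the Data Loading section (they are illustrative, not results). The sketch uses `DataFrame.div(..., axis=0)` as a one-step equivalent of the `/=` statements in the notebook code.

```python
import pandas as pd

# Sample values from the example table in the Data Loading section.
sample = pd.DataFrame(
    {"ntfs": [0.20, 0.09], "memfs": [0.06, 0.05], "ntptfs": [0.28, 0.22]},
    index=["create", "open"],
)

# Divide every column row-wise by the ntfs column, so ntfs becomes 1.0
# and the other columns become ratios relative to NTFS.
norm = sample.div(sample["ntfs"], axis=0)
print(norm.round(2))
```

For `create`, MEMFS becomes 0.06 / 0.20 = 0.30 and NTPTFS becomes 0.28 / 0.20 = 1.40. A value below 1 means the file system took less time than NTFS, which is why shorter bars are better.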
### Read/write tests
Read/write tests are file I/O tests. Such tests include `rdwr_cc_write_page_test`, `rdwr_cc_read_page_test`, etc. As before, measured times for these tests are normalized against the `ntfs` time (so that the `ntfs` time value becomes 1) and a single aggregate plot is produced.
This allows for easy comparison between file systems across all read/write tests.
```python
rdwrord = ["cc_read_page", "cc_write_page", "nc_read_page", "nc_write_page", "mmap_read", "mmap_write"]
sdf = df[df.iter == 500].copy()
sdf.test = sdf.test.map(lambda x: x.replace("rdwr_", "").replace("_test", ""))
sdf = sdf.set_index("test").loc[rdwrord]
sdf.memfs /= sdf.ntfs; sdf.ntptfs /= sdf.ntfs; sdf.ntfs = 1
plt.figure(figsize=(10,8), dpi=100, facecolor="white")
plt.suptitle("Read/Write Tests", fontweight="light", fontsize=20, y=0.95)
plt.title("(Shorter bars are better)")
sdf.plot.barh(ax=plt.gca(), y=nameord).invert_yaxis()
plt.gca().set(ylabel=None)
for container in plt.gca().containers:
    plt.gca().bar_label(container, fmt="%0.2f", padding=4.0, fontsize="xx-small")
plt.savefig("rdwr_tests.png")
#plt.show()
plt.close()
```
![](rdwr_tests.png)
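The aggregate plots select a fixed iteration count (`iter == 500` for the read/write tests, 5000 and 50 for the file tests), so these filters must be updated by hand if the test scripts change their iteration counts. A hedged alternative, assuming the largest iteration count of each test is the one of interest, is to pick it per test with `groupby(...).idxmax()`. The `mini` frame below is a made-up miniature of the loaded `DataFrame`, with placeholder times.

```python
import pandas as pd

# Hypothetical miniature of the loaded DataFrame: two iteration counts per
# test (times are placeholders, not measurements).
mini = pd.DataFrame({
    "test":   ["rdwr_cc_read_page_test"] * 2 + ["rdwr_nc_read_page_test"] * 2,
    "iter":   [100, 500, 100, 500],
    "ntfs":   [0.10, 0.50, 0.20, 1.00],
    "memfs":  [0.05, 0.25, 0.30, 1.50],
    "ntptfs": [0.12, 0.60, 0.22, 1.10],
})

# For each test keep only the row with the largest iteration count,
# instead of hard-coding a value such as iter == 500.
last = mini.loc[mini.groupby("test", sort=False)["iter"].idxmax()].copy()

# Shorten the names the same way the notebook does: rdwr_X_test -> X.
last["test"] = (last["test"]
                .str.replace("rdwr_", "", regex=False)
                .str.replace("_test", "", regex=False))
last = last.set_index("test")
print(last)
```

This trades the explicit choice made in the notebook for automatic selection; if a specific iteration count was chosen deliberately, the hard-coded filter remains the clearer option.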