To measure the most basic performance parameters, we start with a secure-erased drive. We write over the entire LBA space twice with sequential writes, then write twice the drive’s capacity with 4K random writes. Once the drive is prepared, we run each of the following tests for one minute at each queue depth. The throughput tests follow the same procedure, except that we’ve included fresh (pre-fill) performance numbers as well.
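The preparation routine above can be written down as a small plan, which makes it easy to see how much data hits the drive before any measurement starts. This is only a sketch: the `Step` type, the function name, and the 128K block size for the sequential fill are assumptions for illustration (the text gives only the 4K size of the random fill), and the 700GB capacity in the usage line is a placeholder.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    name: str
    pattern: str         # "sequential" or "random"
    block_size: int      # bytes per write
    bytes_to_write: int  # total bytes written in this step


def precondition_plan(capacity_bytes: int, seq_block: int = 128 * 1024) -> List[Step]:
    """Write steps performed after the secure erase, in order.

    seq_block is an assumed value; the 4K random block size and the
    two-pass counts come from the procedure described in the text.
    """
    return [
        Step("sequential fill (2 passes)", "sequential", seq_block, 2 * capacity_bytes),
        Step("4K random fill (2x capacity)", "random", 4096, 2 * capacity_bytes),
    ]


if __name__ == "__main__":
    capacity = 700 * 10**9  # placeholder capacity, not a measured figure
    for step in precondition_plan(capacity):
        print(f"{step.name}: {step.bytes_to_write / 1e12:.1f} TB at {step.block_size} B")
```

In total, the drive absorbs four times its capacity in writes before the first timed run.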
We test maximum throughput with sequential 128K reads and writes. We take a baseline number on the fresh drive first, then measure again after the 2x random fill. Notice that maximum reads are capped at 3.3GB/s.
The P320h is “limited” to around 3.3GB/s. Why? Because the PCIe Gen 2 interface is good for approximately 3.3GB/s of throughput. Since 785,000 4K IOPS works out to roughly 3.2GB/s, right up against that ceiling, the interface itself limits the P320h to about 785,000 4K random read IOPS. Without the Gen 2 limitation, the P320h could probably better that. The fact that the P320h can saturate its interface with both sequential transfers and small random I/O is truly excellent, and certainly not something seen every day.
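The link between 4K IOPS and bandwidth is simple arithmetic, sketched below. The helper names are made up for illustration, and the sketch assumes 4K means 4,096 bytes and GB means 10^9 bytes; it ignores protocol overhead entirely.

```python
def iops_to_gb_per_s(iops: float, block_bytes: int = 4096) -> float:
    """Payload bandwidth in GB/s (10^9 bytes) for a given IOPS rate."""
    return iops * block_bytes / 1e9


def gb_per_s_to_iops(gb_per_s: float, block_bytes: int = 4096) -> float:
    """IOPS needed to move gb_per_s of payload at the given block size."""
    return gb_per_s * 1e9 / block_bytes


if __name__ == "__main__":
    print(f"785,000 x 4K = {iops_to_gb_per_s(785_000):.2f} GB/s")
    print(f"3.3 GB/s of 4K blocks = {gb_per_s_to_iops(3.3):,.0f} IOPS")
```

Under those assumptions, 785,000 IOPS comes to about 3.2GB/s of payload, within a few percent of the 3.3GB/s ceiling.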
The P320h scales exceedingly well. It reaches 785K IOPS at a QD of 256. Below that, performance slopes downward along with latency, while at a QD of 512 performance is still nearly at peak. Beyond 256 outstanding commands, though, latency starts getting out of hand. For very high QD workloads, the high queue depth interrupt coalescing setting is probably more appropriate.
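Why latency balloons past the peak follows from Little's law: with a fixed number of commands in flight, average latency equals queue depth divided by IOPS. The sketch below applies it to the figures in the text. The function name is illustrative, and holding IOPS at 785K for QD 512 is an assumption standing in for "nearly at peak", not a measured result.

```python
def avg_latency_us(queue_depth: int, iops: float) -> float:
    """Average per-command latency in microseconds via Little's law."""
    return queue_depth / iops * 1e6


if __name__ == "__main__":
    peak_iops = 785_000
    for qd in (256, 512):
        # Assumes IOPS stays flat at the peak figure for both depths.
        print(f"QD {qd}: ~{avg_latency_us(qd, peak_iops):.0f} us average latency")
```

Once IOPS stops rising, doubling the queue depth simply doubles the average wait: roughly 326 microseconds at QD 256 becomes roughly 652 at QD 512 under this assumption.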
The P320h serves up some excellent write latency. Writes also hit their maximum performance with 256 outstanding commands. The P320h’s driver is optimized to function best around QD 256, whether in Windows or Linux. If you can’t get to 256 with writes, you can still get pretty close to peak performance.
We can look at a matrix of average latency at QD 1 to get a standardized latency measurement.
We run latency measurements at 512 bytes, 4K, and 8K block sizes. For each block size, we run one minute of logging for 100% read, 65% read, and 100% write. The results fit in neatly with the previous 4K read and write charts.
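The latency matrix described above is the cross product of three block sizes and three read/write mixes, nine one-minute runs in all. A minimal way to enumerate it is shown below; the constant and key names are illustrative, and treating 4K and 8K as 4,096 and 8,192 bytes is an assumption.

```python
from itertools import product

BLOCK_SIZES = [512, 4096, 8192]   # bytes: 512B, 4K, 8K
READ_PERCENTAGES = [100, 65, 0]   # 100% read, 65% read, 100% write
QUEUE_DEPTH = 1
RUN_SECONDS = 60


def latency_matrix():
    """Return one dict per cell of the QD 1 latency matrix."""
    return [
        {
            "block_size": block_size,
            "read_pct": read_pct,
            "queue_depth": QUEUE_DEPTH,
            "seconds": RUN_SECONDS,
        }
        for block_size, read_pct in product(BLOCK_SIZES, READ_PERCENTAGES)
    ]


if __name__ == "__main__":
    for cell in latency_matrix():
        print(cell)
```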
Instead of average latency, this chart shows maximum latency. Max latency is at or above one millisecond (1,000 microseconds) for every QD 1 block size and read/write mix.