Some time ago I observed high I/O latency on an Oracle Solaris x86 server hosting Oracle databases. I used DTrace to narrow down the layer of the I/O stack that was causing the problem. It turned out to be the Emulex HBA driver.
First, I analyzed the driver’s source code in illumos to get an idea of the workflow used to process Fibre Channel packets. Then, using the acquired knowledge, I developed a DTrace script to measure the time the Fibre Channel packets spent in each part of the driver’s code.
This blog post is an index page to the previously published articles.
The first blog post describes the methodology for measuring the time between the moment the generic Fibre Channel driver issues an I/O request and the moment it consumes the completed packet. This elapsed time includes the SAN response time and the processing time in the HBA driver.
In the second blog post I drilled down and identified three stages of packet processing within the HBA driver: I/O start, I/O interrupt, and I/O done.
In the third blog post I provided the DTrace script for measuring the time spent in each processing stage. Since the packet is generally consumed by a different kernel thread than the one that submitted the I/O request, I used the memory address of the Fibre Channel packet to identify the packet after the interrupt thread puts it into the completion queue.
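The correlation technique can be sketched in D as follows. Note that the probe names and argument positions below are illustrative placeholders, not the exact emlxs entry points used in the original script; the key idea is using the packet address as the key of an associative array so the completion probe, firing in a different thread, can look up the start timestamp.

```d
/* Sketch only: substitute the real driver entry points for the
 * placeholder function names, and the correct argument index for
 * the fc_packet pointer. */

fbt::hba_pkt_start:entry        /* hypothetical submission probe */
{
    /* Key on the packet's memory address, not on the thread,
     * because completion happens in another kernel thread. */
    start[arg0] = timestamp;
}

fbt::hba_pkt_complete:entry     /* hypothetical completion probe */
/start[arg0]/
{
    @lat["packet latency (ns)"] = quantize(timestamp - start[arg0]);
    start[arg0] = 0;            /* free the array slot */
}
```

Keying on the packet address rather than `self->` thread-local variables is what makes the measurement survive the handoff from the submitting thread to the interrupt and completion threads.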
Part 4 – Comparing Latency on Emulex and QLogic HBA Driver
Finally, I shared my observations on the cause of the latency in the Emulex driver: a delay in consuming the packet in the completion thread. I also compared the performance of the Emulex and QLogic drivers.
Related reading
If you’re a Solaris x86 ZFS user bothered by I/O outliers, you might be interested in the following blog posts, in which I investigated the impact of ZFS ARC maintenance on performance: