I am looking for information that can help with estimating interrupt latencies on x86 CPUs. A very useful paper can be found at "datasheets.chipdb.org/Intel/x86/386/technote/2153.pdf". But this paper raised a very important question for me: how can the delay caused by waiting for the currently executing instruction to complete be determined? I mean the delay between recognition of the INTR signal and execution of the INTR microcode. As I remember, the Intel Software Developer's Manual also says something about waiting for the currently executing instruction to complete, but it also says that some instructions can be interrupted in progress. The main question is: how can the maximum instruction-completion wait be determined for a particular processor? An estimate in core ticks and memory access operations is needed, not in seconds or microseconds. Cache and TLB misses, and other such things that can affect the wait, should be considered.
This estimate is needed to investigate the possibility of implementing small critical sections that do not affect the interrupt latency. To achieve this, the length of the critical section must be less than or equal to the length of the longest uninterruptible instruction of the CPU.
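For example, I have in mind something like the following tiny critical section (a sketch only: x86-64, ring 0, GCC inline assembly; shared_counter is just a placeholder for the protected data):

volatile unsigned long shared_counter;   /* placeholder for the protected data */

static inline void tiny_critical_section(void)
{
    unsigned long flags;

    /* Save RFLAGS and disable maskable interrupts. */
    __asm__ __volatile__("pushfq; popq %0; cli" : "=r"(flags) : : "memory");

    /* The protected region: only a few short instructions. */
    shared_counter++;

    /* Restore the previous interrupt state (sets IF again only if it was set before). */
    __asm__ __volatile__("pushq %0; popfq" : : "r"(flags) : "memory", "cc");
}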
Any kind of help is very welcome. If you know of any papers that could be helpful, please share links to them.
If Agner Fog's optimization manuals (supplemented with the Intel developer manuals) don't have anything, it's unlikely anyone or anything else will (save for some internal Intel/AMD data): http://www.agner.org/optimize/
In general, there is no guaranteed upper bound on interrupt latency. Consider the following example:
A maskable interrupt arrives while the IF flag is clear, and the processor then executes a hlt instruction without first executing an sti instruction, which sets the IF flag. In this case, the processor will not handle the interrupt until a nonmaskable interrupt occurs to wake up the processor and the IF flag is set to enable handling maskable interrupts.
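As an illustrative sketch (not taken from any cited document), the two sequences below contrast the problematic case with the conventional idle pattern, in GCC inline assembly for a kernel/firmware context. In the second one, sti's one-instruction interrupt shadow defers delivery until hlt has started, so a pending or later maskable interrupt wakes the core.

/* Unbounded latency: maskable interrupts stay blocked while the core sleeps. */
static inline void halt_with_interrupts_masked(void)
{
    __asm__ __volatile__("cli; hlt" : : : "memory");
}

/* Conventional idle: enable interrupts, then halt; the sti interrupt shadow
 * guarantees hlt is reached before any maskable interrupt is delivered. */
static inline void idle_until_interrupt(void)
{
    __asm__ __volatile__("sti; hlt" : : : "memory");
}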
The interrupt latency for any interrupt (including nonmaskable interrupts) can be on the order of hundreds of microseconds if all the processors that are supposed to handle the interrupt are in a very deep sleep state. On my Haswell processor, the wakeup latency of the C7 state is 133 microseconds. If this is an issue for you, you can use the Linux kernel parameter intel_idle.max_cstate (in case the intel_idle driver is used, which is the default on Intel processors) or processor.max_cstate (for the acpi_idle driver) to limit the deepest C-state. You can tell the kernel never to put any core to sleep using idle=poll, which may minimize the interrupt latency on an idle core, assuming of course that the frequency is not reduced due to thermal throttling. Using a polling loop also reduces the maximum turbo frequency of all cores, which may reduce the overall performance of the system.
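If you want to see which wakeup latencies your kernel assumes for each C-state, the advertised exit latencies are exposed through the Linux cpuidle sysfs interface. The following small C program is a rough sketch that assumes the standard /sys/devices/system/cpu/cpu0/cpuidle/stateN/{name,latency} layout; the latency values it prints are in microseconds.

#include <stdio.h>
#include <string.h>

int main(void)
{
    char path[128], name[64], latency[64];

    for (int state = 0; ; state++) {
        /* Read the human-readable name of this C-state. */
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/name", state);
        FILE *f = fopen(path, "r");
        if (!f)
            break;                          /* no more C-states */
        if (!fgets(name, sizeof(name), f))
            name[0] = '\0';
        fclose(f);
        name[strcspn(name, "\n")] = '\0';

        /* Read the advertised exit latency (in microseconds). */
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/latency", state);
        f = fopen(path, "r");
        if (!f)
            break;
        if (!fgets(latency, sizeof(latency), f))
            latency[0] = '\0';
        fclose(f);
        latency[strcspn(latency, "\n")] = '\0';

        printf("state%d: %-10s exit latency %s us\n", state, name, latency);
    }
    return 0;
}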
On an active core (in state C0), a hardware interrupt is only accepted when the core is in an interruptible state. Such a state occurs at instruction boundaries, except for string instructions, which are interruptible mid-execution. Intel does not provide an upper bound on the number of instructions that are retired before a pending interrupt is accepted. A reasonable implementation may stop issuing uops into the ROB (at an instruction boundary) and wait until all uops in the ROB retire before beginning execution of the microcode routine that invokes the interrupt handler. In such an implementation, the interrupt latency depends on the time it takes to retire all of the pending uops. High-latency instructions such as loads, complex floating-point arithmetic, and locked instructions can easily push the interrupt latency to hundreds of nanoseconds. However, if one of the pending uops requires a microcode assist for any reason (or for some specific reasons), the processor may choose to flush that instruction and all later instructions instead of invoking the assist. This implementation improves performance and power consumption at the cost of increased interrupt latency.
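As a purely illustrative sketch (generic examples, not taken from any Intel documentation), the snippets below show the kinds of operations whose pending uops can take a long time to retire and would therefore stretch the latency in such a drain-the-ROB implementation; cold is a hypothetical pointer to a cache line that is unlikely to be cached.

#include <stdint.h>

/* Dependent floating-point divisions: each one must wait for the previous
 * result, so the chain retires slowly. */
double divide_chain(double x)
{
    for (int i = 0; i < 8; i++)
        x = 1.0 / (x + 1.0);
    return x;
}

/* A lock-prefixed read-modify-write; if *cold misses in every cache level,
 * retiring this single instruction spans a full memory round trip. */
void locked_rmw(volatile uint64_t *cold)
{
    __atomic_fetch_add(cold, 1, __ATOMIC_SEQ_CST);
}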
In another implementation, tuned for minimizing interrupt latency, all in-flight instructions are immediately flushed without retiring anything. However, all of these flushed instructions went through the pipeline (and some of them may have already completed), yet they need to be fetched and go through the pipeline again when the interrupt handler returns. This results in reduced performance and increased power consumption.
Hardware interrupts drain the store buffer and the write-combining buffers on Intel and AMD x86 processors. See: Interrupting an assembly instruction while it is operating.
A paper from Intel titled Reducing Interrupt Latency Through the Use of Message Signaled Interrupts discusses a methodology for measuring the latency of an interrupt from a PCIe device. The paper uses the term "interrupt latency" to mean the same thing as "interrupt response time" in the paper you mentioned. You need to somehow take a timestamp at the moment the interrupt reaches the processor and another timestamp at the very beginning of the interrupt handler; an approximation of the interrupt latency is then the difference between the two. The problem, of course, is getting the first timestamp (and getting it in a way that is comparable to the second timestamp). The Intel paper proposes using a PCIe analyzer, which consists of a PCIe device and an application that records, with timestamps, all PCIe traffic between the device and the CPU. They use a device driver that writes from the interrupt handler to an MMIO location mapped to the device in order to create the second timestamp.
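A minimal sketch of the handler-side half of that idea is shown below. It assumes the PCIe analyzer or device supplies a TSC-comparable timestamp (msi_sent_tsc) and a calibrated TSC frequency (tsc_hz), both of which are hypothetical names introduced here only to illustrate the subtraction.

#include <stdint.h>
#include <x86intrin.h>                 /* __rdtsc() */

extern uint64_t msi_sent_tsc;          /* timestamp 1: when the MSI reached the CPU (from the analyzer) */
extern double   tsc_hz;                /* calibrated TSC frequency */

static uint64_t handler_entry_tsc;     /* timestamp 2: taken at the very start of the handler */

/* Would be the first statement executed by the real interrupt handler. */
void on_interrupt_entry(void)
{
    handler_entry_tsc = __rdtsc();
}

/* Approximate interrupt latency in nanoseconds, computed after the fact. */
double interrupt_latency_ns(void)
{
    return (double)(handler_entry_tsc - msi_sent_tsc) / tsc_hz * 1e9;
}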