 

How many instructions does the Linux kernel need in order to handle an interrupt on an ARM Cortex-A9?

I would like to estimate the number of opcodes it takes a single-core ARM Cortex-A9 to handle an IRQ.

Assuming I work with Linux kernel 3.4, how many opcodes does it take to enter the IRQ and execute the irq_handler?

asked Dec 11 '22 by 0x90


2 Answers

Your question is related to how to calculate the interrupt latency of Linux. You might also be interested in how long it takes before your interrupt handler even starts, but we will ignore that aspect of IRQs here.

A simple way is to toggle a GPIO and use a scope to measure the interrupt. You may even toggle the GPIO multiple times to see how long the different phases take. This Windows CE link shows an example of measuring latency. Some interrupt controllers (such as the i.MX) have I/O multiplexing modes where an interrupt number will raise/lower a particular I/O line. Alternatively, you can add code to toggle the line (see below for a sketch of such a routine).
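A minimal sketch of such a toggle routine, assuming a free pin (DEBUG_GPIO) wired to the scope and a placeholder interrupt number (MY_IRQ); both numbers are hypothetical and the handler body is left empty, but the gpiolib/request_irq calls are the standard kernel 3.4-era APIs:

```c
/* Sketch only: bracket an ISR with GPIO toggles so a scope can measure
 * both the IRQ-to-handler latency and the handler duration.
 * DEBUG_GPIO and MY_IRQ are placeholders for whatever pin/IRQ your
 * board actually provides. */
#include <linux/module.h>
#include <linux/gpio.h>
#include <linux/interrupt.h>

#define DEBUG_GPIO  42      /* placeholder: a free pin wired to the scope */
#define MY_IRQ      64      /* placeholder: the interrupt being measured  */

static irqreturn_t probe_isr(int irq, void *dev_id)
{
	gpio_set_value(DEBUG_GPIO, 1);	/* rising edge: handler entered */

	/* ... the real interrupt work would go here ... */

	gpio_set_value(DEBUG_GPIO, 0);	/* falling edge: handler done   */
	return IRQ_HANDLED;
}

static int __init probe_init(void)
{
	int ret = gpio_request_one(DEBUG_GPIO, GPIOF_OUT_INIT_LOW, "irq-probe");

	if (ret)
		return ret;
	ret = request_irq(MY_IRQ, probe_isr, 0, "irq-probe", NULL);
	if (ret)
		gpio_free(DEBUG_GPIO);
	return ret;
}

static void __exit probe_exit(void)
{
	free_irq(MY_IRQ, NULL);
	gpio_free(DEBUG_GPIO);
}

module_init(probe_init);
module_exit(probe_exit);
MODULE_LICENSE("GPL");
```

The width of the pulse on the scope shows how long the handler body runs; the delay between the device asserting its interrupt line and the rising edge is the entry latency the question asks about.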

The source for the primary interrupt handling is in entry-armv.S. There are macros defined for the interrupt controller you use, and these depend on the .config file; for instance, there are options for pre-emptive interrupts, multiple interrupt controllers, SMP, etc. The primary vectors are defined at the bottom of entry-armv.S. The general gist is that the current operating mode is inspected and then either __irq_usr or __irq_svc is taken. These routines have a different preamble to save state, but they both end up invoking the irq_handler macro. __irq_usr has some code related to cmpxchg, but if you specify an ARM Cortex in your .config this won't apply. The main difference is the possible context switch after an IRQ that occurs in user mode. Your machine defines mach/entry-macro.S, which contains assembler macros to access the interrupt controller and get an interrupt number. It then jumps to the generic IRQ handling code in the top-level kernel directory.

So the second way would be to inspect the code and calculate it directly. This is probably easier if you look at the source, compile your kernel and then do an objdump --disassemble on the vmlinux image and look for these symbols. You will see the irq_handler macro expanded and it should jump to your IRQ code eventually.

As you can see from the source, there is also TRACE_IRQFLAGS. You can check to see if this is available on the Cortex A9 you are using with make menuconfig (and type /TRACE_IRQFLAGS). I don't know if it is available or not.

There are variations, such as:

  1. Interrupt from User/SVC mode.
  2. Other interrupt currently running.
  3. Code that is interrupted (such as stm/ldm) may take some time to complete.
  4. Page faults in your ISR. Some ALSA drivers can fault on unallocated pages in at least some Linux versions.
  5. Conditionals in your ISR.

Measuring on a scope will show the jitter in IRQ servicing. Examining the instructions will generally show only that, in the worst case, the IRQ may never be serviced; for example, if higher-priority interrupts constantly pre-empt it. You probably need to do both to fully optimize for a hard deadline.

Often you don't care how long the whole IRQ takes, but rather the time between the IRQ line being raised and writing/reading some peripheral register. For instance, a FIFO has limited depth, and if the latency between the IRQ occurring and reading the FIFO register is greater than the time it takes the FIFO to fill (its depth divided by the incoming byte rate), then the FIFO will overflow.
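To make that arithmetic concrete, here is a small sketch; the 16-byte depth, 115200 baud and 8N1 framing are illustrative assumptions, not values from the question:

```c
/* Worked example: worst-case IRQ-to-read deadline for a FIFO-fed device.
 * Assumes a hypothetical 16-byte UART FIFO at 115200 baud, 8N1 framing
 * (roughly 10 bits on the wire per data byte); numbers are illustrative. */
#include <stdio.h>

int main(void)
{
	const double baud       = 115200.0;	/* bits per second            */
	const double bits_byte  = 10.0;		/* start + 8 data + stop bits */
	const int    fifo_depth = 16;		/* bytes                      */

	double byte_time_us = bits_byte / baud * 1e6;		/* ~86.8 us */
	double deadline_ms  = fifo_depth * byte_time_us / 1e3;	/* ~1.39 ms */

	printf("time per byte : %.1f us\n", byte_time_us);
	printf("FIFO fills in : %.2f ms -> IRQ-to-read latency must stay below this\n",
	       deadline_ms);
	return 0;
}
```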

The FIQ infrastructure is a lot faster, but the kernel facilities you can use from it are far more limited!

Edit: The Cortex-A9 technical reference manual has instruction cycle counts in Appendix B. Most ARM instructions take a single cycle on most implementations, except memory loads/stores, multiplies and branches. Follow the source-inspection steps above (entry-armv.S and the objdump disassembly) to find the complete instruction path that handles a Linux interrupt for your configuration and just add it up; for an estimate (as the original question asks) you can simply count the instructions, since they are generally a single cycle each.

answered Feb 05 '23 by artless noise


Whilst you can calculate the theoretical minimum number of core cycles by inspection of the source code, the number actually taken is far less certain due to the effects of caching, memory and memory-controller performance, what the other core is doing at the time, and various other factors dependent on the micro-architecture of the ARM processor in question.

I suspect you would be better off measuring the actual interrupt latency performance of your system, either using a digital 'scope or performance counters.

Of course, for hard real-time applications, you need to know the worst case interrupt latency - which includes the worst case of all of these factors.

answered Feb 05 '23 by marko