I would like to measure the number of instructions per cycle (IPC) executed on an ARM Cortex-M4 (or Cortex-M3) processor.
What is needed is the number of instructions executed at runtime by the code I want to profile, and the number of cycles that code takes to execute.
1 - Number of Cycles
Using the cycle counter is quite easy and straightforward.
volatile unsigned int *DWT_CYCCNT ;
volatile unsigned int *DWT_CONTROL ;
volatile unsigned int *SCB_DEMCR ;
void reset_timer(){
DWT_CYCCNT = (volatile unsigned int *)0xE0001004; //address of the register
DWT_CONTROL = (volatile unsigned int *)0xE0001000; //address of the register
SCB_DEMCR = (volatile unsigned int *)0xE000EDFC; //address of the register
*SCB_DEMCR = *SCB_DEMCR | 0x01000000;
*DWT_CYCCNT = 0; // reset the counter
*DWT_CONTROL = 0;
}
void start_timer(){
*DWT_CONTROL = *DWT_CONTROL | 1 ; // enable the counter
}
void stop_timer(){
*DWT_CONTROL = *DWT_CONTROL | 0 ; // disable the counter
}
unsigned int getCycles(){
return *DWT_CYCCNT;
}
main(){
....
reset_timer(); //reset timer
start_timer(); //start timer
//Code to profile
...
myFunction();
...
stop_timer(); //stop timer
numCycles = getCycles(); //read number of cycles
...
}
2 - Number of Instructions
I found some documentation online on how to count the number of instructions executed by the ARM Cortex-M3 and Cortex-M4 (link):
# instructions = CYCCNT - CPICNT - EXCCNT - SLEEPCNT - LSUCNT + FOLDCNT
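As a minimal sketch, the formula above can be written as a small C helper. The type and function names here are hypothetical, and the sketch assumes the counter values have already been read and, for the 8-bit counters, accumulated into 32-bit totals by some other means:

```c
#include <stdint.h>

/* Hypothetical container for counter totals gathered over the profiled
 * region. The 8-bit hardware counters are assumed to have been accumulated
 * into 32-bit totals before this point. */
typedef struct {
    uint32_t cyccnt;   /* total cycles */
    uint32_t cpicnt;   /* extra cycles of multi-cycle instructions */
    uint32_t exccnt;   /* cycles spent on exception overhead */
    uint32_t sleepcnt; /* cycles spent sleeping */
    uint32_t lsucnt;   /* extra cycles spent in load/store operations */
    uint32_t foldcnt;  /* folded (zero-cycle) instructions */
} dwt_totals_t;

/* Applies: instructions = CYCCNT - CPICNT - EXCCNT - SLEEPCNT - LSUCNT + FOLDCNT */
static uint32_t dwt_instruction_count(const dwt_totals_t *t)
{
    return t->cyccnt - t->cpicnt - t->exccnt - t->sleepcnt - t->lsucnt
           + t->foldcnt;
}

/* Instructions per cycle; returns 0.0 when no cycles were counted. */
static double dwt_ipc(const dwt_totals_t *t)
{
    if (t->cyccnt == 0u) {
        return 0.0;
    }
    return (double)dwt_instruction_count(t) / (double)t->cyccnt;
}
```

For example, with 1000 cycles, CPICNT = 100, EXCCNT = 20, SLEEPCNT = 0, LSUCNT = 80 and FOLDCNT = 10, the helper gives 810 instructions, i.e. an IPC of 0.81.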
The registers mentioned there are documented here (from page 11-13), and these are the memory addresses used to access them:
DWT_CYCCNT = 0xE0001004
DWT_CONTROL = 0xE0001000
SCB_DEMCR = 0xE000EDFC
DWT_CPICNT = 0xE0001008
DWT_EXCCNT = 0xE000100C
DWT_SLEEPCNT = 0xE0001010
DWT_LSUCNT = 0xE0001014
DWT_FOLDCNT = 0xE0001018
The DWT_CONTROL register is used to enable the counters, in particular the cycle counter, as documented here.
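A rough sketch of how the enable bits in DWT_CONTROL might be set and cleared is shown below. The bit positions follow the DWT_CTRL layout in the ARMv7-M Architecture Reference Manual; the macro and function names are made up for illustration. My understanding is that each profiling counter starts counting when its event-enable bit is set, but this should be checked against the reference manual for your core. The helpers take the register address as a parameter, so on the target you would pass (volatile uint32_t *)0xE0001000 after setting TRCENA in DEMCR:

```c
#include <stdint.h>

/* DWT_CTRL bit positions (ARMv7-M Architecture Reference Manual). */
#define DWT_CTRL_CYCCNTENA   (1u << 0)   /* cycle counter enable */
#define DWT_CTRL_CPIEVTENA   (1u << 17)  /* CPI counter overflow event */
#define DWT_CTRL_EXCEVTENA   (1u << 18)  /* exception overhead counter event */
#define DWT_CTRL_SLEEPEVTENA (1u << 19)  /* sleep counter event */
#define DWT_CTRL_LSUEVTENA   (1u << 20)  /* load/store counter event */
#define DWT_CTRL_FOLDEVTENA  (1u << 21)  /* folded-instruction counter event */

/* Hypothetical mask covering every counter used in the formula. */
#define DWT_CTRL_PROFILE_MASK                                   \
    (DWT_CTRL_CYCCNTENA | DWT_CTRL_CPIEVTENA | DWT_CTRL_EXCEVTENA | \
     DWT_CTRL_SLEEPEVTENA | DWT_CTRL_LSUEVTENA | DWT_CTRL_FOLDEVTENA)

/* Set the enable bits with OR, leaving all other bits untouched. */
static void dwt_enable_profiling(volatile uint32_t *ctrl)
{
    *ctrl |= DWT_CTRL_PROFILE_MASK;
}

/* Clear the enable bits with AND of the inverted mask. */
static void dwt_disable_profiling(volatile uint32_t *ctrl)
{
    *ctrl &= ~DWT_CTRL_PROFILE_MASK;
}
```

Taking the register pointer as an argument, rather than hard-coding the address, also lets the helpers be exercised on a host machine against an ordinary variable.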
But when I tried to put it all together to count the number of instructions executed per cycle, I didn't succeed.
Here is a small guide on how to use them from gdb.
The difficulty is that some of the registers are only 8 bits wide (DWT_CPICNT, DWT_EXCCNT, DWT_SLEEPCNT, DWT_LSUCNT, DWT_FOLDCNT), and when they overflow they trigger an event. I didn't find a way to collect that event: I found no code snippets that explain how to do it, and no interrupt routines suitable for it.
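One possible workaround, which polls the counters instead of capturing the overflow event, is to read each 8-bit counter periodically and accumulate the wrapped difference in software. The sketch below is only an illustration with hypothetical names. It is correct only if fewer than 256 events occur between two polls, and the polling code itself consumes cycles and instructions, so it perturbs the measurement:

```c
#include <stdint.h>

/* Hypothetical software extension of one 8-bit DWT counter. */
typedef struct {
    uint8_t  last;   /* raw 8-bit value seen at the previous poll */
    uint32_t total;  /* events accumulated so far */
} counter8_acc_t;

/* Feed in the raw register value (only the low 8 bits are meaningful).
 * The uint8_t cast makes the subtraction wrap modulo 256, so a counter
 * that rolled over from 250 to 4 yields a delta of 10. */
static void counter8_update(counter8_acc_t *acc, uint32_t raw_reg)
{
    uint8_t now = (uint8_t)(raw_reg & 0xFFu);
    acc->total += (uint8_t)(now - acc->last);
    acc->last = now;
}
```

On the target, raw_reg would be read from the counter's address (for example 0xE0001014 for DWT_LSUCNT), and the poll would have to run often enough that the counter cannot wrap twice between calls.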
Moreover, it seems that setting watchpoints from gdb on the addresses of those registers doesn't work: gdb does not stop when the registers change value. E.g. on DWT_LSUCNT:
(gdb) watch *0xE0001014
Update: I found this project on GitHub explaining how to use the DWT, ITM and ETM units. I haven't checked whether it works yet; I will post updates.
Any ideas on how to use them?
Thank you!
The code sample you provided has a problem in how it clears the enable bit. You should clear the bit using 'AND', not 'OR':
*DWT_CONTROL = *DWT_CONTROL & 0xFFFFFFFE ; // disable the counter by clearing the enable bit
If you want an accurate cycle count, I think using a debugger is a good choice. Keil MDK can accumulate the state register so that it does not overflow, and the result in the debugger is the same as the result obtained using the DWT.
If you want to measure the other values, e.g. FOLDCNT, enable trace in Keil MDK: Debug -> Setting -> Trace -> Trace Enable.
With that enabled, while debugging, choose the trace events in the Trace window; Keil can then collect the values of those 8-bit registers and add them together.
It may seem a little crude, but I don't know how to collect the overflow event. I think this event can only be sent to the ITM, because both the DWT and the ITM are separate components outside the program. If we want to collect the event in the user program, the act of collecting it will necessarily affect the accuracy of the result.