I need to do precision timing to the 1 us level to time a change in duty cycle of a pwm wave.
Background
I am using a Gumstix Over Water COM (https://www.gumstix.com/store/app.php/products/265/) that has a single core ARM Cortex-A8 processor running at 499.92 BogoMIPS (the Gumstix page claims up to 1Ghz with 800Mhz recommended) according to /proc/cpuinfo. The OS is an Angstrom Image version of Linux based of kernel version 2.6.34 and it is stock on the Gumstix Water COM.
The Problem
I have done a fair amount of reading about precise timing in Linux (and have tried most of it) and the consensus seems to be that using clock_gettime() and referencing CLOCK_MONOTONIC is the best way to do it. (I would have liked to use the RDTSC register for timing since I have one core with minimal power saving abilities but this is not an Intel processor.) So here is the odd part, while clock_getres() returns 1, suggesting resolution at 1 ns, actual timing tests suggest a minimum resolution of 30517ns or (it can't be coincidence) exactly the time between a 32.768KHz clock ticks. Here's what I mean:
// Stackoverflow example
#include <stdio.h>
#include <time.h>
#define SEC2NANOSEC 1000000000
int main( int argc, const char* argv[] )
{
// //////////////// Min resolution test //////////////////////
struct timespec resStart, resEnd, ts;
ts.tv_sec = 0; // s
ts.tv_nsec = 1; // ns
int iters = 100;
double resTime,sum = 0;
int i;
for (i = 0; i<iters; i++)
{
clock_gettime(CLOCK_MONOTONIC, &resStart); // start timer
// clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, &ts);
clock_gettime(CLOCK_MONOTONIC, &resEnd); // end timer
resTime = ((double)resEnd.tv_sec*SEC2NANOSEC + (double)resEnd.tv_nsec
- ((double)resStart.tv_sec*SEC2NANOSEC + (double)resStart.tv_nsec);
sum = sum + resTime;
printf("resTime = %f\n",resTime);
}
printf("Average = %f\n",sum/(double)iters);
}
(Don't fret over the double casting, tv_sec in a time_t and tv_nsec is a long.)
Compile with:
gcc soExample.c -o runSOExample -lrt
Run with:
./runSOExample
With the nanosleep commented out as shown, the result is either 0ns or 30517ns with the majority being 0ns. This leads me to believe that CLOCK_MONOTONIC is updated at 32.768kHz and most of the time the clock has not been updated before the second clock_gettime() call is made and in cases where the result is 30517ns the clock has been updated between calls.
When I do the same thing on my development computer (AMD FX(tm)-6100 Six-Core Processor running at 1.4 GHz) the minimum delay is a more constant 149-151ns with no zeros.
So, let's compare those results to the CPU speeds. For the Gumstix, that 30517ns (32.768kHz) equates to 15298 cycles of the 499.93MHz cpu. For my dev computer that 150ns equates to 210 cycles of the 1.4Ghz CPU.
With the clock_nanosleep() call uncommented the average results are these: Gumstix: Avg value = 213623 and the result varies, up and down, by multiples of that min resolution of 30517ns Dev computer: 57710-68065 ns with no clear trend. In the case of the dev computer I expect the resolution to actually be at the 1 ns level and the measured ~150ns truly is the time elapsed between the two clock_gettime() calls.
So, my question's are these: What determines that minimum resolution? Why is the resolution of the dev computer 30000X better than the Gumstix when the processor is only running ~2.6X faster? Is there a way to change how often CLOCK_MONOTONIC is updated and where? In the kernel?
Thanks! If you need more info or clarification just ask.
As I understand, the difference between two environments(Gumstix and your Dev-computer) might be the underlying timer h/w they are using.
Commented nanosleep() case:
You are using clock_gettime() twice. To give you a rough idea of what this clock_gettime() will ultimately get mapped to(in kernel):
clock_gettime -->clock_get() -->posix_ktime_get_ts -->ktime_get_ts() -->timekeeping_get_ns() -->clock->read()
clock->read() basically reads the value of the counter provided by underlying timer driver and corresponding h/w. A simple difference with stored value of the counter in the past and current counter value and then nanoseconds conversion mathematics will yield you the nanoseconds elapsed and will update the time-keeping data structures in kernel.
For example , if you have a HPET timer which gives you a 10 MHz clock, the h/w counter will get updated at 100 ns time interval.
Lets say, on first clock->read(), you get a counter value of X.
Linux Time-keeping data structures will read this value of X, get the difference 'D'compared to some old stored counter value.Do some counter-difference 'D' to nanoseconds 'n' conversion mathematics, update the data-structure by 'n' Yield this new time value to the user space.
When second clock->read() is issued, it will again read the counter and update the time. Now, for a HPET timer, this counter is getting updated every 100ns and hence , you will see this difference being reported to the user-space.
Now, Let's replace this HPET timer with a slow 32.768 KHz clock. Now , clock->read()'s counter will updated only after 30517 ns seconds, so, if you second call to clock_gettime() is before this period, you will get 0(which is majority of the cases) and in some cases, your second function call will be placed after counter has incremented by 1, i.e 30517 ns has elapsed. Hence , the value of 30517 ns sometimes.
Uncommented Nanosleep() case: Let's trace the clock_nanosleep() for monotonic clocks:
clock_nanosleep() -->nsleep --> common_nsleep() -->hrtimer_nanosleep() -->do_nanosleep()
do_nanosleep() will simply put the current task in INTERRUPTIBLE state, will wait for the timer to expire(which is 1 ns) and then set the current task in RUNNING state again. You see, there are lot of factors involved now, mainly when your kernel thread (and hence the user space process) will be scheduled again. Depending on your OS, you will always face some latency when your doing a context-switch and this is what we observe with the average values.
Now Your questions:
What determines that minimum resolution?
I think the resolution/precision of your system will depend on the underlying timer hardware being used(assuming your OS is able to provide that precision to the user space process).
*Why is the resolution of the dev computer 30000X better than the Gumstix when the processor is only running ~2.6X faster?*
Sorry, I missed you here. How it is 30000x faster? To me , it looks like something 200x faster(30714 ns/ 150 ns ~ 200X ? ) .But anyway, as I understand, CPU speed may or may not have to do with the timer resolution/precision. So, this assumption may be right in some architectures(when you are using TSC H/W), though, might fail in others(using HPET, PIT etc).
Is there a way to change how often CLOCK_MONOTONIC is updated and where? In the kernel?
you can always look into the kernel code for details(that's how i looked into it). In linux kernel code , look for these source files and Documentation:
I do not have gumstix on hand, but it looks like your clocksource is slow. run:
$ dmesg | grep clocksource
If you get back
[ 0.560455] Switching to clocksource 32k_counter
This might explain why your clock is so slow.
In the recent kernels there is a directory /sys/devices/system/clocksource/clocksource0
with two files: available_clocksource
and current_clocksource
. If you have this directory, try switching to a different source by echo'ing its name into second file.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With