Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Accurate timestamping in Python logging

I've been building an error logging app recently and was after a way of accurately timestamping the incoming data. When I say accurately I mean each timestamp should be accurate relative to each other (no need to sync to an atomic clock or anything like that).

I've been using datetime.now() as a first stab, but this isn't perfect:

>>> for i in range(0,1000):
...     datetime.datetime.now()
...
datetime.datetime(2008, 10, 1, 13, 17, 27, 562000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 562000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 562000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 562000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 578000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 578000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 578000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 578000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 578000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 609000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 609000)
datetime.datetime(2008, 10, 1, 13, 17, 27, 609000)
etc.

The changes between clocks for the first second of samples looks like this:

uSecs    difference
562000  
578000  16000
609000  31000
625000  16000
640000  15000
656000  16000
687000  31000
703000  16000
718000  15000
750000  32000
765000  15000
781000  16000
796000  15000
828000  32000
843000  15000
859000  16000
890000  31000
906000  16000
921000  15000
937000  16000
968000  31000
984000  16000

So it looks like the timer data is only updated every ~15-32ms on my machine. The problem comes when we come to analyse the data because sorting by something other than the timestamp and then sorting by timestamp again can leave the data in the wrong order (chronologically). It would be nice to have the time stamps accurate to the point that any call to the time stamp generator gives a unique timestamp.

I had been considering some methods involving using a time.clock() call added to a starting datetime, but would appreciate a solution that would work accurately across threads on the same machine. Any suggestions would be very gratefully received.

like image 764
Jon Cage Avatar asked Oct 01 '08 12:10

Jon Cage


People also ask

What is timestamp value in Python?

What is timestamp in Python? Timestamp is the date and time of occurrence of an event. In Python we can get the timestamp of an event to an accuracy of milliseconds. The timestamp format in Python returns the time elapsed from the epoch time which is set to 00:00:00 UTC for 1 January 1970.

How do you timestamp A string in Python?

The strftime() method returns a string representing date and time using date, time or datetime object.


2 Answers

time.clock() only measures wallclock time on Windows. On other systems, time.clock() actually measures CPU-time. On those systems time.time() is more suitable for wallclock time, and it has as high a resolution as Python can manage -- which is as high as the OS can manage; usually using gettimeofday(3) (microsecond resolution) or ftime(3) (millisecond resolution.) Other OS restrictions actually make the real resolution a lot higher than that. datetime.datetime.now() uses time.time(), so time.time() directly won't be better.

For the record, if I use datetime.datetime.now() in a loop, I see about a 1/10000 second resolution. From looking at your data, you have much, much coarser resolution than that. I'm not sure if there's anything Python as such can do, although you may be able to convince the OS to do better through other means.

I seem to recall that on Windows, time.clock() is actually (slightly) more accurate than time.time(), but it measures wallclock since the first call to time.clock(), so you have to remember to 'initialize' it first.

like image 55
Thomas Wouters Avatar answered Oct 24 '22 01:10

Thomas Wouters


You're unlikely to get sufficiently fine-grained control that you can completely eliminate the possibility of duplicate timestamps - you'd need resolution smaller than the time it takes to generate a datetime object. There are a couple of other approaches you might take to deal with it:

  1. Deal with it. Leave your timestamps non-unique as they are, but rely on python's sort being stable to deal with reordering problems. Sorting on timestamp first, then something else will retain the timestamp ordering - you just have to be careful to always start from the timestamp ordered list every time, rather than doing multiple sorts on the same list.

  2. Append your own value to enforce uniqueness. Eg. include an incrementing integer value as part of the key, or append such a value only if timestamps are different. Eg.

The following will guarantee unique timestamp values:

    class TimeStamper(object):
        def __init__(self):
            self.lock = threading.Lock()
            self.prev = None
            self.count = 0

         def getTimestamp(self):
             with self.lock:
                 ts = str(datetime.now())
                 if ts == self.prev:
                     ts +='.%04d' % self.count
                     self.count += 1
                 else:
                     self.prev = ts
                     self.count = 1
             return ts

For multiple processes (rather than threads), it gets a bit trickier though.

like image 26
Brian Avatar answered Oct 24 '22 03:10

Brian