 

Extract historic leap seconds from tzdata


Is there a way to extract the moments of historic leap seconds from the time zone database that is distributed with most Linux distributions? I am looking for a solution in Python, but anything that works on the command line would be fine too.

My use case is to convert between GPS time (essentially the number of seconds elapsed since the GPS epoch, 1980-01-06 00:00:00 UTC) and UTC or local time. UTC is adjusted with leap seconds every now and then, while GPS time increases linearly. This is equivalent to converting between UTC and TAI: TAI also ignores leap seconds, so TAI and GPS time always differ by the same constant offset (TAI is 19 seconds ahead of GPS time). At work, we use GPS time as the time standard for synchronizing astronomical observations around the world.
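The conversion described above can be sketched as follows. This is a minimal sketch, assuming the leap-second table is supplied as POSIX timestamps of the first second after each leap (the form that print_leap in the EDIT below prints); the class GpsConverter and its method names are hypothetical, and negative leap seconds are not handled.

```python
import bisect
import calendar

# GPS epoch, 1980-01-06 00:00:00 UTC, as a POSIX timestamp (315964800).
GPS_EPOCH_UNIX = calendar.timegm((1980, 1, 6, 0, 0, 0))


class GpsConverter:
    """Convert between GPS seconds and POSIX (UTC-based) timestamps.

    leap_unix: POSIX timestamps of the first second after each positive
    leap second (e.g. 1981-07-01 00:00:00 for the leap second at the end
    of 1981-06-30). Entries before the GPS epoch are ignored; negative
    leap seconds are not handled.
    """

    def __init__(self, leap_unix):
        self.leaps = sorted(t for t in leap_unix if t > GPS_EPOCH_UNIX)
        # GPS time at which each leap takes effect: by then i leap seconds
        # (1-based) have been inserted since the GPS epoch.
        self.leaps_gps = [t - GPS_EPOCH_UNIX + i
                          for i, t in enumerate(self.leaps, start=1)]

    def unix_to_gps(self, unix_ts):
        # Count the leap seconds inserted at or before unix_ts.
        n = bisect.bisect_right(self.leaps, unix_ts)
        return unix_ts - GPS_EPOCH_UNIX + n

    def gps_to_unix(self, gps_s):
        # Count the leap seconds already elapsed at GPS time gps_s.
        n = bisect.bisect_right(self.leaps_gps, gps_s)
        return gps_s + GPS_EPOCH_UNIX - n
```

With the 18 leap seconds inserted between 1981 and 2016 in the table, unix_to_gps(t) - (t - GPS_EPOCH_UNIX) evaluates to 18 for any t after 2017-01-01, which is the GPS minus UTC offset in effect since then. A leap second itself (23:59:60) has no POSIX timestamp of its own, so in this sketch gps_to_unix maps it to the same timestamp as the following midnight.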

I have working functions that convert between GPS time and UTC, but I had to hard-code a table of leap seconds, which I took from the tzdata distribution (the file tzdata2013xx.tar.gz contains a file named leapseconds). I have to update this table by hand every few years when a new leap second is announced. I would prefer to get this information from the standard tzdata, which is updated automatically through system updates several times a year.
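If you do keep a hand-maintained table, the leapseconds file from the tzdata tarball can at least be read programmatically. A sketch, assuming the `Leap YEAR MONTH DAY HH:MM:SS CORR R/S` line format described in zic(8); the function name parse_tzdata_leapseconds is hypothetical, only positive (+) corrections are accepted, and the R/S column is ignored.

```python
import calendar

# Month abbreviations used in Leap lines (hard-coded to avoid locale effects).
MONTHS = {name: i for i, name in enumerate(
    'Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec'.split(), start=1)}


def parse_tzdata_leapseconds(lines):
    """Return POSIX timestamps of the first second after each leap second."""
    result = []
    for line in lines:
        fields = line.split('#', 1)[0].split()  # drop comments
        if not fields or fields[0] != 'Leap':
            continue  # also skips other line types, e.g. an Expires line
        _, year, mon, day, hms, corr, _rolling_or_stationary = fields
        if corr != '+':
            raise ValueError('negative leap second not handled: %r' % line)
        h, m, s = (int(x) for x in hms.split(':'))
        day_start = calendar.timegm((int(year), MONTHS[mon], int(day), 0, 0, 0))
        # 23:59:60 lands on the following midnight in POSIX time.
        result.append(day_start + 3600 * h + 60 * m + s)
    return result
```

The result is a list of POSIX timestamps of the first second after each leap, e.g. 78796800 (1972-07-01 00:00:00) for the first one; lines other than `Leap` lines are skipped.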

I am pretty sure the information is stored in some binary files somewhere in /usr/share/zoneinfo/. I have been able to extract some of it using struct.unpack (man 5 tzfile documents the format), but I never got it working completely. Are there any standard packages that can access this information? I know about pytz, which seems to get the standard DST information from the same database, but it does not give access to leap seconds. I also found tai64n, but its source code just contains a hard-coded table.

EDIT

Inspired by steveha's answer and some code in pytz/tzfile.py, I finally got a working solution (tested on py2.5 and py2.7):

from struct import unpack, calcsize
from datetime import datetime

def print_leap(tzfile = '/usr/share/zoneinfo/right/UTC'):
    with open(tzfile, 'rb') as f:
        # read header
        fmt = '>4s c 15x 6l'
        (magic, format, ttisgmtcnt, ttisstdcnt,leapcnt, timecnt,
            typecnt, charcnt) =  unpack(fmt, f.read(calcsize(fmt)))
        assert magic == 'TZif'.encode('US-ASCII'), 'Not a timezone file'
        print 'Found %i leapseconds:' % leapcnt

        # skip over some uninteresting data
        fmt = '>%(timecnt)dl %(timecnt)dB %(ttinfo)s %(charcnt)ds' % dict(
            timecnt=timecnt, ttinfo='lBB'*typecnt, charcnt=charcnt)
        f.read(calcsize(fmt))

        #read leap-seconds
        fmt = '>2l'
        for i in xrange(leapcnt):
            tleap, nleap = unpack(fmt, f.read(calcsize(fmt)))
            print datetime.utcfromtimestamp(tleap-nleap+1)

with the result

In [2]: print_leap()
Found 25 leapseconds:
1972-07-01 00:00:00
1973-01-01 00:00:00
1974-01-01 00:00:00
...
2006-01-01 00:00:00
2009-01-01 00:00:00
2012-07-01 00:00:00

While this does solve my question, I will probably not go for this solution. Instead, I will include leap-seconds.list with my code, as suggested by Matt Johnson. This seems to be the authoritative list used as a source for tzdata, and is probably updated by NIST twice a year. This means I will have to do the update by hand, but this file is straightforward to parse and includes an expiration date (which tzdata seems to be missing).
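Parsing leap-seconds.list is indeed simple: data lines hold an NTP timestamp (seconds since 1900-01-01) and the TAI minus UTC offset valid from that instant, and the `#@` line holds the expiration date, also in NTP seconds. A minimal sketch under those assumptions; the function names parse_leap_seconds_list and tai_minus_utc are hypothetical.

```python
import bisect

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_TO_UNIX = 2208988800


def parse_leap_seconds_list(lines):
    """Parse leap-seconds.list.

    Returns (table, expires): table is a list of (posix_ts, tai_minus_utc)
    pairs in file order; expires is the POSIX expiration time from the
    '#@' line, or None if that line is missing.
    """
    table, expires = [], None
    for line in lines:
        if line.startswith('#@'):
            expires = int(line[2:].split()[0]) - NTP_TO_UNIX
        elif line.startswith('#') or not line.strip():
            continue  # ordinary comment, '#$', '#h' or blank line
        else:
            ntp, offset = line.split('#', 1)[0].split()[:2]
            table.append((int(ntp) - NTP_TO_UNIX, int(offset)))
    return table, expires


def tai_minus_utc(table, unix_ts):
    """TAI - UTC in seconds at unix_ts (valid from 1972-01-01 onward)."""
    i = bisect.bisect_right([t for t, _ in table], unix_ts)
    if i == 0:
        raise ValueError('timestamp precedes the table')
    return table[i - 1][1]
```

The first data line (1 Jan 1972, offset 10) marks the start of the current UTC system rather than a leap second. For GPS work, GPS minus UTC is tai_minus_utc(table, t) - 19, and comparing expires with time.time() tells you when the bundled copy needs refreshing.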

asked Oct 12 '13 09:10 by Bas Swinckels


1 Answer

I read man 5 tzfile, computed the byte offset of the leap-second records, and then read those records.

You can uncomment the "DEBUG:" print statements to see more of what it finds in the file.

EDIT: the program has been corrected. It now reads /usr/share/zoneinfo/right/UTC and finds leap seconds to print (only the right/ zones are compiled with leap-second records; the ordinary zones have a leap-second count of zero).

The original program wasn't skipping the time zone abbreviation characters, which are documented in the man page but somewhat hidden ("...and tt_abbrind serves as an index into the array of timezone abbreviation characters that follow the ttinfo structure(s) in the file.").
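To see how tt_abbrind relates the ttinfo structs to the abbreviation characters, here is a small sketch that reads only the version-1 header, ttinfo array and abbreviation block; the names HEADER, TTINFO and read_ttinfos are hypothetical. Each abbreviation is NUL-terminated, and tt_abbrind is a byte offset into that character array.

```python
import struct

# Version-1 header: magic, version byte, 15 reserved bytes, six 32-bit counts.
HEADER = struct.Struct('>4s c 15x 6l')
TTINFO = struct.Struct('>lBB')  # tt_utoff, tt_isdst, tt_abbrind


def read_ttinfos(f):
    """Return [(utoff, isdst, abbreviation), ...] from the version-1 block."""
    (magic, _version, _isutcnt, _isstdcnt, _leapcnt,
     timecnt, typecnt, charcnt) = HEADER.unpack(f.read(HEADER.size))
    if magic != b'TZif':
        raise ValueError('not a tzfile: %r' % magic)
    f.seek(timecnt * 5, 1)  # transition times ("l") and type indices ("B")
    raw = [TTINFO.unpack(f.read(TTINFO.size)) for _ in range(typecnt)]
    chars = f.read(charcnt)  # NUL-terminated abbreviations, back to back
    result = []
    for utoff, isdst, abbrind in raw:
        end = chars.index(b'\0', abbrind)  # abbrind is a byte offset into chars
        result.append((utoff, isdst, chars[abbrind:end].decode('ascii')))
    return result
```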

import datetime
import struct

TZFILE_MAGIC = 'TZif'.encode('US-ASCII')

def leap_seconds(f):
    """
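    Return a list of tuples of this format: (timestamp, number_of_seconds)
        timestamp: 32-bit occurrence time of a leap second, counted in the
            file's own time scale (for right/ zones this count includes
            earlier leap seconds, so it is not a plain POSIX timestamp)
        number_of_seconds: total leap-second correction in effect from
            timestamp onward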

    """
    fmt = ">4s c 15x 6l"
    size = struct.calcsize(fmt)
    (tzfile_magic, tzfile_format, ttisgmtcnt, ttisstdcnt, leapcnt, timecnt,
        typecnt, charcnt) =  struct.unpack(fmt, f.read(size))
    #print("DEBUG: tzfile_magic: {} tzfile_format: {} ttisgmtcnt: {} ttisstdcnt: {} leapcnt: {} timecnt: {} typecnt: {} charcnt: {}".format(tzfile_magic, tzfile_format, ttisgmtcnt, ttisstdcnt, leapcnt, timecnt, typecnt, charcnt))

    # Make sure it is a tzfile(5) file
    assert tzfile_magic == TZFILE_MAGIC, (
            "Not a tzfile; file magic was: '{}'".format(tzfile_magic))

    # comments below show struct codes such as "l" for 32-bit long integer
    offset = (timecnt*4  # transition times, each "l"
        + timecnt*1  # indices tying transition time to ttinfo values, each "B"
        + typecnt*6  # ttinfo structs, each stored as "lBB"
        + charcnt*1)  # timezone abbreviation chars, each "c"

    f.seek(offset, 1) # seek offset bytes from current position

    fmt = '>{}l'.format(leapcnt*2)
    #print("DEBUG: leapcnt: {}  fmt: '{}'".format(leapcnt, fmt))
    size = struct.calcsize(fmt)
    data = struct.unpack(fmt, f.read(size))

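    lst = [(data[i], data[i+1]) for i in range(0, len(data), 2)]
    # sanity checks: times increase, and each record adds exactly one
    # (positive) leap second to the running total
    assert all(lst[i][0] < lst[i+1][0] for i in range(len(lst)-1))
    assert all(lst[i][1] == lst[i+1][1]-1 for i in range(len(lst)-1))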

    return lst

def print_leaps(leap_lst):
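    # leap_lst is tuples: (timestamp, total_correction)
    for ts, num_secs in leap_lst:
        # subtract the correction to get back to POSIX time; +1 gives the
        # first second after the leap (00:00:00 of the following day)
        print(datetime.datetime.utcfromtimestamp(ts - num_secs + 1))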

if __name__ == '__main__':
    zoneinfo_fname = '/usr/share/zoneinfo/right/UTC'
    with open(zoneinfo_fname, 'rb') as f:
        leap_lst = leap_seconds(f)
    print_leaps(leap_lst)
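The program above reads only the version-1 (32-bit) data block. Version 2 and later tzfiles follow that block with a second header and data block that use 64-bit times (leap records become an 8-byte time plus a 4-byte correction), and tzfile(5) recommends that readers which understand it use the second block. Files written by newer versions of zic in their default compact mode may leave the version-1 block with no leap records at all. A sketch that prefers the second block, assuming the layout described in tzfile(5); the names _read_header and leap_seconds_any_version are hypothetical.

```python
import struct

HEADER = struct.Struct('>4s c 15x 6l')


def _read_header(f):
    magic, version, *counts = HEADER.unpack(f.read(HEADER.size))
    if magic != b'TZif':
        raise ValueError('not a tzfile: %r' % magic)
    # counts: isutcnt, isstdcnt, leapcnt, timecnt, typecnt, charcnt
    return version, counts


def leap_seconds_any_version(f):
    """Return [(occurrence_time, total_correction), ...] from a tzfile.

    Version 2+ files repeat the data with 64-bit times after the version-1
    block; this sketch uses that second block when it is present.
    """
    version, (isutcnt, isstdcnt, leapcnt, timecnt, typecnt, charcnt) = _read_header(f)
    if version == b'\0':
        time_size, record = 4, struct.Struct('>ll')
    else:
        # Skip the whole version-1 data block (32-bit times, 8-byte leap records).
        f.seek(timecnt * 5 + typecnt * 6 + charcnt + leapcnt * 8
               + isstdcnt + isutcnt, 1)
        version, (isutcnt, isstdcnt, leapcnt, timecnt, typecnt, charcnt) = _read_header(f)
        time_size, record = 8, struct.Struct('>ql')
    # Skip transition times, type indices, ttinfo structs and abbreviations.
    f.seek(timecnt * (time_size + 1) + typecnt * 6 + charcnt, 1)
    return [record.unpack(f.read(record.size)) for _ in range(leapcnt)]
```

On a real system you would call it as `with open('/usr/share/zoneinfo/right/UTC', 'rb') as f: leaps = leap_seconds_any_version(f)` and convert each pair to POSIX time with `ts - corr + 1`, as print_leaps does. Depending on the distribution, the right/ zones may live in a separate package or not be installed at all.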

answered Sep 19 '22 18:09 by steveha