
Fast way to transform datetime strings with timezones into UNIX timestamps in C++

I want to convert a huge file containing datetime strings to seconds since the UNIX epoch (January 1, 1970) in C++. The computation needs to be very fast because I have to handle a large number of datetimes.

So far I've tried two options. The first was to use mktime, defined in time.h. The second option I tried was Howard Hinnant's date library with time zone extension.

Here is the code I used to compare the performance between mktime and Howard Hinnant's tz:

for( int i=0; i<RUNS; i++){
    genrandomdate(&time_str);

    time_t t = mktime(&time_str);

}

auto tz = current_zone();
for( int i=0; i<RUNS; i++){

    genrandomdate(&time_str);
    auto ymd = year{time_str.tm_year+1900}/(time_str.tm_mon+1)/time_str.tm_mday;
    auto tcurr = make_zoned(tz, local_days{ymd} + 
            seconds{time_str.tm_hour*3600 + time_str.tm_min*60 + time_str.tm_sec}, choose::earliest);
    auto tbase = make_zoned("UTC", local_days{January/1/1970});
    auto dp = tcurr.get_sys_time() - tbase.get_sys_time() + 0s;

}

The results of the comparison:

time for mktime : 0.000142s
time for tz : 0.018748s

The performance of tz is poor compared to mktime. I want something faster than mktime, because mktime is also very slow when called repeatedly for a large number of iterations. Java's Calendar provides a very fast way to do this, but I don't know of any C++ alternative when time zones are also in play.

Note: Howard Hinnant's date works very fast (even surpassing Java) when used without time zones. But that is not enough for my requirements.

asked May 17 '19 17:05 by charitha22



1 Answer

There are some things you can do to optimize your use of Howard Hinnant's date library:

auto tbase = make_zoned("UTC", local_days{January/1/1970});

The lookup of a timezone (even "UTC") involves doing a binary search of the database for a timezone with that name. It is quicker to do a lookup once, and reuse the result:

// outside of loop:
auto utc_tz = locate_zone("UTC");

// inside of loop:
auto tbase = make_zoned(utc_tz, local_days{January/1/1970});

Moreover, I note that tbase is loop-independent, so the whole thing could be moved outside of the loop:

// outside of loop:
auto tbase = make_zoned("UTC", local_days{January/1/1970});

Here's a further minor optimization to be made. Change:

auto dp = tcurr.get_sys_time() - tbase.get_sys_time() + 0s;

To:

auto dp = tcurr.get_sys_time().time_since_epoch();

This gets rid of the need for tbase altogether. tcurr.get_sys_time().time_since_epoch() is the duration of time since 1970-01-01 00:00:00 UTC, in seconds. The precision of seconds is just for this example, since the input has seconds precision.
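
As a minimal sketch of why tbase is redundant, the following uses only <chrono> (not the date library). The alias sys_seconds_t and the function names are hypothetical, chosen for this illustration. It assumes system_clock's epoch is 1970-01-01 00:00:00 UTC, which C++20 guarantees and which holds in practice on the major implementations before that.

```cpp
#include <cassert>
#include <chrono>

// Seconds-precision system_clock time_point (the date library calls this sys_seconds).
using sys_seconds_t =
    std::chrono::time_point<std::chrono::system_clock, std::chrono::seconds>;

// Original formulation: subtract an explicit base placed at the clock's epoch.
inline std::chrono::seconds since_epoch_by_subtraction(sys_seconds_t tp)
{
    const sys_seconds_t base{};  // default-constructed: zero duration from the epoch
    return tp - base;
}

// Simplified formulation: ask the time_point for its offset from the epoch.
inline std::chrono::seconds since_epoch_direct(sys_seconds_t tp)
{
    return tp.time_since_epoch();
}

// Both formulations yield the same duration for any time_point.
inline void check_equivalence(sys_seconds_t tp)
{
    assert(since_epoch_by_subtraction(tp) == since_epoch_direct(tp));
}
```

Calling .count() on either result gives the plain integer number of seconds.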

Style nit: Try to avoid putting conversion factors in your code. This means changing:

auto tcurr = make_zoned(tz, local_days{ymd} + 
        seconds{time_str.tm_hour*3600 + time_str.tm_min*60 + time_str.tm_sec}, choose::earliest);

to:

auto tcurr = make_zoned(tz, local_days{ymd} + hours{time_str.tm_hour} + 
                        minutes{time_str.tm_min} + seconds{time_str.tm_sec},
                        choose::earliest);
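
The same idea can be shown with <chrono> alone. In this sketch the helper name time_of_day is hypothetical and not part of the date library; it only illustrates that the typed durations carry the conversion factors, so no 3600 or 60 appears in user code.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of the date library): build a seconds count
// from hour/minute/second fields without hand-written conversion factors.
inline std::chrono::seconds time_of_day(int h, int m, int s)
{
    // tm_sec may be 60 to allow for a leap second, so accept that here too.
    assert(0 <= h && h < 24 && 0 <= m && m < 60 && 0 <= s && s <= 60);
    using namespace std::chrono;
    // The sum converts implicitly and exactly to the finest unit, seconds.
    return hours{h} + minutes{m} + seconds{s};
}
```

For example, time_of_day(17, 5, 30) yields 61530 seconds.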

Is there a way to avoid this binary search if the time zone is also fixed? I mean, can we get the time zone offset and DST offset and manually adjust the time point?

If you are not on Windows, try compiling with -DUSE_OS_TZDB=1. This uses a compiled form of the database, which can give higher performance.

There is a way to get the offset and apply it manually (https://howardhinnant.github.io/date/tz.html#local_info), however unless you know that your offset doesn't change with the value of the time_point, you're going to end up reinventing the logic under the hood of make_zoned.

But if you are confident that your UTC offset is constant, here's how you can do it:

auto tz = current_zone();
// Use a sample time_point to get the utc_offset:
auto info = tz->get_info(
    local_days{year{time_str.tm_year+1900}/(time_str.tm_mon+1)/time_str.tm_mday}
      + hours{time_str.tm_hour} + minutes{time_str.tm_min}
      + seconds{time_str.tm_sec});
seconds utc_offset = info.first.offset;
for( int i=0; i<RUNS; i++){

    genrandomdate(&time_str);
    // Apply the offset manually:
    auto ymd = year{time_str.tm_year+1900}/(time_str.tm_mon+1)/time_str.tm_mday;
    auto tp = sys_days{ymd} + hours{time_str.tm_hour} +
              minutes{time_str.tm_min} + seconds{time_str.tm_sec} - utc_offset;
    auto dp = tp.time_since_epoch();
}
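
For comparison, here is a sketch of the same constant-offset conversion written without the date library, using only the standard library. It is an illustration, not the library's implementation: the names days_from_civil, days_t and local_tm_to_unix are chosen for this example, and the caller must supply a utc_offset that is valid for every input (so no DST transitions in the data). tm_isdst is ignored.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <ctime>
#include <ratio>

// Days between 1970-01-01 and y-m-d in the proleptic Gregorian calendar.
inline std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d)
{
    assert(1 <= m && m <= 12 && 1 <= d && d <= 31);
    y -= m <= 2;  // count January and February as the end of the previous year
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);            // [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// A duration type counting whole days, analogous to date::days.
using days_t = std::chrono::duration<std::int64_t, std::ratio<86400>>;

// Convert broken-down local time to seconds since the UNIX epoch,
// ASSUMING the given UTC offset is correct for this local time.
inline std::chrono::seconds local_tm_to_unix(const std::tm& t,
                                             std::chrono::seconds utc_offset)
{
    using namespace std::chrono;
    const days_t d{days_from_civil(t.tm_year + 1900,
                                   static_cast<unsigned>(t.tm_mon + 1),
                                   static_cast<unsigned>(t.tm_mday))};
    // Local time minus the UTC offset gives UTC, as in the loop above.
    return d + hours{t.tm_hour} + minutes{t.tm_min} + seconds{t.tm_sec} - utc_offset;
}
```

A local time of 2019-05-17 17:05:00 with an offset of -4h maps to 1558127100. If the offset can change within the data set, this sketch gives wrong answers silently, which is why the offset needs to be re-validated as the data moves through time.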

Update -- My own timing tests

I'm running macOS 10.14.4 with Xcode 10.2.1. I've created a relatively quiet machine: Time machine backup is not running. Mail is not running. iTunes is not running.

I have the following application, which implements the desired conversion using several different techniques, depending on preprocessor settings:

#include "date/tz.h"
#include <cassert>
#include <chrono>
#include <ctime>
#include <iostream>
#include <vector>

constexpr int RUNS = 1'000'000;
using namespace date;
using namespace std;
using namespace std::chrono;

vector<tm>
gendata()
{
    vector<tm> v;
    v.reserve(RUNS);
    auto tz = current_zone();
    auto tp = floor<seconds>(system_clock::now());
    for (auto i = 0; i < RUNS; ++i, tp += 1s)
    {
        zoned_seconds zt{tz, tp};
        auto lt = zt.get_local_time();
        auto d = floor<days>(lt);
        year_month_day ymd{d};
        auto s = lt - d;
        auto h = floor<hours>(s);
        s -= h;
        auto m = floor<minutes>(s);
        s -= m;
        tm x{};
        x.tm_year = int{ymd.year()} - 1900;
        x.tm_mon = unsigned{ymd.month()} - 1;
        x.tm_mday = unsigned{ymd.day()};
        x.tm_hour = h.count();
        x.tm_min = m.count();
        x.tm_sec = s.count();
        x.tm_isdst = -1;
        v.push_back(x);
    }
    return v;
}


int
main()
{

    auto v = gendata();
    vector<time_t> vr;
    vr.reserve(v.size());
    auto tz = current_zone();  // Using date
    sys_seconds begin;         // Using date, optimized
    sys_seconds end;           // Using date, optimized
    seconds offset{};          // Using date, optimized

    auto t0 = steady_clock::now();
    for(auto const& time_str : v)
    {
#if 0  // Using mktime
        auto t = mktime(const_cast<tm*>(&time_str));
        vr.push_back(t);
#elif 1  // Using date, easy
        auto ymd = year{time_str.tm_year+1900}/(time_str.tm_mon+1)/time_str.tm_mday;
        auto tp = local_days{ymd} + hours{time_str.tm_hour} +
                  minutes{time_str.tm_min} + seconds{time_str.tm_sec};
        zoned_seconds zt{tz, tp};
        vr.push_back(zt.get_sys_time().time_since_epoch().count());
#elif 0  // Using date, optimized
        auto ymd = year{time_str.tm_year+1900}/(time_str.tm_mon+1)/time_str.tm_mday;
        auto tp = local_days{ymd} + hours{time_str.tm_hour} +
                  minutes{time_str.tm_min} + seconds{time_str.tm_sec};
        sys_seconds zt{(tp - offset).time_since_epoch()};
        if (!(begin <= zt && zt < end))
        {
            auto info = tz->get_info(tp);
            offset = info.first.offset;
            begin = info.first.begin;
            end = info.first.end;
            zt = sys_seconds{(tp - offset).time_since_epoch()};
        }
        vr.push_back(zt.time_since_epoch().count());
#endif
    }
    auto t1 = steady_clock::now();

    cout << (t1-t0)/v.size() << " per conversion\n";
    auto i = vr.begin();
    for(auto const& time_str : v)
    {
        auto t = mktime(const_cast<tm*>(&time_str));
        assert(t == *i);
        ++i;
    }
}

Each solution is timed, and then checked for correctness against a baseline solution. Each solution converts 1,000,000 timestamps, all relatively close together temporally, and outputs the average time per conversion.

I present four solutions, and their timings in my environment:

1. Use mktime.

Output:

3849ns per conversion

2. Use tz.h in the easiest way with USE_OS_TZDB=0

Output:

3976ns per conversion

This is slightly slower than the mktime solution.

3. Use tz.h in the easiest way with USE_OS_TZDB=1

Output:

55ns per conversion

This is much faster than the above two solutions. However, this solution is not available on Windows (at this time), and on macOS it does not support the leap seconds part of the library (not used in this test). Both limitations are caused by how each OS ships its time zone database.

4. Use tz.h in an optimized way, taking advantage of the a priori knowledge that the time stamps are temporally grouped. If the assumption is false, performance suffers, but correctness is not compromised.

Output:

15ns per conversion

This result is roughly independent of the USE_OS_TZDB setting. But the performance relies on the fact that the input data does not change UTC offsets very often. This solution is also careless with local time points that are ambiguous or non-existent. Such local time points don't have a unique mapping to UTC. Solutions 2 and 3 throw exceptions if such local time points are encountered.
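
The caching idea behind solution 4 can be sketched independently of the date library. In the sketch below, OffsetRange and CachedOffsetConverter are hypothetical names, times are plain integer seconds, and the lookup callback stands in for the expensive time zone query (tz->get_info in the program above). Like solution 4, it does not detect ambiguous or non-existent local times.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical stand-in for the fields of the zone info that solution 4 caches.
struct OffsetRange {
    std::int64_t begin;   // first UTC second the offset applies to (inclusive)
    std::int64_t end;     // first UTC second it no longer applies to (exclusive)
    std::int64_t offset;  // UTC offset in seconds
};

class CachedOffsetConverter {
public:
    // The callback performs the slow lookup for a given local time.
    using Lookup = std::function<OffsetRange(std::int64_t local_seconds)>;

    explicit CachedOffsetConverter(Lookup lookup) : lookup_(std::move(lookup)) {}

    // Convert local seconds to UTC seconds, querying the zone only when the
    // tentative result falls outside the cached validity range.
    std::int64_t to_utc(std::int64_t local_seconds)
    {
        std::int64_t utc = local_seconds - cached_.offset;
        if (!(cached_.begin <= utc && utc < cached_.end)) {
            cached_ = lookup_(local_seconds);
            ++lookups_;
            utc = local_seconds - cached_.offset;
        }
        return utc;
    }

    // Number of slow lookups performed so far (for measurement only).
    int lookups() const { return lookups_; }

private:
    Lookup lookup_;
    OffsetRange cached_{0, 0, 0};  // empty range, so the first call always looks up
    int lookups_ = 0;
};
```

With temporally grouped input, almost every call takes the fast path (one subtraction and two comparisons), which is consistent with the result being roughly independent of how the database itself is stored.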

Run time error with USE_OS_TZDB

The OP got this stack dump when running on Ubuntu. This crash happens on first access to the time zone database. The crash is caused by empty stub functions provided by the OS for the pthread library. The fix is to explicitly link to the pthreads library (include -lpthread on the command line).

==20645== Process terminating with default action of signal 6 (SIGABRT)
==20645==    at 0x5413428: raise (raise.c:54)
==20645==    by 0x5415029: abort (abort.c:89)
==20645==    by 0x4EC68F6: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==20645==    by 0x4ECCA45: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==20645==    by 0x4ECCA80: std::terminate() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==20645==    by 0x4ECCCB3: __cxa_throw (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==20645==    by 0x4EC89B8: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==20645==    by 0x406AF9: void std::call_once<date::time_zone::init() const::{lambda()#1}>(std::once_flag&, date::time_zone::init() const::{lambda()#1}&&) (mutex:698)
==20645==    by 0x40486C: date::time_zone::init() const (tz.cpp:2114)
==20645==    by 0x404C70: date::time_zone::get_info_impl(std::chrono::time_point<date::local_t, std::chrono::duration<long, std::ratio<1l, 1l> > >) const (tz.cpp:2149)
==20645==    by 0x418E5C: date::local_info date::time_zone::get_info<std::chrono::duration<long, std::ratio<1l, 1l> > >(std::chrono::time_point<date::local_t, std::chrono::duration<long, std::ratio<1l, 1l> > >) const (tz.h:904)
==20645==    by 0x418CB2: std::chrono::time_point<std::chrono::_V2::system_clock, std::common_type<std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1l> > >::type> date::time_zone::to_sys_impl<std::chrono::duration<long, std::ratio<1l, 1l> > >(std::chrono::time_point<date::local_t, std::chrono::duration<long, std::ratio<1l, 1l> > >, date::choose, std::integral_constant<bool, false>) const (tz.h:947)
==20645== 
answered Oct 19 '22 16:10 by Howard Hinnant