
In pandas, why does tz_convert change the timezone used from EST to LMT?

In the script below, why are tz and tz2 different?

import pandas
import pytz
tz = pytz.timezone('US/Eastern')
t = pandas.Timestamp('2014-03-03 08:05:39.216809')
tz2 = t.tz_localize(pytz.UTC).tz_convert(tz).tz

In this case, tz displays as:

<DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>

But tz2 displays as:

<DstTzInfo 'US/Eastern' EST-1 day, 19:00:00 STD>

Shouldn't pandas honor the timezone I pass in to tz_convert? (Is this perhaps a known bug?)

Update:

This seems to be more of a question about pytz. The behavior that still confuses me (but likely has a clear explanation) is why the following are different:

tz
<DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>

tz.localize(t).tzinfo
<DstTzInfo 'US/Eastern' EST-1 day, 19:00:00 STD>
D. A. asked Jun 12 '14 15:06



2 Answers

These are NOT the same.

pytz.timezone(...) gives you the most recent timezone! (as of your pytz package date).

Older version of pytz installed

In [47]: pytz.__version__
Out[47]: '2012j'

In [48]: pytz.timezone('US/Eastern')
Out[48]: <DstTzInfo 'US/Eastern' EST-1 day, 19:00:00 STD>

Latest version installed

In [2]: pytz.__version__
Out[2]: '2014.4'

In [3]: pytz.timezone('US/Eastern')
Out[3]: <DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>

Pandas handles this correctly. You can do the same thing with a datetime directly, like this:

pytz.timezone('US/Eastern').localize(datetime.datetime(2012,1,1))

The timezone definition has recently changed to use LMT (local mean time). This doesn't matter, because when you localize the dates you are using, they will end up in the correct time zone.

So in answer to your question, tz2 is correct as it localizes to a time zone that is correct for its date, while tz is 'correct' for the current date.
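As a quick check of that claim, the question's own timestamp can be converted and its offset inspected. This is only a sketch and assumes pandas and pytz are both installed; 3 March 2014 falls before the US switch to daylight saving time that year, so the converted value should carry EST rather than LMT.

```python
import datetime
import pandas
import pytz

tz = pytz.timezone('US/Eastern')
t = pandas.Timestamp('2014-03-03 08:05:39.216809')

# Treat the naive timestamp as UTC, then convert it to US/Eastern.
converted = t.tz_localize(pytz.UTC).tz_convert(tz)

# The offset and abbreviation come from the tzinfo chosen for this date,
# not from the default entry of the unlocalized tz object.
print(converted.utcoffset())
print(converted.tzname())
```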

Jeff answered Nov 07 '22 10:11



This is pytz's workaround for a limitation of datetime.tzinfo, the abstract class that represents the interface between time zone objects and datetime.datetime objects. A tzinfo is expected to be able to discover the offset of a time zone given only the local time, which is not possible in general because some local times are ambiguous on account of offset changes caused by Daylight Saving Time or other governmental action.
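To make the ambiguity concrete, here is a minimal sketch. The date is chosen purely for illustration: 01:30 on 2 November 2014 occurred twice in US/Eastern, once before and once after the clocks were set back. The is_dst argument of pytz's localize selects between the two readings.

```python
import datetime
import pytz

tz = pytz.timezone('US/Eastern')

# 01:30 on 2014-11-02 happened twice: clocks went from 02:00 EDT back to 01:00 EST.
ambiguous = datetime.datetime(2014, 11, 2, 1, 30)

first = tz.localize(ambiguous, is_dst=True)    # the earlier occurrence, in EDT
second = tz.localize(ambiguous, is_dst=False)  # the later occurrence, in EST

print(first.utcoffset())   # offset while daylight saving time is in effect
print(second.utcoffset())  # offset in standard time

# With is_dst=None, pytz refuses to guess and raises instead.
try:
    tz.localize(ambiguous, is_dst=None)
    raised = False
except pytz.exceptions.AmbiguousTimeError:
    raised = True
print(raised)
```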

The purpose of localize is to take a local time and an additional is_dst parameter and return an unambiguous datetime.datetime with a time zone object that is tailored to give the correct offset for that time. But a pytz time zone that isn't the result of localizing a time knows it can't always give the correct offset, so it doesn't try very hard; instead, it just defaults to the first entry in the Zoneinfo database for that zone. In the case of US/Eastern, that's just local mean time in New York (hence those crazy four minutes). You can get that same offset by localizing an early enough time:

In [28]: pytz.timezone('US/Eastern').localize(datetime.datetime(1901, 1, 1))
Out[28]: datetime.datetime(1901, 1, 1, 0, 0, tzinfo=<DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>)
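The practical consequence is that a pytz zone should be attached with localize rather than passed directly as tzinfo. A minimal sketch, assuming a pytz release whose unlocalized US/Eastern defaults to LMT as in the output above (older releases such as 2012j may show a different default offset); the sample date is taken from the question.

```python
import datetime
import pytz

tz = pytz.timezone('US/Eastern')
naive = datetime.datetime(2014, 3, 3, 8, 5, 39)

# Passing the unlocalized zone as tzinfo keeps its default (first) entry.
attached = naive.replace(tzinfo=tz)

# localize() picks the entry that applies to this particular date.
localized = tz.localize(naive)

print(attached.utcoffset())   # depends on the pytz release; LMT on recent ones
print(localized.utcoffset())  # the EST offset, five hours behind UTC
print(localized.tzname())
```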

I do not know why pytz version 2012j exhibits different behavior, but I would guess that either the historical entries were added to Zoneinfo sometime in the past two years, or that at some point in that period unlocalized pytz time zones switched from a (sometimes subtly wrong) guess at the correct offset in Zoneinfo to the (obviously wrong) oldest offset in Zoneinfo.

Once PEP 431 is finished, datetime.tzinfo methods will take is_dst parameters where appropriate, and pytz will be able to implement time zones that do the right thing without making the user jump through localize and normalize hoops.
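For reference, the normalize half of those hoops comes up after datetime arithmetic. A minimal sketch, with dates chosen for illustration (US daylight saving time began on 9 March 2014): adding a timedelta keeps the old offset, and normalize corrects it.

```python
import datetime
import pytz

tz = pytz.timezone('US/Eastern')

# Noon EST on the day before daylight saving time began in 2014.
start = tz.localize(datetime.datetime(2014, 3, 8, 12, 0))

# Plain arithmetic moves the wall clock but keeps the EST tzinfo.
shifted = start + datetime.timedelta(days=1)

# normalize() swaps in the tzinfo that is valid at the new instant.
normalized = tz.normalize(shifted)

print(shifted.tzname(), shifted.hour)
print(normalized.tzname(), normalized.hour)
```

Both values refer to the same instant; only the wall-clock reading and the attached offset differ.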

Isaac Schwabacher answered Nov 07 '22 09:11
