In the script below, why are tz and tz2 different?
import pandas
import pytz
tz = pytz.timezone('US/Eastern')
t = pandas.Timestamp('2014-03-03 08:05:39.216809')
tz2 = t.tz_localize(pytz.UTC).tz_convert(tz).tz
In this case, tz displays as:
<DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>
But tz2 displays as:
<DstTzInfo 'US/Eastern' EST-1 day, 19:00:00 STD>
Shouldn't pandas honor the timezone I pass in to tz_convert? (Is this perhaps a known bug?)
Update:
It seems this is more of a question about pytz. The behavior that still confuses me (but likely has a clear explanation) is why the following are different:
tz
<DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>
tz.localize(t).tzinfo
<DstTzInfo 'US/Eastern' EST-1 day, 19:00:00 STD>
These are NOT the same. What pytz.timezone(...) displays depends on the version of the pytz package you have installed.
Older version of pytz installed
In [47]: pytz.__version__
Out[47]: '2012j'
In [48]: pytz.timezone('US/Eastern')
Out[48]: <DstTzInfo 'US/Eastern' EST-1 day, 19:00:00 STD>
Latest version installed
In [2]: pytz.__version__
Out[2]: '2014.4'
In [3]: pytz.timezone('US/Eastern')
Out[3]: <DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>
Pandas handles this correctly. You can do the same thing with a datetime directly, like this:
pytz.timezone('US/Eastern').localize(datetime.datetime(2012,1,1))
The timezone definitions have recently changed to use LMT (local mean time). This doesn't matter: when you localize the dates that you are using, they will be in the correct time zone.
So in answer to your question, tz2 is correct as it localizes to a time zone that is correct for its date, while tz is 'correct' for the current date.
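As a minimal sketch of what "correct for its date" means (assuming pytz is installed; the variable names are only illustrative), localizing a winter date and a summer date with the same zone object yields tzinfo objects with different offsets:

```python
import datetime
import pytz

eastern = pytz.timezone('US/Eastern')

# The same zone object, localized to two different dates in 2014.
winter = eastern.localize(datetime.datetime(2014, 3, 3, 8, 5, 39))
summer = eastern.localize(datetime.datetime(2014, 7, 3, 8, 5, 39))

# Each result carries a tzinfo tailored to its own date.
print(winter.tzname(), winter.utcoffset())
print(summer.tzname(), summer.utcoffset())
```

The winter timestamp should report EST and the summer one EDT, even though both came from the same eastern object.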
This is pytz's workaround for the fact that datetime.tzinfo, the abstract class that represents the interface between time zone objects and datetime.datetime objects, is expected to be able to discover the offset of a time zone given only the local time. That is not possible in general, because some local times are ambiguous on account of offset changes caused by Daylight Saving Time or other governmental action.
The purpose of localize is to take a local time and an additional is_dst parameter and return an unambiguous datetime.datetime with a time zone object that is tailored to give the correct offset for that time. But a pytz time zone that isn't the result of localizing a time knows it can't always give the correct offset, so it doesn't try very hard; instead, it just defaults to the first entry in the Zoneinfo database for that zone. In the case of US/Eastern, that's just local mean time in New York (hence those crazy four minutes). You can get that same offset by localizing an early enough time:
In [28]: pytz.timezone('US/Eastern').localize(datetime.datetime(1901, 1, 1))
Out[28]: datetime.datetime(1901, 1, 1, 0, 0, tzinfo=<DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>)
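To illustrate the is_dst parameter described above, here is a small sketch (assuming pytz is installed; the chosen date is just one example of an ambiguous local time). 01:30 on 2 November 2014 occurred twice in US/Eastern, once before and once after the clocks went back:

```python
import datetime
import pytz

eastern = pytz.timezone('US/Eastern')

# This wall-clock time happened twice: once in EDT, once in EST.
ambiguous = datetime.datetime(2014, 11, 2, 1, 30)

first = eastern.localize(ambiguous, is_dst=True)    # the EDT occurrence
second = eastern.localize(ambiguous, is_dst=False)  # the EST occurrence

# With is_dst=None, pytz refuses to guess and raises instead.
try:
    eastern.localize(ambiguous, is_dst=None)
    raised = False
except pytz.AmbiguousTimeError:
    raised = True

print(first, second, raised)
```

The two results share the same wall-clock time but are one real hour apart, which is exactly the information a bare local time cannot carry.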
I do not know why pytz version 2012j exhibits different behavior, but I would guess that either the historical entries were added to Zoneinfo sometime in the past two years, or that at some point in that period unlocalized pytz time zones switched from a (sometimes subtly wrong) guess at the correct offset in Zoneinfo to the (obviously wrong) oldest offset in Zoneinfo.
Once PEP 431 is finished, datetime.tzinfo methods will take is_dst parameters where appropriate, and pytz will be able to implement time zones that do the right thing without having the user jump through localize and normalize hoops.
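For reference, the normalize hoop mentioned above looks roughly like this today (a sketch assuming pytz is installed; the date is chosen to straddle the 9 March 2014 spring-forward in US/Eastern):

```python
import datetime
import pytz

eastern = pytz.timezone('US/Eastern')

# 01:30 EST, half an hour before clocks jump from 02:00 to 03:00.
before = eastern.localize(datetime.datetime(2014, 3, 9, 1, 30))

# Plain arithmetic keeps the old EST tzinfo, giving a nonexistent 02:30 EST.
later = before + datetime.timedelta(hours=1)

# normalize() re-picks the tzinfo that is valid for the new instant.
fixed = eastern.normalize(later)

print(later)
print(fixed)
```

After normalize, the result should read 03:30 in EDT, which is the same instant expressed with the offset that actually applied at that moment.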