I have data with a time-stamp in UTC. I'd like to convert the timezone of this timestamp to 'US/Pacific' and add it as a hierarchical index to a pandas DataFrame. I've been able to convert the timestamp as an Index, but it loses the timezone formatting when I try to add it back into the DataFrame, either as a column or as an index.
>>> import pandas as pd >>> dat = pd.DataFrame({'label':['a', 'a', 'a', 'b', 'b', 'b'], 'datetime':['2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00', '2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00'], 'value':range(6)}) >>> dat.dtypes #datetime object #label object #value int64 #dtype: object
Now if I try to convert the Series directly I run into an error.
>>> times = pd.to_datetime(dat['datetime']) >>> times.tz_localize('UTC') #Traceback (most recent call last): # File "<stdin>", line 1, in <module> # File "/Users/erikshilts/workspace/schedule-detection/python/pysched/env/lib/python2.7/site-packages/pandas/core/series.py", line 3170, in tz_localize # raise Exception('Cannot tz-localize non-time series') #Exception: Cannot tz-localize non-time series
If I convert it to an Index then I can manipulate it as a timeseries. Notice that the index now has the Pacific timezone.
>>> times_index = pd.Index(times) >>> times_index_pacific = times_index.tz_localize('UTC').tz_convert('US/Pacific') >>> times_index_pacific #<class 'pandas.tseries.index.DatetimeIndex'> #[2011-07-19 00:00:00, ..., 2011-07-19 02:00:00] #Length: 6, Freq: None, Timezone: US/Pacific
However, now I run into problems adding the index back to the dataframe as it loses its timezone formatting:
>>> dat_index = dat.set_index([dat['label'], times_index_pacific]) >>> dat_index # datetime label value #label #a 2011-07-19 07:00:00 2011-07-19 07:00:00 a 0 # 2011-07-19 08:00:00 2011-07-19 08:00:00 a 1 # 2011-07-19 09:00:00 2011-07-19 09:00:00 a 2 #b 2011-07-19 07:00:00 2011-07-19 07:00:00 b 3 # 2011-07-19 08:00:00 2011-07-19 08:00:00 b 4 # 2011-07-19 09:00:00 2011-07-19 09:00:00 b 5
You'll notice the index is back on the UTC timezone instead of the converted Pacific timezone.
How can I change the timezone and add it as an index to a DataFrame?
To convert the index of a DataFrame to DatetimeIndex , use Pandas' to_datetime(~) method.
If you set it as the index, it's automatically converted to an Index:
In [11]: dat.index = pd.to_datetime(dat.pop('datetime'), utc=True) In [12]: dat Out[12]: label value datetime 2011-07-19 07:00:00 a 0 2011-07-19 08:00:00 a 1 2011-07-19 09:00:00 a 2 2011-07-19 07:00:00 b 3 2011-07-19 08:00:00 b 4 2011-07-19 09:00:00 b 5
Then do the tz_localize
:
In [12]: dat.index = dat.index.tz_localize('UTC').tz_convert('US/Pacific') In [13]: dat Out[13]: label value datetime 2011-07-19 00:00:00-07:00 a 0 2011-07-19 01:00:00-07:00 a 1 2011-07-19 02:00:00-07:00 a 2 2011-07-19 00:00:00-07:00 b 3 2011-07-19 01:00:00-07:00 b 4 2011-07-19 02:00:00-07:00 b 5
And then you can append the label column to the index:
Hmmm this is definitely a bug!
In [14]: dat.set_index('label', append=True).swaplevel(0, 1) Out[14]: value label datetime a 2011-07-19 07:00:00 0 2011-07-19 08:00:00 1 2011-07-19 09:00:00 2 b 2011-07-19 07:00:00 3 2011-07-19 08:00:00 4 2011-07-19 09:00:00 5
A hacky workaround is to convert the (datetime) level directly (when it's already a MultiIndex):
In [15]: dat.index.levels[1] = dat.index.get_level_values(1).tz_localize('UTC').tz_convert('US/Pacific') In [16]: dat1 Out[16]: value label datetime a 2011-07-19 00:00:00-07:00 0 2011-07-19 01:00:00-07:00 1 2011-07-19 02:00:00-07:00 2 b 2011-07-19 00:00:00-07:00 3 2011-07-19 01:00:00-07:00 4 2011-07-19 02:00:00-07:00 5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With