Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Change timezone of date-time column in pandas and add as hierarchical index

I have data with a time-stamp in UTC. I'd like to convert the timezone of this timestamp to 'US/Pacific' and add it as a hierarchical index to a pandas DataFrame. I've been able to convert the timestamp as an Index, but it loses the timezone formatting when I try to add it back into the DataFrame, either as a column or as an index.

>>> import pandas as pd >>> dat = pd.DataFrame({'label':['a', 'a', 'a', 'b', 'b', 'b'], 'datetime':['2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00', '2011-07-19 07:00:00', '2011-07-19 08:00:00', '2011-07-19 09:00:00'], 'value':range(6)}) >>> dat.dtypes #datetime    object #label       object #value        int64 #dtype: object 

Now if I try to convert the Series directly I run into an error.

>>> times = pd.to_datetime(dat['datetime']) >>> times.tz_localize('UTC') #Traceback (most recent call last): #  File "<stdin>", line 1, in <module> #  File "/Users/erikshilts/workspace/schedule-detection/python/pysched/env/lib/python2.7/site-packages/pandas/core/series.py", line 3170, in tz_localize #    raise Exception('Cannot tz-localize non-time series') #Exception: Cannot tz-localize non-time series 

If I convert it to an Index then I can manipulate it as a timeseries. Notice that the index now has the Pacific timezone.

>>> times_index = pd.Index(times) >>> times_index_pacific = times_index.tz_localize('UTC').tz_convert('US/Pacific') >>> times_index_pacific #<class 'pandas.tseries.index.DatetimeIndex'> #[2011-07-19 00:00:00, ..., 2011-07-19 02:00:00] #Length: 6, Freq: None, Timezone: US/Pacific 

However, now I run into problems adding the index back to the dataframe as it loses its timezone formatting:

>>> dat_index = dat.set_index([dat['label'], times_index_pacific]) >>> dat_index #                                      datetime label  value #label                                                       #a     2011-07-19 07:00:00  2011-07-19 07:00:00     a      0 #      2011-07-19 08:00:00  2011-07-19 08:00:00     a      1 #      2011-07-19 09:00:00  2011-07-19 09:00:00     a      2 #b     2011-07-19 07:00:00  2011-07-19 07:00:00     b      3 #      2011-07-19 08:00:00  2011-07-19 08:00:00     b      4 #      2011-07-19 09:00:00  2011-07-19 09:00:00     b      5 

You'll notice the index is back on the UTC timezone instead of the converted Pacific timezone.

How can I change the timezone and add it as an index to a DataFrame?

like image 539
Erik Shilts Avatar asked Jun 18 '13 01:06

Erik Shilts


People also ask

How do you change DateTime to index?

To convert the index of a DataFrame to DatetimeIndex , use Pandas' to_datetime(~) method.


1 Answers

If you set it as the index, it's automatically converted to an Index:

In [11]: dat.index = pd.to_datetime(dat.pop('datetime'), utc=True)  In [12]: dat Out[12]:                     label  value datetime 2011-07-19 07:00:00     a      0 2011-07-19 08:00:00     a      1 2011-07-19 09:00:00     a      2 2011-07-19 07:00:00     b      3 2011-07-19 08:00:00     b      4 2011-07-19 09:00:00     b      5 

Then do the tz_localize:

In [12]: dat.index = dat.index.tz_localize('UTC').tz_convert('US/Pacific')  In [13]: dat Out[13]:                           label  value datetime 2011-07-19 00:00:00-07:00     a      0 2011-07-19 01:00:00-07:00     a      1 2011-07-19 02:00:00-07:00     a      2 2011-07-19 00:00:00-07:00     b      3 2011-07-19 01:00:00-07:00     b      4 2011-07-19 02:00:00-07:00     b      5 

And then you can append the label column to the index:

Hmmm this is definitely a bug!

In [14]: dat.set_index('label', append=True).swaplevel(0, 1) Out[14]:                            value label datetime a     2011-07-19 07:00:00      0       2011-07-19 08:00:00      1       2011-07-19 09:00:00      2 b     2011-07-19 07:00:00      3       2011-07-19 08:00:00      4       2011-07-19 09:00:00      5 

A hacky workaround is to convert the (datetime) level directly (when it's already a MultiIndex):

In [15]: dat.index.levels[1] = dat.index.get_level_values(1).tz_localize('UTC').tz_convert('US/Pacific')  In [16]: dat1 Out[16]:                                  value label datetime a     2011-07-19 00:00:00-07:00      0       2011-07-19 01:00:00-07:00      1       2011-07-19 02:00:00-07:00      2 b     2011-07-19 00:00:00-07:00      3       2011-07-19 01:00:00-07:00      4       2011-07-19 02:00:00-07:00      5 
like image 189
Andy Hayden Avatar answered Oct 21 '22 07:10

Andy Hayden