
Pandas plot function ignores timezone of timeseries

Tags: python, pandas

When plotting a timeseries with the built-in plot function of pandas, it seems to ignore the timezone of my index: it always uses the UTC time for the x-axis. An example:

import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame, date_range

rng = date_range('1/1/2011', periods=200, freq='S', tz="UTC")
data = DataFrame(np.random.randn(len(rng), 3), index=rng, columns=['A', 'B', 'C'])
data_cet = data.tz_convert("CET")

# plot with data in UTC timezone
fig, ax = plt.subplots()
data[["A", "B"]].plot(ax=ax, grid=True)
plt.show()

# plot with data in CET timezone, but the x-axis remains the same as above
fig, ax = plt.subplots()
data_cet[["A", "B"]].plot(ax=ax, grid=True)
plt.show()

The plot does not change, although the index has:

In [11]: data.index[0]
Out[11]: <Timestamp: 2011-01-01 00:00:00+0000 UTC, tz=UTC>
In [12]: data_cet.index[0]
Out[12]: <Timestamp: 2011-01-01 01:00:00+0100 CET, tz=CET>

Should I file a bug, or am I missing something?

joris asked Oct 23 '12 12:10


2 Answers

This is definitely a bug. I've created a report on GitHub. The reason is that, internally, pandas converts a regular-frequency DatetimeIndex to a PeriodIndex to hook into the formatters/locators in pandas, and PeriodIndex currently does NOT retain timezone information. Please stay tuned for a fix.
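
To see the information loss described here in isolation, a minimal sketch (assuming a reasonably recent pandas; the names idx_cet and periods are made up for illustration, and this only shows the DatetimeIndex to PeriodIndex conversion, not the internal plotting path itself):

```python
import warnings

import pandas as pd

# Same kind of regular-frequency, tz-aware index as in the question.
idx_cet = pd.date_range("2011-01-01", periods=5, freq="s", tz="UTC").tz_convert("CET")

with warnings.catch_warnings():
    # Recent pandas versions warn that the conversion drops the timezone.
    warnings.simplefilter("ignore")
    periods = idx_cet.to_period(freq="s")

print(idx_cet.tz)                    # the DatetimeIndex carries a timezone
print(getattr(periods, "tz", None))  # the PeriodIndex has no timezone to carry
```

Whatever is built on top of the PeriodIndex (tick locators, formatters) therefore cannot know which zone the original index was in.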

Chang She answered Oct 05 '22 02:10


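import pandas as pd
import matplotlib as mpl
from pytz import timezone as ptz
...
# to_datetime(..., utc=True) returns a tz-aware (UTC) index, so convert it;
# tz_localize would raise a TypeError on an index that is already tz-aware
data.index = pd.to_datetime(data.index, utc=True).tz_convert(ptz('<your timezone>'))
...
mpl.rcParams['timezone'] = data.index.tz.zone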

... after which matplotlib prints as that zone rather than UTC.
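
A rough way to check what that rcParams setting does, without drawing anything: matplotlib stores dates as UTC-based floats and applies rcParams['timezone'] when turning them back into datetimes. The sketch below assumes a recent matplotlib; the instant and the variable names are purely illustrative.

```python
from datetime import datetime, timedelta, timezone

import matplotlib as mpl
import matplotlib.dates as mdates

instant = datetime(2011, 1, 1, 0, 0, tzinfo=timezone.utc)
x = mdates.date2num(instant)  # matplotlib's internal float date

# rc_context restores the previous setting on exit.
with mpl.rc_context({"timezone": "UTC"}):
    as_utc = mdates.num2date(x)

with mpl.rc_context({"timezone": "CET"}):
    as_cet = mdates.num2date(x)

print(as_utc.isoformat())
print(as_cet.isoformat())
```

Both values are the same instant; only the wall-clock representation used for labels differs.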

However! Note that if you need to annotate, the x locations of the annotations still need to be in UTC, even though strings passed to data.loc[] or data.at[] are assumed to be in the index's timezone!

For instance, I needed to show a series of vertical lines labelled with timestamps (this comes after most of the plot calls; note that the timestamp strings in sels are in UTC):

sels = ['2019-03-21 3:56:28',
         '2019-03-21 4:00:30',
         '2019-03-21 4:05:55',
         '2019-03-21 4:13:40']
ax.vlines(sels,125,145,lw=1,color='grey') # 125 was bottom, 145 was top in data units
for s in sels:
    tstr = pd.to_datetime(s, utc=True)\
    .astimezone(tz=ptz(data.index.tz.zone))\
    .isoformat().split('T')[1].split('+')[0]
    ax.annotate(tstr,xy=(s,125),xycoords='data',
              xytext=(0,5), textcoords='offset points', rotation=90,
              horizontalalignment='right', verticalalignment='bottom')

This puts grey vertical lines at the times chosen manually in sels, and labels them in local timezone hours, minutes and seconds. (the .split()[] business discards the date and timezone info from the .isoformat() string).
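
One caveat with splitting on '+': for a zone west of UTC the offset is written with a minus sign, so it would stay attached to the label. A minimal standard-library sketch that formats the time with strftime instead; local_hms and the two fixed-offset zones are hypothetical names chosen for illustration and are not part of the code above.

```python
from datetime import datetime, timedelta, timezone


def local_hms(utc_string, tz):
    """Parse a naive 'YYYY-MM-DD H:MM:SS' string as UTC; return HH:MM:SS in tz."""
    utc_dt = datetime.strptime(utc_string, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(tz).strftime("%H:%M:%S")


# Fixed offsets keep the sketch free of any tz database dependency.
east = timezone(timedelta(hours=1))   # +01:00
west = timezone(timedelta(hours=-5))  # -05:00

label_east = local_hms("2019-03-21 3:56:28", east)
label_west = local_hms("2019-03-21 3:56:28", west)

# For comparison, the split-based approach keeps the negative offset.
split_west = (
    datetime(2019, 3, 21, 3, 56, 28, tzinfo=timezone.utc)
    .astimezone(west)
    .isoformat()
    .split("T")[1]
    .split("+")[0]
)

print(label_east, label_west, split_west)
```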

But when I need to actually get corresponding values from data using the same s in sels, I then have to use the somewhat awkward:

data.tz_convert('UTC').at[s]

Whereas just

data.at[s]

fails with a KeyError, because pandas interprets s as being in the data.index.tz timezone and, interpreted that way, the timestamps fall outside the range covered by data.
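
The lookup behaviour can be reproduced on a small example. This is a sketch assuming a recent pandas and a Series rather than the original data; the names s_utc, s_cet and key are made up for illustration.

```python
import pandas as pd

rng = pd.date_range("2011-01-01", periods=5, freq="s", tz="UTC")
s_utc = pd.Series(range(5), index=rng)
s_cet = s_utc.tz_convert("CET")  # same instants, index shown as 01:00:00+01:00 ...

key = "2011-01-01 00:00:02"  # meant as a UTC wall-clock time

# On the CET index the naive string is read as CET wall time,
# which is one hour before the first sample, so the lookup fails.
try:
    s_cet.loc[key]
    found_in_cet = True
except KeyError:
    found_in_cet = False

# Converting the index to UTC first makes the same string match.
value_via_utc = s_cet.tz_convert("UTC").loc[key]

# Alternatively, give the key in the index's own wall-clock time.
value_via_local_key = s_cet.loc["2011-01-01 01:00:02"]

print(found_in_cet, value_via_utc, value_via_local_key)
```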

RGD2 answered Oct 05 '22 02:10