When plotting a timeseries with the built-in plot function of pandas, it seems to ignore the timezone of my index: it always uses the UTC time for the x-axis. An example:
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame, date_range
rng = date_range('1/1/2011', periods=200, freq='S', tz="UTC")
data = DataFrame(np.random.randn(len(rng), 3), index=rng, columns=['A', 'B', 'C'])
data_cet = data.tz_convert("CET")
# plot with data in UTC timezone
fig, ax = plt.subplots()
data[["A", "B"]].plot(ax=ax, grid=True)
plt.show()
# plot with data in CET timezone, but the x-axis remains the same as above
fig, ax = plt.subplots()
data_cet[["A", "B"]].plot(ax=ax, grid=True)
plt.show()
The plot does not change, although the index has:
In [11]: data.index[0]
Out[11]: <Timestamp: 2011-01-01 00:00:00+0000 UTC, tz=UTC>
In [12]: data_cet.index[0]
Out[12]: <Timestamp: 2011-01-01 01:00:00+0100 CET, tz=CET>
Should I file a bug, or am I missing something?
This is definitely a bug. I've created a report on GitHub. The reason is that, internally, pandas converts a regular-frequency DatetimeIndex to a PeriodIndex in order to hook into its own formatters/locators, and currently PeriodIndex does NOT retain timezone information. Please stay tuned for a fix.
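To see the timezone being dropped in that conversion, here is a minimal sketch. It assumes a reasonably recent pandas, where to_period() on a tz-aware index may also emit a warning about this; the variable names are made up for illustration, and it only shows the conversion itself, not the plotting code path.

```python
import warnings

import pandas as pd

# A small tz-aware index, analogous to data_cet.index in the question.
rng_cet = pd.date_range("2011-01-01", periods=3, freq="min", tz="CET")

with warnings.catch_warnings():
    # pandas may warn here that timezone information will be dropped.
    warnings.simplefilter("ignore")
    periods = rng_cet.to_period()

# Converting the periods back gives naive timestamps: the timezone is gone.
roundtrip = periods.to_timestamp()
print(rng_cet.tz is not None)  # the original index is tz-aware
print(roundtrip.tz)            # None
```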
import pandas as pd
from pytz import timezone as ptz
import matplotlib as mpl
...
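# to_datetime(..., utc=True) returns a tz-aware index, so use tz_convert here
# (tz_localize raises a TypeError on an index that is already tz-aware)
data.index = pd.to_datetime(data.index, utc=True).tz_convert(ptz('<your timezone>'))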
...
mpl.rcParams['timezone'] = data.index.tz.zone
... after which matplotlib prints as that zone rather than UTC.
Note, however, that if you need to annotate, the x locations of the annotations still have to be given in UTC, even though strings passed to data.loc[] or data.at[] are assumed to be in the index's timezone!
For instance, I needed to draw a series of vertical lines labelled with their timestamps (this comes after most of the plot calls; note that the timestamp strings in sels are UTC):
sels = ['2019-03-21 3:56:28',
        '2019-03-21 4:00:30',
        '2019-03-21 4:05:55',
        '2019-03-21 4:13:40']
ax.vlines(sels, 125, 145, lw=1, color='grey')  # 125 was bottom, 145 was top in data units
for s in sels:
    tstr = pd.to_datetime(s, utc=True)\
        .astimezone(tz=ptz(data.index.tz.zone))\
        .isoformat().split('T')[1].split('+')[0]
    ax.annotate(tstr, xy=(s, 125), xycoords='data',
                xytext=(0, 5), textcoords='offset points', rotation=90,
                horizontalalignment='right', verticalalignment='bottom')
This puts grey vertical lines at the times chosen manually in sels, and labels them in local-timezone hours, minutes and seconds. (The .split()[] business discards the date and timezone info from the .isoformat() string.)
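As an aside, the same HH:MM:SS label can probably be produced with strftime, which avoids the string splitting. A small sketch, assuming a single UTC timestamp string like the ones in sels and using "CET" as a stand-in for the index's timezone:

```python
import pandas as pd

# One UTC timestamp, converted to an example local timezone (CET here).
ts_local = pd.Timestamp("2019-03-21 03:56:28", tz="UTC").tz_convert("CET")

# The approach used above: cut the time out of the ISO string.
via_split = ts_local.isoformat().split("T")[1].split("+")[0]

# Alternative: format the time directly.
via_strftime = ts_local.strftime("%H:%M:%S")

print(via_split, via_strftime)
```

One difference to keep in mind: splitting on '+' only strips the offset for zones east of UTC. For a negative offset such as -05:00 the offset would stay in the label, whereas strftime is unaffected.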
But when I need to actually get the corresponding values from data using the same s in sels, I then have to use the somewhat awkward:
data.tz_convert('UTC').at[s]
Whereas just
data.at[s]
fails with a KeyError, because pandas interprets s as being in the data.index.tz timezone, and interpreted that way, the timestamps fall outside the range of the contents of data.
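The lookup behaviour can be reproduced on a toy Series. This is only a sketch with made-up data and names (series_utc, series_cet, label), using "CET" as the local zone. It also shows one possible alternative to the tz_convert('UTC') round trip, namely passing an explicitly tz-aware Timestamp as the key.

```python
import pandas as pd

# Hourly toy data starting at 03:00 UTC, then viewed in CET (UTC+1 on this date).
idx_utc = pd.date_range("2019-03-21 03:00", periods=4, freq="h", tz="UTC")
series_utc = pd.Series([10, 11, 12, 13], index=idx_utc)
series_cet = series_utc.tz_convert("CET")

label = "2019-03-21 04:00:00"  # intended as a UTC time

# A naive string is read in the index's own timezone: 04:00 CET is 03:00 UTC.
as_local = series_cet.loc[label]

# The workaround from above: convert to UTC before the lookup.
via_utc = series_cet.tz_convert("UTC").loc[label]

# Alternative: make the key itself tz-aware, so the index timezone no longer matters.
key = pd.Timestamp(label, tz="UTC")
via_key = series_cet.loc[key]

print(as_local, via_utc, via_key)
```

Here the naive string silently picks the row one hour earlier. With labels that fall outside the index once read as local time, the same lookup raises the KeyError described above.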