I'm loading CSV data using pandas, where one of the columns takes the form of a date in the format '%a %d.%m.%Y' (e.g. 'Mon 06.02.2017'), and then trying to make some plots where the x-axis is labeled according to the date.
Something goes wrong during the plotting, because the date labels are wrong; e.g. what was 'Mon 06.02.2017' in the CSV/DataFrame is shown as 'Thu 06.02.0048' on the plot axis.
Here is a MWE. This is file 'data.csv':
Mon 06.02.2017 ; 1 ; 2 ; 3
Tue 07.02.2017 ; 4 ; 5 ; 6
Wed 08.02.2017 ; 7 ; 8 ; 9
Thu 09.02.2017 ; 10 ; 11 ; 12
Fri 10.02.2017 ; 13 ; 14 ; 15
Sat 11.02.2017 ; 16 ; 17 ; 18
Sun 12.02.2017 ; 19 ; 20 ; 21
Mon 13.02.2017 ; 22 ; 23 ; 24
Tue 14.02.2017 ; 25 ; 26 ; 27
Wed 15.02.2017 ; 28 ; 29 ; 30
Thu 16.02.2017 ; 31 ; 32 ; 33
And this is the parsing/plotting code 'plot.py':
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
df = pd.read_csv(
'data.csv',
sep='\s*;\s*',
header=None,
names=['date', 'x', 'y', 'z'],
parse_dates=['date'],
date_parser=lambda x: pd.datetime.strptime(x, '%a %d.%m.%Y'),
# infer_datetime_format=True,
# dayfirst=True,
engine='python',
)
# DataFrame 'date' Series looks fine
print df.date
ax1 = df.plot(x='date', y='x', legend=True)
ax2 = df.plot(x='date', y='y', ax=ax1, legend=True)
ax3 = df.plot(x='date', y='z', ax=ax1, legend=True)
ax1.xaxis.set_minor_locator(mdates.DayLocator(interval=1))
ax1.xaxis.set_minor_formatter(mdates.DateFormatter('%a %d.%m.%Y'))
ax1.xaxis.grid(True, which='minor')
plt.setp(ax1.xaxis.get_minorticklabels(), rotation=45)
plt.setp(ax1.xaxis.get_majorticklabels(), visible=False)
plt.tight_layout()
plt.show()
Notice that the DataFrame.date Series seems to contain the correct dates, so it's likely a matplotlib issue rather than a pandas/parsing error.
In case it might matter (although I doubt), my locale is LC_TIME = en_US.UTF-8.
Also, according to https://www.timeanddate.com/date/weekday.html, the day 06.02.0048 was actually a Tuesday, so somehow the plotted year isn't even really year 0048.
I'm really at a loss, thanks to anyone who is willing to check this out.
Although I couldn't really figure out why it's not working, it seems it has something to do with plotting with pandas vs. solely with matplotlib and maybe the mdates.DateFormatter
...
When I comment out the formatting lines, it seems to start working:
# ax1.xaxis.set_minor_locator(mdates.DayLocator(interval=1))
# ax1.xaxis.set_minor_formatter(mdates.DateFormatter('%a %d.%m.%Y'))
# ax1.xaxis.grid(True, which='minor')
#
# plt.setp(ax1.xaxis.get_minorticklabels(), rotation=45)
# plt.setp(ax1.xaxis.get_majorticklabels(), visible=False)
Pandas plotting the dates automatically works fine, but calling any matplotlib functions breaks the dates. Only commenting out #plt.setp(ax1.xaxis.get_majorticklabels(), visible=False)
, will plot both the Pandas and Matplotlib xaxis, with the odd 0048 showing up again:
So the issue remains.
However, you can circumvent this by replacing parse_dates=['date']
with index_col=0
, creating a matplotlib figure explicitly, and changing mdates.DateFormatter
with ticker.FixedFormatter
:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as ticker
df = pd.read_csv(
'data.csv',
sep='\s*;\s*',
header=None,
names=['date', 'x', 'y', 'z'],
index_col=0,
date_parser=lambda x: pd.to_datetime(x, format='%a %d.%m.%Y'),
engine='python'
)
ax = plt.figure().add_subplot(111)
ax.plot(df)
ticklabels = [item.strftime('%d-%m-%y') for item in df.index]
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
ax.xaxis.set_major_formatter(ticker.FixedFormatter(ticklabels))
plt.xticks(rotation='90')
ax.xaxis.grid(True, which='major')
plt.tight_layout()
plt.show()
I ran into this problem as well, but the root cause was different.
I put some debugging in the matplotlib DateFormatter
class to figure out what data it was actually operating on. As it turned out, the pandas query that was running against postgres was producing date objects instead of timestamp objects. This was causing the dates to get mis-parsed such that the year was incorrect (parsed as year 0046 instead of 2018).
The solution was to update the query to cast the time column as a timestamp, and then everything worked out correctly.
SELECT start_time::timestamp at time zone '{{timezone}}' as "Start Time" ...
That said, I'm a bit shocked that the related libraries are not robust enough to handle the different kinds of date representations that postgres can produce.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With