Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas/matplotlib plot with date-axis shows correct day/month but wrong weekday/year

I'm loading CSV data using pandas, where one of the columns takes the form of a date in the format '%a %d.%m.%Y' (e.g. 'Mon 06.02.2017'), and then trying to make some plots where the x-axis is labeled according to the date.

Something goes wrong during the plotting, because the date labels are wrong; e.g. what was 'Mon 06.02.2017' in the CSV/DataFrame is shown as 'Thu 06.02.0048' on the plot axis.

Here is a MWE. This is file 'data.csv':

Mon 06.02.2017  ;  1  ;  2  ;  3
Tue 07.02.2017  ;  4  ;  5  ;  6
Wed 08.02.2017  ;  7  ;  8  ;  9
Thu 09.02.2017  ; 10  ; 11  ; 12
Fri 10.02.2017  ; 13  ; 14  ; 15
Sat 11.02.2017  ; 16  ; 17  ; 18
Sun 12.02.2017  ; 19  ; 20  ; 21
Mon 13.02.2017  ; 22  ; 23  ; 24
Tue 14.02.2017  ; 25  ; 26  ; 27
Wed 15.02.2017  ; 28  ; 29  ; 30
Thu 16.02.2017  ; 31  ; 32  ; 33

And this is the parsing/plotting code 'plot.py':

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates


df = pd.read_csv(
        'data.csv',
        sep='\s*;\s*',
        header=None,
        names=['date', 'x', 'y', 'z'],
        parse_dates=['date'],
        date_parser=lambda x: pd.datetime.strptime(x, '%a %d.%m.%Y'),
        # infer_datetime_format=True,
        # dayfirst=True,
        engine='python',
)

# DataFrame 'date' Series looks fine
print df.date

ax1 = df.plot(x='date', y='x', legend=True)
ax2 = df.plot(x='date', y='y', ax=ax1, legend=True)
ax3 = df.plot(x='date', y='z', ax=ax1, legend=True)

ax1.xaxis.set_minor_locator(mdates.DayLocator(interval=1))
ax1.xaxis.set_minor_formatter(mdates.DateFormatter('%a %d.%m.%Y'))
ax1.xaxis.grid(True, which='minor')

plt.setp(ax1.xaxis.get_minorticklabels(), rotation=45)
plt.setp(ax1.xaxis.get_majorticklabels(), visible=False)
plt.tight_layout()

plt.show()

Notice that the DataFrame.date Series seems to contain the correct dates, so it's likely a matplotlib issue rather than a pandas/parsing error.

In case it might matter (although I doubt), my locale is LC_TIME = en_US.UTF-8.

Also, according to https://www.timeanddate.com/date/weekday.html, the day 06.02.0048 was actually a Tuesday, so somehow the plotted year isn't even really year 0048.

I'm really at a loss, thanks to anyone who is willing to check this out.

like image 394
user2152106 Avatar asked Mar 29 '17 09:03

user2152106


2 Answers

Although I couldn't really figure out why it's not working, it seems it has something to do with plotting with pandas vs. solely with matplotlib and maybe the mdates.DateFormatter...

When I comment out the formatting lines, it seems to start working:

# ax1.xaxis.set_minor_locator(mdates.DayLocator(interval=1))
# ax1.xaxis.set_minor_formatter(mdates.DateFormatter('%a %d.%m.%Y'))
# ax1.xaxis.grid(True, which='minor')
# 
# plt.setp(ax1.xaxis.get_minorticklabels(), rotation=45)
# plt.setp(ax1.xaxis.get_majorticklabels(), visible=False)

enter image description here

Pandas plotting the dates automatically works fine, but calling any matplotlib functions breaks the dates. Only commenting out #plt.setp(ax1.xaxis.get_majorticklabels(), visible=False), will plot both the Pandas and Matplotlib xaxis, with the odd 0048 showing up again: enter image description here

So the issue remains.

However, you can circumvent this by replacing parse_dates=['date'] with index_col=0, creating a matplotlib figure explicitly, and changing mdates.DateFormatter with ticker.FixedFormatter:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as ticker

df = pd.read_csv(
    'data.csv',
    sep='\s*;\s*',
    header=None,
    names=['date', 'x', 'y', 'z'],
    index_col=0,
    date_parser=lambda x: pd.to_datetime(x, format='%a %d.%m.%Y'),
    engine='python'
)

ax = plt.figure().add_subplot(111)
ax.plot(df)

ticklabels = [item.strftime('%d-%m-%y') for item in df.index]
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
ax.xaxis.set_major_formatter(ticker.FixedFormatter(ticklabels))

plt.xticks(rotation='90')
ax.xaxis.grid(True, which='major')

plt.tight_layout()

plt.show()

enter image description here

like image 200
Chris Avatar answered Oct 24 '22 01:10

Chris


I ran into this problem as well, but the root cause was different.

I put some debugging in the matplotlib DateFormatter class to figure out what data it was actually operating on. As it turned out, the pandas query that was running against postgres was producing date objects instead of timestamp objects. This was causing the dates to get mis-parsed such that the year was incorrect (parsed as year 0046 instead of 2018).

The solution was to update the query to cast the time column as a timestamp, and then everything worked out correctly.

SELECT start_time::timestamp at time zone '{{timezone}}' as "Start Time" ...

That said, I'm a bit shocked that the related libraries are not robust enough to handle the different kinds of date representations that postgres can produce.

like image 34
Brian Fouts Avatar answered Oct 24 '22 01:10

Brian Fouts