Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to disable date interpolation in matplotlib?

Despite trying some solutions available on SO and at Matplotlib's documentation, I'm still unable to disable Matplotlib's creation of weekend dates on the x-axis.

As you can see see below, it adds dates to the x-axis that are not in the original Pandas column.

enter image description here

I'm plotting my data using (commented lines are unsuccessful in achieving my goal):

fig, ax1 = plt.subplots()

x_axis = df.index.values
ax1.plot(x_axis, df['MP'], color='k')
ax2 = ax1.twinx()
ax2.plot(x_axis, df['R'], color='r')

# plt.xticks(np.arange(len(x_axis)), x_axis)
# fig.autofmt_xdate()
# ax1.fmt_xdata = mdates.DateFormatter('%Y-%m-%d')

fig.tight_layout()
plt.show()

An example of my Pandas dataframe is below, with dates as index:

2019-01-09  1.007042  2585.898714  4.052480e+09  19.980000  12.07     1
2019-01-10  1.007465  2581.828491  3.704500e+09  19.500000  19.74     1
2019-01-11  1.007154  2588.605258  3.434490e+09  18.190001  18.68     1
2019-01-14  1.008560  2582.151225  3.664450e+09  19.070000  14.27     1

Some suggestions I've found include a custom ticker here and here however although I don't get errors the plot is missing my second series.

Any suggestions on how to disable date interpolation in matplotlib?

like image 528
pepe Avatar asked Jan 20 '19 15:01

pepe


3 Answers

The matplotlib site recommends creating a custom formatter class. This class will contain logic that tells the axis label not to display anything if the date is a weekend. Here's an example using a dataframe I constructed from the 2018 data that was in the image you'd attached:

df = pd.DataFrame(
data = {
 "Col 1" : [1.000325, 1.000807, 1.001207, 1.000355, 1.001512, 1.003237, 1.000979],
 "MP": [2743.002071, 2754.011543, 2746.121450, 2760.169848, 2780.756857, 2793.953050, 2792.675162],
 "Col 3": [3.242650e+09, 3.453480e+09, 3.576350e+09, 3.641320e+09, 3.573970e+09, 3.573970e+09, 4.325970e+09], 
 "Col 4": [9.520000, 10.080000, 9.820000, 9.880000, 10.160000, 10.160000, 11.660000],
 "Col 5": [5.04, 5.62, 5.29, 6.58, 8.32, 9.57, 9.53],
 "R": [0,0,0,0,0,1,1]
}, 
index=['2018-01-08', '2018-01-09', '2018-01-10', '2018-01-11',
       '2018-01-12', '2018-01-15', '2018-01-16'])
  1. Move the dates from the index to their own column:
df = df.reset_index().rename({'index': 'Date'}, axis=1, copy=False)
df['Date'] = pd.to_datetime(df['Date'])
  1. Create the custom formatter class:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import Formatter
%config InlineBackend.figure_format = 'retina' # Get nicer looking graphs for retina displays

class CustomFormatter(Formatter):
    def __init__(self, dates, fmt='%Y-%m-%d'):
        self.dates = dates
        self.fmt = fmt

    def __call__(self, x, pos=0):
        'Return the label for time x at position pos'
        ind = int(np.round(x))
        if ind >= len(self.dates) or ind < 0:
            return ''

        return self.dates[ind].strftime(self.fmt)
  1. Now let's plot the MP and R series. Pay attention to the line where we call the custom formatter:
formatter = CustomFormatter(df['Date'])

fig, ax1 = plt.subplots()
ax1.xaxis.set_major_formatter(formatter)
ax1.plot(np.arange(len(df)), df['MP'], color='k')
ax2 = ax1.twinx()
ax2.plot(np.arange(len(df)), df['R'], color='r')
fig.autofmt_xdate()
fig.tight_layout()
plt.show()

The above code outputs this graph: Output graph

Now, no weekend dates, such as 2018-01-13, are displayed on the x-axis.

like image 124
James Dellinger Avatar answered Nov 06 '22 10:11

James Dellinger


If you would like to simply not show the weekends, but for the graph to still scale correctly matplotlib has a built-in functionality for this in matplotlib.mdates. Specifically, the WeekdayLocator pretty much solves this problem singlehandedly. It's a one line solution (the rest just fabricates data for testing). Note that this works whether or not the data includes weekends:

import matplotlib.pyplot as plt
import datetime
import numpy as np
import matplotlib.dates as mdates
from matplotlib.dates import MO, TU, WE, TH, FR, SA, SU

DT_FORMAT="%Y-%m-%d"

if __name__ == "__main__":
    N = 14
    #Fake data
    x =  list(zip([2018]*N, [5]*N, list(range(1,N+1))))
    x = [datetime.datetime(*y) for y in x]
    x = [y for y in x if y.weekday() < 5]
    random_walk_steps = 2 * np.random.randint(0, 6, len(x)) - 3
    random_walk = np.cumsum(random_walk_steps)
    y = np.arange(len(x)) + random_walk

    # Make a figure and plot everything
    fig, ax = plt.subplots()
    ax.plot(x, y)

    ### HERE IS THE BIT THAT ANSWERS THE QUESTION
    ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=(MO, TU, WE, TH, FR)))
    ax.xaxis.set_major_formatter(mdates.DateFormatter(DT_FORMAT))

    # plot stuff
    fig.autofmt_xdate()
    plt.tight_layout()
    plt.show()

enter image description here

like image 42
Him Avatar answered Nov 06 '22 10:11

Him


If you are trying to avoid the fact that matplotlib is interpolating between each point of your dataset, you can exploit the fact that matplotlib will plot a new line segment each time a np.NaN is encountered. Pandas makes it easy to insert np.NaN for the days that are not in your dataset with pd.Dataframe.asfreq():

df = pd.DataFrame(data = {
    "Col 1" : [1.000325, 1.000807, 1.001207, 1.000355, 1.001512, 1.003237, 1.000979],
    "MP": [2743.002071, 2754.011543, 2746.121450, 2760.169848, 2780.756857, 2793.953050, 2792.675162],
    "Col 3": [3.242650e+09, 3.453480e+09, 3.576350e+09, 3.641320e+09, 3.573970e+09, 3.573970e+09, 4.325970e+09],
    "Col 4": [9.520000, 10.080000, 9.820000, 9.880000, 10.160000, 10.160000, 11.660000],
    "Col 5": [5.04, 5.62, 5.29, 6.58, 8.32, 9.57, 9.53],
    "R": [0,0,0,0,0,1,1]
    },
    index=['2018-01-08', '2018-01-09', '2018-01-10', '2018-01-11', '2018-01-12', '2018-01-15', '2018-01-16'])

df.index = pd.to_datetime(df.index)

#rescale R so I don't need to worry about twinax
df.loc[df.R==0, 'R'] = df.loc[df.R==0, 'R'] + df.MP.min()
df.loc[df.R==1, 'R'] = df.loc[df.R==1, 'R'] * df.MP.max()

df = df.asfreq('D')

df
               Col 1           MP         Col 3  Col 4  Col 5            R
2018-01-08  1.000325  2743.002071  3.242650e+09   9.52   5.04  2743.002071
2018-01-09  1.000807  2754.011543  3.453480e+09  10.08   5.62  2743.002071
2018-01-10  1.001207  2746.121450  3.576350e+09   9.82   5.29  2743.002071
2018-01-11  1.000355  2760.169848  3.641320e+09   9.88   6.58  2743.002071
2018-01-12  1.001512  2780.756857  3.573970e+09  10.16   8.32  2743.002071
2018-01-13       NaN          NaN           NaN    NaN    NaN          NaN
2018-01-14       NaN          NaN           NaN    NaN    NaN          NaN
2018-01-15  1.003237  2793.953050  3.573970e+09  10.16   9.57  2793.953050
2018-01-16  1.000979  2792.675162  4.325970e+09  11.66   9.53  2793.953050

df[['MP', 'R']].plot(); plt.show()

enter image description here

like image 1
Kyle Avatar answered Nov 06 '22 12:11

Kyle