
How to change the step size matplotlib uses when plotting timestamp objects?

I'm currently attempting to graph a fairly small dataset using the matplotlib and pandas libraries. The format of the dataset is a CSV file. Here is the dataset:

DATE,UNRATE
1948-01-01,3.4
1948-02-01,3.8
1948-03-01,4.0
1948-04-01,3.9
1948-05-01,3.5
1948-06-01,3.6
1948-07-01,3.6
1948-08-01,3.9
1948-09-01,3.8
1948-10-01,3.7
1948-11-01,3.8
1948-12-01,4.0

I loaded the dataset using pandas (the file that holds it is named 'dataset.csv'):

import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('dataset.csv')
dataset['DATE'] = pd.to_datetime(dataset['DATE'])

I then attempted to plot the loaded dataset using matplotlib:

plt.plot(dataset['DATE'], dataset['UNRATE'])
plt.show()

The code above mostly worked fine, and displayed the following graph:

[image: the resulting graph]

The problem, however, is that the dates I wanted displayed on the x axis seem to have been labeled only at intervals of two:

[image: the x axis of the graph]

I found the question "Changing the “tick frequency” on x or y axis in matplotlib?", which does relate to my problem. From my testing, however, its answers only seem to work with integer values.

I also found the question "controlling the number of x ticks in pyplot", which seemed to provide a solution to my problem. The method the answer said to use, to_pydatetime, is a method of DatetimeIndex. Since my understanding was that pandas.to_datetime returns a DatetimeIndex by default, I expected to be able to call to_pydatetime on dataset['DATE']:

plt.xticks(dataset['DATE'].to_pydatetime())

However, I instead received the error:

AttributeError: 'Series' object has no attribute 'to_pydatetime'

Since this appears to just be default behavior, is there a way to force matplotlib to place a tick at each point along the x axis, rather than only at every other point?

Asked by Christian Dean, Jan 29 '23 10:01


1 Answer

To get rid of the error, you may convert the dates as follows and set the labels accordingly:

plt.xticks(dataset['DATE'].tolist(), dataset['DATE'].tolist())

or, as has been mentioned in the comments,

plt.xticks(dataset['DATE'].dt.to_pydatetime(), dataset['DATE'].dt.to_pydatetime())

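As a side note on where the error comes from, here is a minimal sketch (the three-date sample list is only an illustration, taken from the first rows of the question's data). It suggests that pandas.to_datetime gives back a DatetimeIndex for a plain list, but a Series when it is handed a Series such as a DataFrame column, which is why to_pydatetime is not available directly on dataset['DATE'].

```python
import pandas as pd

# Illustrative sample: the first three dates of the question's dataset.
dates = ["1948-01-01", "1948-02-01", "1948-03-01"]

# A plain list is converted to a DatetimeIndex ...
as_index = pd.to_datetime(dates)
# ... but a Series (e.g. a DataFrame column) comes back as a Series.
as_series = pd.to_datetime(pd.Series(dates))

print(type(as_index).__name__)
print(type(as_series).__name__)

# tolist() works directly on the Series and yields pandas Timestamp objects,
# which can be passed to plt.xticks as tick positions.
tick_values = as_series.tolist()
print(tick_values[0])
```

Datetime-specific methods on a Series are reached through the .dt accessor, which is what the second xticks call above relies on.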

But let's look at some more useful options.

Plotting strings

First of all it is possible to plot the data as it is, i.e. as strings.

import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('dateunrate.txt')
plt.plot(dataset['DATE'], dataset['UNRATE'])

plt.setp(plt.gca().get_xticklabels(), rotation=45, ha="right")
plt.show()


This is just like plotting plt.plot(["apple", "banana", "cherry"], [1,2,3]). It means that successive dates are simply placed one after another along the axis, regardless of whether they are a minute, a day or a year apart. E.g. if your dates were 2018-01-01, 2018-01-03, 2018-01-27, they would still appear equally spaced on the axis.
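To make the difference concrete, here is a small sketch using the three example dates from the paragraph above. It compares the positions matplotlib assigns to the strings with the positions the same dates get once they are converted to real datetimes. The Agg backend line is an assumption so that the snippet runs without a display; drop it for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, not needed interactively
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

dates = ["2018-01-01", "2018-01-03", "2018-01-27"]

# Plotted as strings, the dates are treated as categories at 0, 1, 2, ...
fig, ax = plt.subplots()
ax.plot(dates, [1, 2, 3])
string_positions = list(ax.get_xticks())
print(string_positions)

# Converted to datetimes, the positions are day numbers,
# so the gaps between neighbours reflect the real time differences.
date_positions = mdates.date2num(pd.to_datetime(dates))
gaps_in_days = date_positions[1:] - date_positions[:-1]
print(gaps_in_days)
```

The string positions are evenly spaced, while the gaps between the converted dates are 2 and 24 days.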

Plot dates with pandas (automatically)

Pandas can nicely plot dates out of the box if the dates are in the index of the dataframe. To this end, you may read the file so that the first CSV column is parsed as dates and used as the index.

import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('dateunrate.txt', parse_dates=[0], index_col=0)
dataset.plot()

plt.show() 


This is equivalent to

dataset = pd.read_csv('dateunrate.txt', parse_dates=[0])
dataset = dataset.set_index("DATE")
dataset.plot()

or

dataset = pd.read_csv('dateunrate.txt')
dataset["DATE"] = pd.to_datetime(dataset["DATE"])
dataset = dataset.set_index("DATE")
dataset.plot()

or even

dataset = pd.read_csv('dateunrate.txt')
dataset["DATE"] = pd.to_datetime(dataset["DATE"])
dataset.plot(x="DATE",y="UNRATE")

This works nicely here because you happen to have one date per month, so pandas decides to show all 12 months as tick labels.
For other datasets this may result in different tick locations.

Plot dates with matplotlib or pandas (manually)

In the general case, you may use matplotlib.dates formatters and locators to tweak the tick(label)s in the way you want. Here, we might use a MonthLocator and set the ticklabel format to "%b %Y". This works well with matplotlib plot or pandas plot(x_compat=True).

import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.dates as mdates

dataset = pd.read_csv('dateunrate.txt', parse_dates=[0], index_col=0)

plt.plot(dataset.index, dataset['UNRATE'])
# or use pandas plotting instead (note the x_compat argument):
# dataset.plot(x_compat=True)

plt.gca().xaxis.set_major_locator(mdates.MonthLocator())
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))

plt.setp(plt.gca().get_xticklabels(), rotation=45, ha="right")
plt.show()

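For a coarser step than one tick per month, MonthLocator also accepts a bymonth argument. The following is only a sketch under some assumptions: the data is the CSV from the question embedded as a string, plot_with_month_ticks and visible_tick_labels are hypothetical helper names made up for this example, and the Agg backend is selected so that it runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, not needed interactively
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# The dataset from the question, embedded so the sketch is self-contained.
CSV_TEXT = """DATE,UNRATE
1948-01-01,3.4
1948-02-01,3.8
1948-03-01,4.0
1948-04-01,3.9
1948-05-01,3.5
1948-06-01,3.6
1948-07-01,3.6
1948-08-01,3.9
1948-09-01,3.8
1948-10-01,3.7
1948-11-01,3.8
1948-12-01,4.0
"""


def plot_with_month_ticks(frame, bymonth=None):
    """Hypothetical helper: plot UNRATE with ticks on the 1st of the given months.

    bymonth=None means every month; e.g. [1, 4, 7, 10] means quarterly ticks.
    """
    fig, ax = plt.subplots()
    ax.plot(frame.index, frame["UNRATE"])
    ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=bymonth))
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right")
    return fig, ax


def visible_tick_labels(ax, fmt="%b %Y"):
    """Hypothetical helper: format the tick positions inside the x limits."""
    ax.figure.canvas.draw()  # make sure the view limits are up to date
    lo, hi = ax.get_xlim()
    # The locator may also report ticks outside the view, so filter them.
    return [mdates.num2date(t).strftime(fmt)
            for t in ax.get_xticks() if lo <= t <= hi]


dataset = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=[0], index_col=0)

fig, ax = plot_with_month_ticks(dataset)
monthly = visible_tick_labels(ax)
print(monthly)

fig2, ax2 = plot_with_month_ticks(dataset, bymonth=[1, 4, 7, 10])
quarterly = visible_tick_labels(ax2)
print(quarterly)
```

Listing the months explicitly with bymonth keeps the tick positions independent of the current view limits, which seems the more predictable choice when you want ticks on specific months.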

Answered by ImportanceOfBeingErnest, Jan 31 '23 00:01