Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Plotting datetimeindex on x-axis with matplotlib creates wrong ticks in pandas 0.15 in contrast to 0.14

I create a simple pandas dataframe with some random values and a DatetimeIndex like so:

import pandas as pd
from numpy.random import randint
import datetime as dt
import matplotlib.pyplot as plt

# create a random dataframe with datetimeindex
dateRange = pd.date_range('1/1/2011', '3/30/2011', freq='D')
randomInts = randint(1, 50, len(dateRange))
df = pd.DataFrame({'RandomValues' : randomInts}, index=dateRange)

Then I plot it in two different ways:

# plot with pandas own matplotlib wrapper
df.plot()

# plot directly with matplotlib pyplot
plt.plot(df.index, df.RandomValues)

plt.show()

(Do not use both statements at the same time as they plot on the same figure.)

I use Python 3.4 64bit and matplotlib 1.4. With pandas 0.14, both statements give me the expected plot (they use slightly different formatting of the x-axis which is okay; note that data is random so the plots do not look the same): pandas 0.14: pandas plot

pandas 0.14: matplotlib plot

However, when using pandas 0.15, the pandas plot looks alright but the matplotlib plot has some strange tick format on the x-axis:

pandas 0.15: pandas plot

pandas 0.15: matplotlib plot

Is there any good reason for this behaviour and why it has changed from pandas 0.14 to 0.15?

like image 229
Dirk Avatar asked Oct 23 '14 10:10

Dirk


People also ask

What is DatetimeIndex pandas?

DatetimeIndex [source] Immutable ndarray of datetime64 data, represented internally as int64, and which can be boxed to Timestamp objects that are subclasses of datetime and carry metadata such as frequency information.

Is PLT show () blocking?

show() and plt. draw() are unnecessary and / or blocking in one way or the other.

How do I show all ticks in Matplotlib?

Use xticks() method to show all the X-coordinates in the plot. Use yticks() method to show all the Y-coordinates in the plot. To display the figure, use show() method.


2 Answers

Note that this bug was fixed in pandas 0.15.1 (https://github.com/pandas-dev/pandas/pull/8693), and plt.plot(df.index, df.RandomValues) now just works again.


The reason for this change in behaviour is that starting from 0.15, the pandas Index object is no longer a numpy ndarray subclass. But the real reason is that matplotlib does not support the datetime64 dtype.

As a workaround, in the case you want to use the matplotlib plot function, you can convert the index to python datetime's using to_pydatetime:

plt.plot(df.index.to_pydatetime(), df.RandomValues)

More in detail explanation:

Because Index is no longer a ndarray subclass, matplotlib will convert the index to a numpy array with datetime64 dtype (while before, it retained the Index object, of which scalars are returned as Timestamp values, a subclass of datetime.datetime, which matplotlib can handle). In the plot function, it calls np.atleast_1d() on the input which now returns a datetime64 array, which matplotlib handles as integers.

I opened an issue about this (as this gets possibly a lot of use): https://github.com/pydata/pandas/issues/8614

like image 87
joris Avatar answered Nov 06 '22 02:11

joris


With matplotlib 1.5.0 this 'just works':

import pandas as pd
from numpy.random import randint
import datetime as dt
import matplotlib.pyplot as plt

# create a random dataframe with datetimeindex
dateRange = pd.date_range('1/1/2011', '3/30/2011', freq='D')
randomInts = randint(1, 50, len(dateRange))
df = pd.DataFrame({'RandomValues' : randomInts}, index=dateRange)

fig, ax = plt.subplots()
ax.plot('RandomValues', data=df)

demo image

like image 35
tacaswell Avatar answered Nov 06 '22 02:11

tacaswell