Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Inconsistency when setting figure size using pandas plot method

Tags:

I'm trying to use the convenience of the plot method of a pandas dataframe while adjusting the size of the figure produced. (I'm saving the figures to file as well as displaying them inline in a Jupyter notebook). I found the method below successful most of the time, except when I plot two lines on the same chart - then the figure goes back to the default size.

I suspect this might be due to the differences between plot on a series and plot on a dataframe.

Setup example code:

data = {
    'A': 90 + np.random.randn(366),
    'B': 85 + np.random.randn(366)
}

date_range = pd.date_range('2016-01-01', '2016-12-31')

index = pd.Index(date_range, name='Date')

df = pd.DataFrame(data=data, index=index)

Control - this code produces the expected result (a wide plot):

fig = plt.figure(figsize=(10,4))

df['A'].plot()
plt.savefig("plot1.png")
plt.show()

Result:

plot1.png

Plotting two lines - figure size is not (10,4)

fig = plt.figure(figsize=(10,4))

df[['A', 'B']].plot()
plt.savefig("plot2.png")
plt.show()

Result:

plot2.png

What's the right way to do this so that the figure size is consistency set regardless of number of series selected?

like image 238
Bill Avatar asked Feb 13 '17 23:02

Bill


People also ask

How do you change the size of a Pandas plot?

The size of a plot can be modified by passing required dimensions as a tuple to the figsize parameter of the plot() method. it is used to determine the size of a figure object.

Is plotting possible in pandas?

Pandas uses the plot() method to create diagrams. We can use Pyplot, a submodule of the Matplotlib library to visualize the diagram on the screen. Read more about Matplotlib in our Matplotlib Tutorial.


2 Answers

The reason for the difference between the two cases is a bit hidden inside the logic of pandas.DataFrame.plot(). As one can see in the documentation this method allows a lot of arguments to be passed such that it will handle all kinds of different cases.

Here in the first case, you create a matplotlib figure via fig = plt.figure(figsize=(10,4)) and then plot a single column DataFrame. Now the internal logic of pandas plot function is to check if there is already a figure present in the matplotlib state machine, and if so, use it's current axes to plot the columns values to it. This works as expected.

However in the second case, the data consists of two columns. There are several options how to handle such a plot, including using different subplots with shared or non-shared axes etc. In order for pandas to be able to apply any of those possible requirements, it will by default create a new figure to which it can add the axes to plot to. The new figure will not know about the already existing figure and its size, but rather have the default size, unless you specify the figsize argument.

In the comments, you say that a possible solution is to use df[['A', 'B']].plot(figsize=(10,4)). This is correct, but you then need to omit the creation of your initial figure. Otherwise it will produce 2 figures, which is probably undesired. In a notebook this will not be visible, but if you run this as a usual python script with plt.show() at the end, there will be two figure windows opening.

So the solution which lets pandas take care of figure creation is

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"A":[2,3,1], "B":[1,2,2]})
df[['A', 'B']].plot(figsize=(10,4))

plt.show()

A way to circumvent the creation of a new figure is to supply the ax argument to the pandas.DataFrame.plot(ax=ax) function, where ax is an externally created axes. This axes can be the standard axes you obtain via plt.gca().

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"A":[2,3,1], "B":[1,2,2]})
plt.figure(figsize=(10,4))
df[['A', 'B']].plot(ax = plt.gca())

plt.show()

Alternatively use the more object oriented way seen in the answer from PaulH.

like image 91
ImportanceOfBeingErnest Avatar answered Sep 18 '22 15:09

ImportanceOfBeingErnest


Always operate explicitly and directly on your Figure and Axes objects. Don't rely on the pyplot state machine. In your case that means:

fig1, ax1 = plt.subplots(figsize=(10,4))
df['A'].plot(ax=ax1)
fig1.savefig("plot1.png")


fig2, ax2 = plt.figure(figsize=(10,4)) 
df[['A', 'B']].plot(ax=ax2)
fig2.savefig("plot2.png")

plt.show()
like image 42
Paul H Avatar answered Sep 20 '22 15:09

Paul H