Matplotlib.pyplot.hist() very slow

I'm plotting about 10,000 items in an array, with around 1,000 unique values.

The plot has been running for half an hour now. I made sure the rest of the code works.

Is it really that slow? This is my first time plotting histograms with pyplot.

Fenwick asked Mar 02 '16 03:03


6 Answers

To plot histograms quickly with matplotlib, pass the histtype='step' argument to pyplot.hist. For example, with the default histtype:

plt.hist(np.random.exponential(size=1000000),bins=10000)
plt.show()

takes ~15 seconds to draw and roughly 5-10 seconds to update when you pan or zoom.

In contrast, plotting with histtype='step':

plt.hist(np.random.exponential(size=1000000),bins=10000,histtype='step')
plt.show()

plots almost immediately and can be panned and zoomed with no delay.
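A plausible explanation for the difference is the number of artists matplotlib has to draw: the default histtype creates one rectangle per bin, while 'step' draws a single outline. The sketch below counts the returned patches to illustrate this. The sample size, bin count, and variable names are illustrative assumptions, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(size=100_000)  # illustrative sample, smaller than in the answer

fig, (ax_bar, ax_step) = plt.subplots(1, 2)

# Default histtype='bar': the third return value holds one Rectangle per bin.
_, _, bar_patches = ax_bar.hist(data, bins=1000)

# histtype='step': the third return value is a list with a single Polygon.
_, _, step_patches = ax_step.hist(data, bins=1000, histtype="step")

n_bar = len(bar_patches)
n_step = len(step_patches)
print(n_bar, n_step)
```

With 10,000 bins the default style has to create and redraw 10,000 rectangles on every pan or zoom, which is consistent with the delays described above.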

user545424 answered Oct 20 '22 06:10


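Plotting the histogram is instant once the numpy array is flattened. Given a 2D array, plt.hist treats each column as a separate dataset, so a 512x512 array means 512 histograms. Try the demo code below:

import numpy as np
import matplotlib.pyplot as plt

array2d = np.random.random_sample((512,512))*100
# flatten to 1D so all values go into a single histogram
plt.hist(array2d.flatten())
plt.hist(array2d.flatten(), bins=1000)
plt.show()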
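To see why flattening matters, the following sketch compares what hist returns for a 2D array and for its flattened version. The array size is a small illustrative assumption (not the 512x512 array above), and the Agg backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
array2d = rng.random((100, 8)) * 100  # small illustrative array: 100 rows, 8 columns

fig, (ax_2d, ax_flat) = plt.subplots(1, 2)

# 2D input: each of the 8 columns is histogrammed as its own dataset.
counts_2d, _, _ = ax_2d.hist(array2d)

# Flattened input: all 800 values go into one histogram.
counts_flat, _, _ = ax_flat.hist(array2d.flatten())

shape_2d = np.asarray(counts_2d).shape
shape_flat = np.asarray(counts_flat).shape
print(shape_2d, shape_flat)
```

The 2D call yields one row of counts per column, so the amount of drawing grows with the number of columns, while the flattened call yields a single row of counts.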

CcMango answered Oct 20 '22 08:10


Importing seaborn somewhere in the code may cause pyplot.hist to take a really long time.

If the problem is seaborn, it can be solved by resetting the matplotlib settings:

import seaborn as sns
sns.reset_orig()

np8 answered Oct 20 '22 06:10


For me, the problem was that the pd.Series, say S, had dtype 'object' rather than 'float64'. After converting it with S = np.float64(S), plt.hist(S) is very quick.

Napoléon answered Oct 20 '22 07:10


I was facing the same problem with the pandas .hist() method. For me the solution was:

pd.to_numeric(df['your_data']).hist()

which worked instantly.
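As a minimal sketch of the conversion step, the snippet below builds a hypothetical object-dtype column (the sample values are made up for illustration) and converts it before plotting. Passing errors="coerce" is an added assumption here: it turns unparseable entries into NaN instead of raising an error.

```python
import pandas as pd

# Hypothetical column stored as Python objects (strings), including one bad entry.
df = pd.DataFrame(
    {"your_data": pd.Series(["1.5", "2.0", "n/a", "3.25"], dtype=object)}
)
print(df["your_data"].dtype)  # object

# Convert to a numeric dtype; entries that cannot be parsed become NaN.
values = pd.to_numeric(df["your_data"], errors="coerce")
print(values.dtype)

# Drop the NaN rows so only real numbers remain for plotting.
clean = values.dropna()
print(len(clean))
```

Calling clean.hist() afterwards plots the numeric values only.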

Nic Scozzaro answered Oct 20 '22 06:10


Since several answers already mention slowness with pandas .hist(), note that it may be caused by non-numerical data. This is easily solved by using value_counts():

df['colour'].value_counts().plot(kind='bar')
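The sketch below shows the same idea on a small made-up DataFrame: value_counts() reduces the column to one count per category, so only a handful of bars needs to be drawn. The data and variable names are illustrative assumptions, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"colour": ["red", "blue", "red", "green", "red", "blue"]})

# One row per distinct value, sorted by descending count.
counts = df["colour"].value_counts()
print(counts)

# One bar per category rather than a histogram over raw strings.
ax = counts.plot(kind="bar")
n_bars = len(ax.patches)
print(n_bars)
```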

Skippy le Grand Gourou answered Oct 20 '22 06:10