What are ways to speed up seaborn's pairplot?

I have a DataFrame with 250,000 rows and 140 columns, and I'm trying to construct a pair plot of the variables. I know the number of subplots is huge, and so is the time it takes to draw them (I've been waiting for more than an hour on an i5 at 3.4 GHz with 32 GB RAM).
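To put "huge" in numbers, here is a back-of-the-envelope calculation. It assumes all 140 columns are numeric, so pairplot uses every one of them. Pairplot draws an n × n grid with univariate plots on the diagonal and scatter plots everywhere else:

```python
# Rough size of a full pairplot for the dataset described above
n_rows = 250_000  # rows in the DataFrame
n_cols = 140      # numeric columns, all assumed to be plotted

n_axes = n_cols * n_cols            # pairplot draws an n x n grid of axes
n_offdiag = n_cols * (n_cols - 1)   # scatter panels (the diagonal holds univariate plots)
n_points = n_offdiag * n_rows       # markers drawn across all scatter panels

print(f"{n_axes:,} axes, {n_offdiag:,} scatter panels, {n_points:,} points")
# prints "19,600 axes, 19,460 scatter panels, 4,865,000,000 points"
```

Nearly five billion markers explains the hour-long wait better than CPU speed does.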

Remembering that scikit-learn can build random forests in parallel, I checked whether this is also possible with seaborn, but I didn't find anything. The source code seems to call matplotlib's plot function for every single subplot.

Couldn't this be parallelized? If so, what is a good way to start?

Quickbeam2k1 asked Jun 03 '16 10:06




2 Answers

Rather than parallelizing, you could downsample your DataFrame to, say, 1,000 rows to get a quick look, if the bottleneck really is the number of points being drawn. 1,000 points are usually enough to get a general idea of what's going on.

For example: sns.pairplot(df.sample(1000)).

ijoseph answered Oct 11 '22 13:10


Save your pairplot to an image file and then display that image, instead of rendering the whole figure in your browser (e.g. inline in a Jupyter notebook).

from IPython.display import Image
import seaborn as sns
import matplotlib.pyplot as plt

# df is your DataFrame; `height` replaced the old `size` argument in seaborn 0.9
sns_plot = sns.pairplot(df, height=2.0)
sns_plot.savefig("pairplot.png")

plt.close()  # close the pairplot figure so it is not also rendered inline
Image(filename="pairplot.png")  # show the saved PNG instead
Alex Kosh answered Oct 11 '22 14:10