Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Displaying pair plot in Pandas data frame

I am trying to display a pair plot by creating from scatter_matrix in pandas dataframe. This is how the pair plot is created:

# Create dataframe from data in X_train
# Label the columns using the strings in iris_dataset.feature_names
iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset.feature_names)
# Create a scatter matrix from the dataframe, color by y_train
grr = pd.scatter_matrix(iris_dataframe, c=y_train, figsize=(15, 15), marker='o',
hist_kwds={'bins': 20}, s=60, alpha=.8, cmap=mglearn.cm3)

I want to display the pair plot to look something like this;

Enter image description here

I am using Python v3.6 and PyCharm and am not using Jupyter Notebook.

like image 722
user3848207 Avatar asked Mar 04 '17 05:03

user3848207


People also ask

How do you plot a Pairplot in pandas?

To plot multiple pairwise bivariate distributions in a dataset, you can use the pairplot() function. This shows the relationship for (n, 2) combination of variable in a DataFrame as a matrix of plots and the diagonal plots are the univariate plots.

What does the Pairplot show?

Pair plot is used to understand the best set of features to explain a relationship between two variables or to form the most separated clusters. It also helps to form some simple classification models by drawing some simple lines or make linear separation in our data-set.

How do you read a pair plot?

The way to interpret this matrix is as follows: The variable names are displayed on the outer edges of the matrix. The boxes along the diagonals display the density plot for each variable. The boxes in the lower left corner display the scatterplot between each variable.

How do you make a Pairplot?

A pairs plot is a matrix of scatterplots that lets you understand the pairwise relationship between different variables in a dataset. The easiest way to create a pairs plot in Python is to use the seaborn. pairplot(df) function.


2 Answers

This code worked for me using Python 3.5.2:

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn import datasets

iris_dataset = datasets.load_iris()
X = iris_dataset.data
Y = iris_dataset.target

iris_dataframe = pd.DataFrame(X, columns=iris_dataset.feature_names)

# Create a scatter matrix from the dataframe, color by y_train
grr = pd.plotting.scatter_matrix(iris_dataframe, c=Y, figsize=(15, 15), marker='o',
                                 hist_kwds={'bins': 20}, s=60, alpha=.8)

For pandas version < v0.20.0.

Thanks to michael-szczepaniak for pointing out that this API had been deprecated.

grr = pd.scatter_matrix(iris_dataframe, c=Y, figsize=(15, 15), marker='o',
                        hist_kwds={'bins': 20}, s=60, alpha=.8)

I just had to remove the cmap=mglearn.cm3 piece, because I was not able to make mglearn work. There is a version mismatch issue with sklearn.

To not display the image and save it directly to file you can use this method:

plt.savefig('foo.png')

Also remove

# %matplotlib inline

Enter image description here

like image 157
Vikash Singh Avatar answered Sep 25 '22 08:09

Vikash Singh


Just an update to Vikash's excellent answer. The last two lines should now be:

grr = pd.plotting.scatter_matrix(iris_dataframe, c=Y, figsize=(15, 15), marker='o',
                                 hist_kwds={'bins': 20}, s=60, alpha=.8)

The scatter_matrix function has been moved to the plotting package, so the original answer, while correct is now deprecated.

So the complete code would now be:

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn import datasets

iris_dataset = datasets.load_iris()
X = iris_dataset.data
Y = iris_dataset.target

iris_dataframe = pd.DataFrame(X, columns=iris_dataset.feature_names)
# create a scatter matrix from the dataframe, color by y_train
grr = pd.plotting.scatter_matrix(iris_dataframe, c=Y, figsize=(15, 15), marker='o',
                                 hist_kwds={'bins': 20}, s=60, alpha=.8)
like image 28
Michael Szczepaniak Avatar answered Sep 22 '22 08:09

Michael Szczepaniak