
Understanding the diagonal in Pandas' scatter matrix plot


I'm plotting a scatter matrix with Pandas. I understand the plot, except for the curves in the diagonal panels. Can someone explain to me what they mean?

Image: scatter matrix plot with curves in the diagonal panels (image not available)

Code:

    import pylab
    import numpy as np
    import pandas as pd
    # In older pandas releases this lived in pandas.tools.plotting
    from pandas.plotting import scatter_matrix


    def make_scatter_plot(X, name):
        """
        Make scatterplot.

        Parameters:
        -----------
        X: a design matrix where each column is a feature and each row is an observation.
        name: the name of the plot.
        """
        pylab.clf()
        df = pd.DataFrame(X)
        # diagonal='kde' draws a density estimate of each column on the diagonal
        axs = scatter_matrix(df, alpha=0.2, diagonal='kde')

        for ax in axs[:, 0]:  # the left boundary
            ax.grid(False, axis='both')  # recent matplotlib expects a bool, not 'off'
            ax.set_yticks([0, .5])

        for ax in axs[-1, :]:  # the lower boundary
            ax.grid(False, axis='both')
            ax.set_xticks([0, .5])

        pylab.savefig(name + ".png")
asked Oct 14 '14 at 12:10 by Jack Twain



1 Answer

As you can tell, the scatter matrix is plotting each of the columns specified against each other column.

However, in this format, when you get to the diagonal, you would see a plot of a column against itself. Since this would always be a straight line, Pandas instead gives you more useful information and plots the density of just that column of data.

See http://pandas.pydata.org/pandas-docs/stable/visualization.html#density-plot.
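To make the diagonal curve concrete, here is a minimal sketch that computes a one-dimensional kernel density estimate for a single column by hand. It assumes SciPy's `gaussian_kde` with its default bandwidth, which is my understanding of what pandas uses for `diagonal='kde'`; the random data, the column names and the helper `column_density` are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Hypothetical data: 200 observations of three features.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])


def column_density(series, num_points=200):
    """Estimate the density of one column on an evenly spaced grid.

    Illustrative helper, not part of pandas.
    """
    values = series.dropna().to_numpy()
    kde = gaussian_kde(values)  # default bandwidth (assumption)
    grid = np.linspace(values.min(), values.max(), num_points)
    return grid, kde(grid)


# The (grid, density) pairs trace the kind of curve shown on the diagonal
# panel for column "a": high where values are common, low where they are rare.
grid, density = column_density(df["a"])
```

The curve is a smoothed histogram: its height at a point reflects how concentrated the column's values are around that point, and the area under it is close to 1.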

If you would rather have a histogram, you could change your plotting code to:

    axs = scatter_matrix(df, alpha=0.2, diagonal='hist')
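As a rough way to see the difference between the two settings, the sketch below builds the matrix both ways and counts what ends up on the diagonal axes. The data and column names are invented, the `Agg` backend is chosen only so the snippet runs without a display, and `diagonal='kde'` needs SciPy installed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Hypothetical data: 100 observations of three features.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])

# Each call returns a 2-D array of matplotlib axes, one per panel.
axs_kde = scatter_matrix(df, alpha=0.2, diagonal="kde")
axs_hist = scatter_matrix(df, alpha=0.2, diagonal="hist")

n = df.shape[1]
# With 'kde', each diagonal panel holds a line (the density curve).
kde_lines = [len(axs_kde[i, i].lines) for i in range(n)]
# With 'hist', each diagonal panel holds rectangles (the histogram bars).
hist_bars = [len(axs_hist[i, i].patches) for i in range(n)]
```

Either way, the off-diagonal panels are unchanged; only the per-column summary on the diagonal differs.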
answered Sep 18 '22 at 16:09 by Wilduck