Understanding the diagonal in Pandas' scatter matrix plot


I'm plotting a scatter matrix with pandas. I understand the plot, except for the curves in the diagonal panels. Can someone explain what they mean?



import pylab
import numpy as np
# In current pandas the import is pandas.plotting (formerly pandas.tools.plotting).
from pandas.plotting import scatter_matrix
import pandas as pd


def make_scatter_plot(X, name):
    """
    Make scatterplot.

    Parameters:
    -----------
    X: a design matrix where each column is a feature and each row is an observation.
    name: the name of the plot.
    """
    pylab.clf()
    df = pd.DataFrame(X)
    axs = scatter_matrix(df, alpha=0.2, diagonal='kde')

    for ax in axs[:, 0]:  # the left boundary
        ax.grid(False, axis='both')  # a boolean is expected here, not the string 'off'
        ax.set_yticks([0, .5])

    for ax in axs[-1, :]:  # the lower boundary
        ax.grid(False, axis='both')
        ax.set_xticks([0, .5])

    pylab.savefig(name + ".png")
Jack Twain asked Oct 14 '14 12:10

1 Answer

As you can tell, the scatter matrix plots each of the specified columns against every other column.

However, in this layout, each panel on the diagonal would plot a column against itself. Since that would always be a straight line, pandas gives you something more useful instead: a density plot of just that column's data (a kernel density estimate, because you passed diagonal='kde').

See http://pandas.pydata.org/pandas-docs/stable/visualization.html#density-plot.
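To make the diagonal curves less mysterious, here is a minimal sketch of what a Gaussian kernel density estimate computes for a single column. It uses only the standard library; the `gaussian_kde` helper, the sample values, and the fixed bandwidth of 0.05 are assumptions for illustration. pandas itself relies on SciPy for the estimate and chooses the bandwidth automatically, so its curves will not match this one exactly.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples`.

    Hypothetical helper: each sample contributes a small Gaussian bump of
    width `bandwidth`, and the estimate is the average of those bumps.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Made-up values standing in for one column of the DataFrame.
column = [0.10, 0.15, 0.20, 0.22, 0.40, 0.45, 0.50]
density = gaussian_kde(column, bandwidth=0.05)

# Evaluate the curve on a grid from -0.5 to 1.5; this is what gets drawn.
step = 0.01
grid = [i * step for i in range(-50, 151)]
curve = [density(x) for x in grid]

# A density integrates to roughly 1, so the y-axis is not a count.
area = sum(curve) * step
```

The curve is high where the column has many values close together and low where it has few, which is why it reads like a smoothed histogram.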

If you would rather have a histogram, you could change your plotting code to:

axs = scatter_matrix(df, alpha=0.2, diagonal='hist') 
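A self-contained sketch of that change, assuming a recent pandas where the function lives in `pandas.plotting`. The random data and the column names `a`, `b`, `c` are made up for the example, and the Agg backend is selected only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Made-up data: 200 observations of 3 features.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])

# Diagonal panels become histograms; off-diagonal panels stay scatter plots.
axs = scatter_matrix(df, alpha=0.2, diagonal="hist")

# axs is a 2-D array of matplotlib Axes, one per pair of columns.
n_bars = len(axs[0, 0].patches)        # histogram bars in a diagonal panel
n_points = len(axs[1, 0].collections)  # scatter collections in an off-diagonal panel
```

Either way, the diagonal only summarises the distribution of one column; the relationships between columns are in the off-diagonal panels.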
Wilduck answered Sep 18 '22 16:09
