I'm plotting a scatter plot matrix with Pandas. I understand the plot, except for the curves in the diagonal plots. Can someone explain to me what they mean?
Image:
Code:
import pylab
import numpy as np
import pandas as pd
# scatter_matrix lives in pandas.plotting in current pandas releases
# (the older pandas.tools.plotting module has been removed).
from pandas.plotting import scatter_matrix


def make_scatter_plot(X, name):
    """
    Make scatterplot.

    Parameters:
    -----------
    X: a design matrix where each column is a feature and each row is
       an observation.
    name: the name of the plot.
    """
    pylab.clf()
    df = pd.DataFrame(X)
    axs = scatter_matrix(df, alpha=0.2, diagonal='kde')
    for ax in axs[:, 0]:  # the left boundary
        # Pass False rather than the string 'off', which newer matplotlib
        # versions treat as truthy.
        ax.grid(False, axis='both')
        ax.set_yticks([0, .5])
    for ax in axs[-1, :]:  # the lower boundary
        ax.grid(False, axis='both')
        ax.set_xticks([0, .5])
    pylab.savefig(name + ".png")
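1 Answer
The scatter plot matrix is created by the scatter_matrix() function (pandas.plotting.scatter_matrix in current pandas releases). DataFrame.plot.scatter(), by contrast, draws a single scatter plot of one column against another.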
Pandas uses matplotlib to display scatter matrices.
As you can tell, the scatter matrix is plotting each of the columns specified against each other column.
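However, in this format, each cell on the diagonal would be a plot of a column against itself. Since this would always be a straight line, Pandas decides it can give you more useful information, and plots the density plot of just that column of data instead. With diagonal='kde', as in your code, that curve is a kernel density estimate: roughly, a smoothed histogram of the column's values.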
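To make the idea of a density curve concrete, here is a minimal sketch of a one-dimensional Gaussian kernel density estimate written in plain Python. It is not the code Pandas runs (Pandas delegates the KDE to SciPy); the function name gaussian_kde_1d, the sample values, and the choice of Scott's rule for the bandwidth are assumptions made for illustration.

```python
import math
import statistics


def gaussian_kde_1d(samples, grid):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`.

    Hypothetical helper for illustration only; not part of pandas.
    """
    n = len(samples)
    # Bandwidth from Scott's rule (an assumption for this sketch):
    # sample standard deviation times n^(-1/5).
    h = statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    # Each sample contributes one Gaussian bump; the curve is their average.
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
        for x in grid
    ]


# Made-up values standing in for one column of the DataFrame.
samples = [0.1, 0.15, 0.2, 0.22, 0.3, 0.35, 0.5, 0.55, 0.6, 0.9]

# Evaluate the curve on an evenly spaced grid that extends past the data.
lo, hi = min(samples) - 1.0, max(samples) + 1.0
grid = [lo + i * (hi - lo) / 400 for i in range(401)]
density = gaussian_kde_1d(samples, grid)

# A density curve encloses unit area; check with the trapezoidal rule.
step = grid[1] - grid[0]
area = step * (sum(density) - 0.5 * (density[0] + density[-1]))
```

The curve is highest where the values cluster and falls toward zero where there are few observations, which is how to read the diagonal panels: peaks mark the most common values of that column.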
See http://pandas.pydata.org/pandas-docs/stable/visualization.html#density-plot.
If you would rather have a histogram, you could change your plotting code to:
axs = scatter_matrix(df, alpha=0.2, diagonal='hist')
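For a self-contained way to try this, the sketch below builds a small DataFrame of random numbers and draws the matrix with histograms on the diagonal. The column names, the random data, and the Agg backend (chosen so the script runs without a display) are assumptions for illustration; it also assumes a pandas version that provides pandas.plotting.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Made-up data: 200 observations of three features.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])

# One axes object per pair of columns, returned as a 2-D array.
axs = scatter_matrix(df, alpha=0.2, diagonal="hist")

# Diagonal cells hold histogram bars; off-diagonal cells hold scatter points.
n_bars = len(axs[0, 0].patches)
n_clouds = len(axs[0, 1].collections)
```

Switching diagonal="hist" back to diagonal="kde" gives the smooth curves from the question; as far as I know that option needs SciPy installed.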