This question has been asked before, Multiple data in scatter matrix, but didn't receive an answer.
I'd like to make a scatter matrix, something like in the pandas docs, but with differently colored markers for different classes. For example, I'd like some points to appear in green and others in blue depending on the value of one of the columns (or a separate list).
Here's an example using the Iris dataset. The color of the points represents the species of Iris -- Setosa, Versicolor, or Virginica.
Does pandas (or matplotlib) have a way to make a chart like that?
A scatterplot matrix is a matrix associated to n numerical arrays (data variables), $X_1,X_2,…,X_n$ , of the same length. The cell (i,j) of such a matrix displays the scatter plot of the variable Xi versus Xj. Here we show the Plotly Express function px.
Double click on the scatter data to open the Plot Details dialog. To change the symbol color click on the Symbol Color drop-down menu and select a color from the Individual Color option.
A scatter plot matrix is a grid (or matrix) of scatter plots used to visualize bivariate relationships between combinations of variables. Each scatter plot in the matrix visualizes the relationship between a pair of variables, allowing many relationships to be explored in one chart.
The following was my stopgap measure:
def factor_scatter_matrix(df, factor, palette=None):
'''Create a scatter matrix of the variables in df, with differently colored
points depending on the value of df[factor].
inputs:
df: pandas.DataFrame containing the columns to be plotted, as well
as factor.
factor: string or pandas.Series. The column indicating which group
each row belongs to.
palette: A list of hex codes, at least as long as the number of groups.
If omitted, a predefined palette will be used, but it only includes
9 groups.
'''
import matplotlib.colors
import numpy as np
from pandas.tools.plotting import scatter_matrix
from scipy.stats import gaussian_kde
if isinstance(factor, basestring):
factor_name = factor #save off the name
factor = df[factor] #extract column
df = df.drop(factor_name,axis=1) # remove from df, so it
# doesn't get a row and col in the plot.
classes = list(set(factor))
if palette is None:
palette = ['#e41a1c', '#377eb8', '#4eae4b',
'#994fa1', '#ff8101', '#fdfc33',
'#a8572c', '#f482be', '#999999']
color_map = dict(zip(classes,palette))
if len(classes) > len(palette):
raise ValueError('''Too many groups for the number of colors provided.
We only have {} colors in the palette, but you have {}
groups.'''.format(len(palette), len(classes)))
colors = factor.apply(lambda group: color_map[group])
axarr = scatter_matrix(df,figsize=(10,10),marker='o',c=colors,diagonal=None)
for rc in xrange(len(df.columns)):
for group in classes:
y = df[factor == group].icol(rc).values
gkde = gaussian_kde(y)
ind = np.linspace(y.min(), y.max(), 1000)
axarr[rc][rc].plot(ind, gkde.evaluate(ind),c=color_map[group])
return axarr, color_map
As an example, we'll use the same dataset as in the question, available here
>>> import pandas as pd
>>> iris = pd.read_csv('iris.csv')
>>> axarr, color_map = factor_scatter_matrix(iris,'Name')
>>> color_map
{'Iris-setosa': '#377eb8',
'Iris-versicolor': '#4eae4b',
'Iris-virginica': '#e41a1c'}
Hope this is helpful!
You can also call the scattermatrix from pandas as follow :
pd.scatter_matrix(df,color=colors)
with colors
being an list of size len(df)
containing colors
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With