class labels in Pandas scattermatrix

This question has been asked before ("Multiple data in scatter matrix"), but it didn't receive an answer.

I'd like to make a scatter matrix, something like in the pandas docs, but with differently colored markers for different classes. For example, I'd like some points to appear in green and others in blue depending on the value of one of the columns (or a separate list).

Here's an example using the Iris dataset. The color of the points represents the species of Iris -- Setosa, Versicolor, or Virginica.

[Image: iris scatter matrix with class labels]

Does pandas (or matplotlib) have a way to make a chart like that?

asked Apr 08 '14 17:04

bgschiller

2 Answers

Update: This functionality is now built into Seaborn: seaborn.pairplot accepts a hue argument that colors the points by class.

The following was my stopgap measure:

def factor_scatter_matrix(df, factor, palette=None):
    '''Create a scatter matrix of the variables in df, with differently colored
    points depending on the value of df[factor].
    inputs:
        df: pandas.DataFrame containing the columns to be plotted, as well
            as factor.
        factor: string or pandas.Series. The column indicating which group
            each row belongs to.
        palette: A list of hex codes, at least as long as the number of groups.
            If omitted, a predefined palette will be used, but it only includes
            9 groups.
    '''
    import numpy as np
    # scatter_matrix lived in pandas.tools.plotting in old pandas releases
    from pandas.plotting import scatter_matrix
    from scipy.stats import gaussian_kde

    if isinstance(factor, str):  # was basestring under Python 2
        factor_name = factor  # save off the name
        factor = df[factor]  # extract column
        df = df.drop(factor_name, axis=1)  # remove from df, so it
        # doesn't get a row and col in the plot.

    # set() ordering is arbitrary, so which group gets which color may vary
    classes = list(set(factor))

    if palette is None:
        palette = ['#e41a1c', '#377eb8', '#4eae4b',
                   '#994fa1', '#ff8101', '#fdfc33',
                   '#a8572c', '#f482be', '#999999']

    # check the palette size before zipping, since zip silently truncates
    if len(classes) > len(palette):
        raise ValueError('''Too many groups for the number of colors provided.
We only have {} colors in the palette, but you have {}
groups.'''.format(len(palette), len(classes)))

    color_map = dict(zip(classes, palette))

    # one color per row, looked up from that row's group
    colors = factor.map(color_map)
    axarr = scatter_matrix(df, figsize=(10, 10), marker='o', c=colors, diagonal=None)

    # fill the empty diagonal with one KDE curve per group
    for rc in range(len(df.columns)):
        for group in classes:
            y = df[factor == group].iloc[:, rc].values  # icol() was removed from pandas
            gkde = gaussian_kde(y)
            ind = np.linspace(y.min(), y.max(), 1000)
            axarr[rc][rc].plot(ind, gkde.evaluate(ind), c=color_map[group])

    return axarr, color_map

As an example, we'll use the same dataset as in the question, available here

>>> import pandas as pd
>>> iris = pd.read_csv('iris.csv')
>>> axarr, color_map = factor_scatter_matrix(iris,'Name')
>>> color_map
{'Iris-setosa': '#377eb8',
 'Iris-versicolor': '#4eae4b',
 'Iris-virginica': '#e41a1c'}

[Image: iris scatter matrix]
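The returned color_map can also drive a legend, which the plot above lacks. This is a rough sketch of one way to do it, not part of the original function: the figure here is a blank placeholder, and in practice you would likely take the figure from the returned axes, e.g. axarr[0][0].figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# color_map as printed in the example above
color_map = {'Iris-setosa': '#377eb8',
             'Iris-versicolor': '#4eae4b',
             'Iris-virginica': '#e41a1c'}

# Placeholder figure (assumption); swap in the scatter matrix figure.
fig, ax = plt.subplots()

# One colored patch per group, sorted by name for a stable legend order.
handles = [Patch(color=color, label=name)
           for name, color in sorted(color_map.items())]
legend = fig.legend(handles=handles, loc='upper right')
```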

Hope this is helpful!

answered Oct 15 '22 14:10

bgschiller

You can also call scatter_matrix from pandas directly, as follows (in current pandas it lives under pd.plotting):

pd.plotting.scatter_matrix(df, color=colors)

where colors is a list of length len(df) containing one color per row.
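To make that concrete, here is a small sketch that builds the colors list from a class column. The toy DataFrame, the 'label' column name and the class_colors mapping are all made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import pandas as pd
from pandas.plotting import scatter_matrix

# Toy data (assumption): two numeric columns and a class column.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [2.1, 1.9, 3.2, 4.8, 5.1, 6.3],
    "label": ["a", "a", "b", "b", "c", "c"],
})

# Hypothetical mapping from class value to hex color.
class_colors = {"a": "#e41a1c", "b": "#377eb8", "c": "#4eae4b"}

# One color per row, in row order.
colors = df["label"].map(class_colors).tolist()

# Plot only the numeric columns; color= is forwarded to matplotlib's scatter.
axes = scatter_matrix(df[["x", "y"]], color=colors, diagonal="hist")
```

The colors are matched to rows by position, so this assumes no rows are dropped for missing values.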

answered Oct 15 '22 14:10

jrjc


Tags: python, pandas, matplotlib, scatter-plot
