 

Color by Column Values in Matplotlib

One of my favorite aspects of using the ggplot2 library in R is the ability to easily specify aesthetics. I can quickly make a scatterplot and apply color associated with a specific column, and I would love to be able to do this with Python/pandas/matplotlib. I'm wondering if there are any convenience functions that people use to map colors to values using pandas DataFrames and matplotlib.

```r
## ggplot scatterplot example with R dataframe, `df`, colored by col3
ggplot(data = df, aes(x=col1, y=col2, color=col3)) + geom_point()
```

```python
## ideal situation with pandas dataframe, 'df', where colors are chosen by col3
df.plot(x=col1, y=col2, color=col3)
```

EDIT: Thank you for your responses, but I want to include a sample dataframe to clarify what I am asking. Two columns contain numerical data and the third is a categorical variable. The script I have in mind would assign colors based on that categorical value.

```python
import numpy as np
import pandas as pd

np.random.seed(250)
df = pd.DataFrame({'Height': np.append(np.random.normal(6, 0.25, size=5), np.random.normal(5.4, 0.25, size=5)),
                   'Weight': np.append(np.random.normal(180, 20, size=5), np.random.normal(140, 20, size=5)),
                   'Gender': ["Male", "Male", "Male", "Male", "Male",
                              "Female", "Female", "Female", "Female", "Female"]})
```

```
     Height      Weight  Gender
0  5.824970  159.210508    Male
1  5.780403  180.294943    Male
2  6.318295  199.142201    Male
3  5.617211  157.813278    Male
4  6.340892  191.849944    Male
5  5.625131  139.588467  Female
6  4.950479  146.711220  Female
7  5.617245  121.571890  Female
8  5.556821  141.536028  Female
9  5.714171  134.396203  Female
```
asked Feb 14 '13 at 23:02 by zach





2 Answers

Imports and Data

```python
import numpy
import pandas
import matplotlib.pyplot as plt
import seaborn
seaborn.set(style='ticks')

numpy.random.seed(0)
N = 37
_genders = ['Female', 'Male', 'Non-binary', 'No Response']
df = pandas.DataFrame({
    'Height (cm)': numpy.random.uniform(low=130, high=200, size=N),
    'Weight (kg)': numpy.random.uniform(low=30, high=100, size=N),
    'Gender': numpy.random.choice(_genders, size=N)
})
```

Update August 2021

  • As of seaborn 0.11.0, it's recommended to use the figure-level functions such as seaborn.relplot rather than using FacetGrid directly.
```python
seaborn.relplot(data=df, x='Weight (kg)', y='Height (cm)', hue='Gender',
                hue_order=_genders, aspect=1.61)
plt.show()
```

Update October 2015

Seaborn handles this use-case splendidly:

  • Map matplotlib.pyplot.scatter onto a seaborn.FacetGrid
```python
fg = seaborn.FacetGrid(data=df, hue='Gender', hue_order=_genders, aspect=1.61)
fg.map(plt.scatter, 'Weight (kg)', 'Height (cm)').add_legend()
```

Which immediately outputs:

[Image: scatter plot of Height (cm) against Weight (kg), points colored by Gender]
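For readers without seaborn, here is a minimal matplotlib-only sketch of the same idea. As far as I can tell, `FacetGrid.map` with `hue` calls the plotting function once per hue level, passing that level's color and label; the loop below does the same by hand. The helper name `scatter_by_category` is hypothetical, and it uses matplotlib's default color cycle rather than seaborn's palette, so the colors will differ from the seaborn figure.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to display figures interactively
import matplotlib.pyplot as plt


def scatter_by_category(df, x, y, hue, hue_order=None, ax=None):
    """Hypothetical helper: one scatter call per category, each with its own color and legend label."""
    if ax is None:
        _, ax = plt.subplots()
    if hue_order is None:
        hue_order = sorted(df[hue].unique())
    # Take colors from matplotlib's default color cycle, one per category.
    cycle = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    for i, level in enumerate(hue_order):
        subset = df[df[hue] == level]
        ax.scatter(subset[x], subset[y], color=cycle[i % len(cycle)], label=level)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend(title=hue)
    return ax


np.random.seed(0)
N = 37
genders = ['Female', 'Male', 'Non-binary', 'No Response']
df = pd.DataFrame({
    'Height (cm)': np.random.uniform(low=130, high=200, size=N),
    'Weight (kg)': np.random.uniform(low=30, high=100, size=N),
    'Gender': np.random.choice(genders, size=N),
})
ax = scatter_by_category(df, 'Weight (kg)', 'Height (cm)', 'Gender', hue_order=genders)
```

Passing `hue_order` keeps the legend order fixed, as in the seaborn version; without it the categories are sorted alphabetically.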

Old Answer

In this case, I would use matplotlib directly.

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


def dfScatter(df, xcol='Height', ycol='Weight', catcol='Gender'):
    fig, ax = plt.subplots()
    # Give each category an evenly spaced number in [0, 1];
    # scatter() turns these numbers into colors via the default colormap.
    categories = np.unique(df[catcol])
    colors = np.linspace(0, 1, len(categories))
    colordict = dict(zip(categories, colors))

    # Note: this adds a "Color" column to the DataFrame passed in.
    df["Color"] = df[catcol].map(colordict)
    ax.scatter(df[xcol], df[ycol], c=df["Color"])
    return fig


if __name__ == '__main__':
    df = pd.DataFrame({'Height': np.random.normal(size=10),
                       'Weight': np.random.normal(size=10),
                       'Gender': ["Male", "Male", "Unknown", "Male", "Male",
                                  "Female", "Did not respond", "Unknown", "Female", "Female"]})
    fig = dfScatter(df)
    fig.savefig('fig1.png')
```

And that gives me:

[Image: scatter plot with categorized colors]

As far as I know, that color column can hold any matplotlib-compatible color (RGBA tuples, HTML color names, hex strings, etc.).

I'm having trouble getting anything but numerical values to work with the colormaps.
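Since the color column can hold actual colors rather than numbers, one way to sidestep the colormap problem is to map each category directly to a color string. A minimal sketch, assuming a hand-picked `palette` dict (hypothetical; any matplotlib color names or hex strings would do). `Series.map` yields NaN for any category missing from the dict, so the dict must cover every value. Because all points come from a single `scatter` call, the legend is built from proxy `Line2D` handles.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to display figures interactively
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

np.random.seed(0)
df = pd.DataFrame({'Height': np.random.normal(size=10),
                   'Weight': np.random.normal(size=10),
                   'Gender': ["Male", "Male", "Unknown", "Male", "Male",
                              "Female", "Did not respond", "Unknown", "Female", "Female"]})

# Hypothetical palette: one matplotlib color (name or hex string) per category.
palette = {'Male': 'tab:blue',
           'Female': '#d62728',
           'Unknown': 'gray',
           'Did not respond': 'tab:green'}

# One color string per row; categories missing from palette would become NaN.
point_colors = df['Gender'].map(palette)

fig, ax = plt.subplots()
sc = ax.scatter(df['Height'], df['Weight'], c=point_colors.tolist())

# A single scatter call has no per-category legend entries, so build proxy handles.
handles = [Line2D([], [], marker='o', linestyle='', color=color, label=name)
           for name, color in palette.items()]
ax.legend(handles=handles, title='Gender')
ax.set_xlabel('Height')
ax.set_ylabel('Weight')
```

Unlike the numeric approach, no colormap is involved here, so each category gets exactly the color you chose.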

answered Oct 06 '22 at 02:10 by Paul H



Actually, you could use the ggplot package for Python:

```python
from ggplot import *
import numpy as np
import pandas as pd

df = pd.DataFrame({'Height': np.random.randn(10),
                   'Weight': np.random.randn(10),
                   'Gender': ["Male", "Male", "Male", "Male", "Male",
                              "Female", "Female", "Female", "Female", "Female"]})

ggplot(aes(x='Height', y='Weight', color='Gender'), data=df) + geom_point()
```

[Image: scatter plot produced by ggplot in Python, colored by Gender]

answered Oct 06 '22 at 02:10 by Anton Protopopov
