One of my favorite aspects of using the ggplot2 library in R is the ability to easily specify aesthetics. I can quickly make a scatterplot and apply color associated with a specific column, and I would love to be able to do this with Python/pandas/Matplotlib. Are there any convenience functions that people use to map colors to values using pandas DataFrames and Matplotlib?
    ## ggplot scatterplot example with R dataframe, `df`, colored by col3
    ggplot(data = df, aes(x=col1, y=col2, color=col3)) + geom_point()

    ## ideal situation with pandas dataframe, 'df', where colors are chosen by col3
    df.plot(x=col1,y=col2,color=col3)
EDIT: Thank you for your responses but I want to include a sample dataframe to clarify what I am asking. Two columns contain numerical data and the third is a categorical variable. The script I am thinking of will assign colors based on this value.
    import numpy as np
    import pandas as pd

    np.random.seed(250)
    df = pd.DataFrame({'Height': np.append(np.random.normal(6, 0.25, size=5), np.random.normal(5.4, 0.25, size=5)),
                       'Weight': np.append(np.random.normal(180, 20, size=5), np.random.normal(140, 20, size=5)),
                       'Gender': ["Male","Male","Male","Male","Male",
                                  "Female","Female","Female","Female","Female"]})

         Height      Weight  Gender
    0  5.824970  159.210508    Male
    1  5.780403  180.294943    Male
    2  6.318295  199.142201    Male
    3  5.617211  157.813278    Male
    4  6.340892  191.849944    Male
    5  5.625131  139.588467  Female
    6  4.950479  146.711220  Female
    7  5.617245  121.571890  Female
    8  5.556821  141.536028  Female
    9  5.714171  134.396203  Female
hue : (optional) This parameter takes a column name for colour encoding.
data : (optional) This parameter takes a DataFrame, array, or list of arrays as the dataset for plotting. If x and y are absent, this is interpreted as wide-form. Otherwise it is expected to be long-form.
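As a rough illustration of how these two parameters fit together, here is a minimal sketch. It assumes the descriptions above refer to `seaborn.scatterplot`; the small DataFrame and its values are made up purely for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn

# Made-up long-form data: one row per observation, one column per variable.
df = pd.DataFrame({
    "Height": [5.8, 6.3, 5.6, 5.0],
    "Weight": [159.0, 199.0, 139.0, 146.0],
    "Gender": ["Male", "Male", "Female", "Female"],
})

# `data` supplies the DataFrame; `hue` names the column used for colour encoding.
ax = seaborn.scatterplot(data=df, x="Weight", y="Height", hue="Gender")
```

Because `x` and `y` are given, the DataFrame is treated as long-form, and each distinct value in the `hue` column gets its own colour and legend entry.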
    import numpy
    import pandas
    import matplotlib.pyplot as plt
    import seaborn
    seaborn.set(style='ticks')

    numpy.random.seed(0)
    N = 37
    _genders= ['Female', 'Male', 'Non-binary', 'No Response']
    df = pandas.DataFrame({
        'Height (cm)': numpy.random.uniform(low=130, high=200, size=N),
        'Weight (kg)': numpy.random.uniform(low=30, high=100, size=N),
        'Gender': numpy.random.choice(_genders, size=N)
    })
As of seaborn 0.11.0, it's recommended to use the new figure-level functions like seaborn.relplot rather than using FacetGrid directly.

    seaborn.relplot(data=df, x='Weight (kg)', y='Height (cm)', hue='Gender', hue_order=_genders, aspect=1.61)
    plt.show()
Seaborn handles this use case splendidly: map matplotlib.pyplot.scatter onto a seaborn.FacetGrid.

    fg = seaborn.FacetGrid(data=df, hue='Gender', hue_order=_genders, aspect=1.61)
    fg.map(plt.scatter, 'Weight (kg)', 'Height (cm)').add_legend()
Which immediately outputs:
In this case, I would use matplotlib directly.
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

    def dfScatter(df, xcol='Height', ycol='Weight', catcol='Gender'):
        fig, ax = plt.subplots()
        # Assign each category an evenly spaced number in [0, 1]
        categories = np.unique(df[catcol])
        colors = np.linspace(0, 1, len(categories))
        colordict = dict(zip(categories, colors))

        # Note: this adds a "Color" column to the DataFrame that was passed in
        df["Color"] = df[catcol].apply(lambda x: colordict[x])
        # Numeric values passed to `c` are mapped through the default colormap
        ax.scatter(df[xcol], df[ycol], c=df.Color)
        return fig

    if __name__ == '__main__':
        df = pd.DataFrame({'Height':np.random.normal(size=10),
                           'Weight':np.random.normal(size=10),
                           'Gender': ["Male","Male","Unknown","Male","Male",
                                      "Female","Did not respond","Unknown","Female","Female"]})
        fig = dfScatter(df)
        fig.savefig('fig1.png')
And that gives me:
As far as I know, that color column can be any matplotlib-compatible color (RGBA tuples, HTML names, hex values, etc.).
I'm having trouble getting anything but numerical values to work with the colormaps.
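One way to sidestep the colormap issue is to skip colormaps entirely and give each category an explicit color. The sketch below assumes a hand-written `palette` dict (a hypothetical name, not part of matplotlib) and made-up data; it plots one group at a time so that each category also gets a legend entry.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "Height": [5.8, 6.3, 5.6, 5.0, 5.7],
    "Weight": [159.0, 199.0, 139.0, 146.0, 134.0],
    "Gender": ["Male", "Male", "Female", "Female", "Unknown"],
})

# Hypothetical palette mixing a named color, a hex value, and an RGBA tuple.
palette = {
    "Male": "tab:blue",
    "Female": "#d62728",
    "Unknown": (0.5, 0.5, 0.5, 1.0),
}

fig, ax = plt.subplots()
# groupby yields one sub-DataFrame per category, in sorted key order.
for category, group in df.groupby("Gender"):
    ax.scatter(group["Height"], group["Weight"],
               color=palette[category], label=category)
ax.set_xlabel("Height")
ax.set_ylabel("Weight")
ax.legend(title="Gender")
```

Plotting per group costs one `scatter` call per category, but it keeps the category-to-color mapping explicit and produces the legend without extra work.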
Actually you could use ggplot for python:
    from ggplot import *
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'Height':np.random.randn(10),
                       'Weight':np.random.randn(10),
                       'Gender': ["Male","Male","Male","Male","Male",
                                  "Female","Female","Female","Female","Female"]})

    ggplot(aes(x='Height', y='Weight', color='Gender'), data=df) + geom_point()