R Lattice like plots with Python, Pandas and Matplotlib

I have a pandas dataframe of "factors", floats and integers. I would like to make "R Lattice"-like plots from it, conditioning and grouping on the categorical variables. I've used R extensively and written custom panel functions to get the plots formatted exactly how I wanted them, but I'm struggling to produce the same types of plots succinctly with matplotlib. I have been playing around with layouts and subplot2grid, but just can't seem to get it right.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

nRows = 500
df = pd.DataFrame({'c1' : np.random.choice(['A','B','C','D'], size=nRows),
               'c2' : np.random.choice(['P','Q','R'], size=nRows),
               'i1' : np.random.randint(20,50, nRows),
               'i2' : np.random.randint(0,10, nRows),
               'x1' : 3 * np.random.randn(nRows) + 90,
               'x2' : 2 * np.random.randn(nRows) + 89})

I would like to plot things such as the following (R lattice code examples)

x1 vs. x2 for each level of c1 (lattice code)

xyplot(x1 ~ x2 | c1, data = df)

x1 vs. x2 for each level of c1 with "global" legend c2 (symbols or colors)

xyplot(x1 ~ x2 | c1, groups = c2, data = df)

histograms of x1 for each level of c1

histogram(~ x1 | c1, data = df)

I am also trying to make "conditioned" contour plots such as those produced here (1.4.4.4)

https://scipy-lectures.github.io/intro/matplotlib/matplotlib.html

I have read through these examples: http://nbviewer.ipython.org/github/fonnesbeck/Bios366/blob/master/notebooks/Section2_4-Matplotlib.ipynb

However, I would like the layout to be generated from the number of levels in the categorical conditioning (or "by") variable(s), i.e. I specify a number of columns, and the number of rows is computed from the number of levels.

I'd appreciate any good advice or steps in the right direction. I'd prefer not to use rpy2 or python ggplot (I messed around with them and found them frustrating and limiting too).

Thanks! Randall

Randall Goodwin asked Sep 14 '14 06:09


1 Answer

Seaborn is the most effective library I have found for doing faceted plots in Python. It's a pandas-aware wrapper around matplotlib that takes care of all the subplotting for you and updates the matplotlib styling to look more modern. It produces some really lovely output.

The faceting is done with the grid part of the library (FacetGrid).

It works a little differently from R in that you create the grid first, passing in the data along with the facets you want (row, column, colour, etc.). You then map plotting functions onto that grid, passing any required arguments through to the mapped plotting functions.

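import matplotlib.pyplot as plt
import seaborn as sns

# scatter plot conditioned on one factor
# (x1 ~ x2 | c1 in lattice: x2 on the x-axis, x1 on the y-axis)
grid1 = sns.FacetGrid(df, col='c1')
grid1.map(plt.scatter, 'x2', 'x1')

# scatter plot with a column factor and a hue factor
grid2 = sns.FacetGrid(df, col='c1', hue='c2')
grid2.map(plt.scatter, 'x2', 'x1')
grid2.add_legend()  # one shared legend for the c2 levels

# histogram conditioned on one factor
grid3 = sns.FacetGrid(df, col='c1')
grid3.map(plt.hist, 'x1', alpha=.7)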
b10n answered Sep 18 '22 17:09