Plotting time series using Seaborn FacetGrid

I have a DataFrame (data) with a simple integer index and 5 columns. The columns are Date, Country, AgeGroup, Gender, Stat. (Names changed to protect the innocent.) I would like to produce a FacetGrid where the Country defines the row, AgeGroup defines the column, and Gender defines the hue. For each of those particulars, I would like to produce a time series graph. I.e. I should get an array of graphs each of which has 2 time series on it (1 male, 1 female). I can get very close with:

import matplotlib.pyplot as plt
import seaborn as sns
g = sns.FacetGrid(data, row='Country', col='AgeGroup', hue='Gender')
g.map(plt.plot, 'Stat')

However, this just gives me the sample number on the x-axis rather than the dates. Is there a quick fix in this context?

More generally, I understand that the approach with FacetGrid is to make the grid and then map a plotting function to it. If I wanted to roll my own plotting function, what are the conventions it needs to follow? In particular, how can I write my own plotting function (to pass to map for FacetGrid) that accepts multiple columns worth of data from my dataset?

8one6 asked Sep 06 '14 15:09



1 Answer

I'll answer your more general question first. The rules for functions that you can pass to FacetGrid.map are:

  • They must take array-like inputs as positional arguments, with the first argument corresponding to the x axis and the second argument corresponding to the y axis (though more on the second condition shortly).
  • They must also accept two keyword arguments: color and label. If you want to use a hue variable, then these should be passed through to the underlying plotting function, though you can just catch **kwargs and ignore them if they aren't relevant to the specific plot you're making.
  • When called, they must draw a plot on the "currently active" matplotlib Axes.

There may be cases where your function draws a plot that looks correct without taking x and y as positional inputs. I think that's basically what's going on here with the way you're using plt.plot. In that case it can be easier to just call, e.g., g.set_axis_labels("Date", "Stat") after you use map, which will rename your axes properly. You may also want to do g.set(xticklabels=dates) to get more meaningful ticks.

There is also a more general function, FacetGrid.map_dataframe. The rules here are similar, but the function you pass must accept a dataframe input in a parameter called data, and instead of taking array-like positional inputs it takes strings that correspond to variables in that dataframe. On each iteration through the facets, the function will be called with the input dataframe filtered to just the rows for that combination of row, col, and hue levels.
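This is also the route for a function that needs several columns at once, since it receives the whole subset. A minimal sketch, assuming the same hypothetical toy data; plot_from_frame is an illustrative name, and sorting by date first is a design choice so each line runs left to right in time.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data using the question's column names; Stat values are arbitrary
idx = pd.MultiIndex.from_product(
    [pd.date_range("2014-01-01", periods=12, freq="MS"),
     ["CountryA", "CountryB"], ["18-34", "35+"], ["Female", "Male"]],
    names=["Date", "Country", "AgeGroup", "Gender"],
)
data = idx.to_frame(index=False)
data["Stat"] = np.arange(len(data), dtype=float)


def plot_from_frame(x, y, data=None, color=None, label=None, **kwargs):
    """Hypothetical function for FacetGrid.map_dataframe.

    x and y are column *names*; data is this facet's subset of the full frame,
    so any other column in it could be used here as well.
    """
    subset = data.sort_values(x)  # order points by date before drawing the line
    plt.gca().plot(subset[x], subset[y], color=color, label=label, **kwargs)


g = sns.FacetGrid(data, row="Country", col="AgeGroup", hue="Gender")
g.map_dataframe(plot_from_frame, "Date", "Stat")
g.add_legend()
```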

So in your specific case, you'll need to write a function, call it plot_by_date, that looks something like this:

def plot_by_date(x, y, color=None, label=None):

    ...

(I'd be more helpful on the body, but I don't actually know how to do much with dates and matplotlib.) Whatever goes in the body, when the function is called it should draw on the currently active Axes. Then do:

g.map(plot_by_date, "Date", "Stat")

And it should work, I think.

mwaskom answered Oct 21 '22 07:10
