I have a DataFrame (data) with a simple integer index and 5 columns. The columns are Date, Country, AgeGroup, Gender, Stat. (Names changed to protect the innocent.) I would like to produce a FacetGrid where Country defines the row, AgeGroup defines the column, and Gender defines the hue. For each of those combinations, I would like to produce a time series graph. I.e. I should get an array of graphs, each of which has 2 time series on it (1 male, 1 female). I can get very close with:
g = sns.FacetGrid(data, row='Country', col='AgeGroup', hue='Gender')
g.map(plt.plot, 'Stat')
However, this just gives me the sample number on the x-axis rather than the dates. Is there a quick fix in this context?
More generally, I understand that the approach with FacetGrid is to make the grid and then map a plotting function to it. If I wanted to roll my own plotting function, what are the conventions it needs to follow? In particular, how can I write my own plotting function (to pass to map for FacetGrid) that accepts multiple columns' worth of data from my dataset?
I'll answer your more general question first. The rules for functions that you can pass to FacetGrid.map are:
1. They must take array-like inputs as positional arguments, with the first corresponding to the x axis and the second to the y axis.
2. They must also accept two keyword arguments: color and label. If you want to use a hue variable, then these should get passed to the underlying plotting function, though you can just catch **kwargs and not do anything with them if it's not relevant to the specific plot you're making.
3. When called, they must draw a plot on the currently active matplotlib Axes.
There may be cases where your function draws a plot that looks correct without taking x, y positional inputs. I think that's basically what's going on here with the way you're using plt.plot. It can be easier then to just call, e.g., g.set_axis_labels("Date", "Stat") after you use map, which will rename your axes properly. You may also want to do g.set(xticklabels=dates) to get more meaningful ticks.
There is also a more general function, FacetGrid.map_dataframe. The rules here are similar, but the function you pass must accept a dataframe input in a parameter called data, and instead of taking array-like positional inputs it takes strings that correspond to variables in that dataframe. On each iteration through the facets, the function will be called with the input dataframe masked to just the values for that combination of row, col, and hue levels.
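A minimal sketch of a function written for map_dataframe, assuming the same column names as the question. The name line_from_frame and the synthetic data are hypothetical, used only for illustration. Because the function receives the whole per-facet dataframe, it can read any other columns it needs, which is one way to work with multiple columns' worth of data.

```python
from itertools import product

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the question's `data` (values are random, not real).
rng = np.random.default_rng(1)
dates = pd.date_range("2021-01-01", periods=5, freq="MS")
records = [
    (d, country, age, gender, rng.normal())
    for country, age, gender, d in product(
        ["A", "B"], ["young", "old"], ["male", "female"], dates
    )
]
data = pd.DataFrame(
    records, columns=["Date", "Country", "AgeGroup", "Gender", "Stat"]
)


def line_from_frame(x, y, data=None, color=None, label=None, **kwargs):
    """Hypothetical helper: x and y are column NAMES, data is the facet subset."""
    subset = data.sort_values(x)  # keep the line in chronological order
    plt.plot(
        subset[x].to_numpy(),
        subset[y].to_numpy(),
        color=color,
        label=label,
        **kwargs,
    )


g = sns.FacetGrid(data, row="Country", col="AgeGroup", hue="Gender")
g.map_dataframe(line_from_frame, "Date", "Stat")
g.add_legend()
```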
So in your specific case, you'll need to write a function that we can call plot_by_date that should look something like this:
def plot_by_date(x, y, color=None, label=None):
    ...
(I'd be more helpful on the body, but I don't actually know how to do much with dates and matplotlib.) The end result is that when you call this function it should plot on the currently-active Axes. Then do:
g.map(plot_by_date, "Date", "Stat")
And it should work, I think.