Getting a random sample in Python dataframe by category

I have a sample list like this:

Category| Item
--------|-------
Animal  | Fish
Animal  | Cat
...     |
Food    | Fish
Food    | Cake
...     |
etc...

I want to take a random sample of 10 items out of each category, so that the remaining dataframe just has those records.

I've tried df.sample() but it just gives me samples across the board.

I can do this through df.iterrows(), but I am hoping there is a simpler solution.

Simon O'Doherty asked Dec 27 '16 12:12



1 Answer

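You have to tell pandas to group by category with the groupby method, then sample within each group. The column names must match the dataframe's exactly ('Category' and 'Item' in the table above).

df.groupby('Category')['Item'].apply(lambda s: s.sample(10))

The result is a Series of Item values, indexed by category and original row label.

If a category has fewer than ten items, s.sample(10) raises a ValueError. If you don't want to sample with replacement, cap the sample size at the group size:

df.groupby('Category')['Item'].apply(lambda s: s.sample(min(len(s), 10)))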
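
The one-liners above return only the item column, while the question asks for a dataframe that keeps just the sampled records. A minimal sketch of one way to get whole rows back is shown here. The data, the constant N, and the variable names picked and sampled are made up for illustration, and the capitalised column names are assumed from the question's table. random_state is fixed only to make the run repeatable.

```python
import pandas as pd

# Hypothetical data: two categories with 25 rows each, one with only 4.
df = pd.DataFrame({
    "Category": ["Animal"] * 25 + ["Food"] * 25 + ["Tool"] * 4,
    "Item": [f"item_{i}" for i in range(54)],
})

N = 10  # rows wanted per category (assumed from the question)

# Sample within each category, capping at the group size so that
# small groups do not raise. group_keys=False keeps the original
# row labels as the index of the result.
picked = df.groupby("Category", group_keys=False)["Item"].apply(
    lambda s: s.sample(n=min(len(s), N), random_state=0)
)

# Use the surviving row labels to pull the full rows from df.
sampled = df.loc[picked.index]

print(sampled["Category"].value_counts())
```

On pandas 1.1 or later, df.groupby("Category").sample(n=10) is a shorter route to the same kind of result, provided every category has at least ten rows.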
Ted Petrou answered Oct 12 '22 11:10
