Python: Random selection per group

Say that I have a dataframe that looks like:

    Name  Group_Id
    AAA   1
    ABC   1
    CCC   2
    XYZ   2
    DEF   3
    YYH   3

How can I randomly select one (or more) rows for each Group_Id? For example, with one random draw per Group_Id, I would get something like:

    Name  Group_Id
    AAA   1
    XYZ   2
    DEF   3
Plug4 asked Mar 18 '14 06:03

2 Answers

    import numpy as np

    size = 2        # sample size
    replace = True  # with replacement
    # for each group, draw `size` index labels and return the matching rows
    fn = lambda obj: obj.loc[np.random.choice(obj.index, size, replace), :]
    df.groupby('Group_Id', as_index=False).apply(fn)
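For anyone who wants to run this end to end, here is a minimal sketch of the same idea against the data from the question. The `df` construction, the seeded `rng` generator, and the helper name `sample_rows` are my own additions for illustration, and the sketch loops over the groups with `pd.concat` instead of calling `apply`, so treat it as one possible variant rather than the exact code above.

```python
import numpy as np
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "CCC", "XYZ", "DEF", "YYH"],
    "Group_Id": [1, 1, 2, 2, 3, 3],
})

size = 2        # rows to draw per group
replace = True  # draw with replacement, so a row may appear twice

# Seeded only so that repeated runs give the same draw (an assumption,
# drop the seed for a fresh draw each time).
rng = np.random.default_rng(0)


def sample_rows(group):
    """Pick `size` index labels from one group and return those rows."""
    picked = rng.choice(group.index, size=size, replace=replace)
    return group.loc[picked]


# Iterate over the groups and stitch the per-group samples back together.
result = pd.concat(sample_rows(g) for _, g in df.groupby("Group_Id"))
print(result)
```

With `replace = False`, `size` must not exceed the number of rows in the smallest group, otherwise `rng.choice` raises a `ValueError`.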
behzad.nouri answered Sep 28 '22 09:09

From pandas 0.16.x onwards, pd.DataFrame.sample returns a random sample of items from an axis of the object, so it can be applied to each group:

    In [664]: df.groupby('Group_Id').apply(lambda x: x.sample(1)).reset_index(drop=True)
    Out[664]:
      Name  Group_Id
    0  ABC         1
    1  XYZ         2
    2  DEF         3
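On more recent pandas (1.1 and later, to the best of my knowledge) the groupby object has its own `sample` method, which gives the same kind of result without `apply`. This is a different call from the one shown above, not a replacement for it. The sketch below assumes the data from the question; the variable names and the `random_state` seed are only there for illustration.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "CCC", "XYZ", "DEF", "YYH"],
    "Group_Id": [1, 1, 2, 2, 3, 3],
})

# One random row per group. random_state is set only so reruns match;
# leave it out to get a new draw each time.
one_per_group = df.groupby("Group_Id").sample(n=1, random_state=0)
print(one_per_group.reset_index(drop=True))

# Several rows per group: with replace=True, n may exceed the group size.
two_per_group = df.groupby("Group_Id").sample(n=2, replace=True, random_state=0)
print(two_per_group.reset_index(drop=True))
```

Passing `frac=` instead of `n=` draws a fraction of each group rather than a fixed number of rows.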
Zero answered Sep 28 '22 08:09
