Say that I have a dataframe that looks like:
    Name  Group_Id
    AAA   1
    ABC   1
    CCC   2
    XYZ   2
    DEF   3
    YYH   3
How could I randomly select one (or more) rows for each Group_Id? Say that I want one random draw per Group_Id; I would get:
    Name  Group_Id
    AAA   1
    XYZ   2
    DEF   3
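    import numpy as np

    size = 2        # sample size
    replace = True  # with replacement

    # Pick `size` index labels at random within each group, then select those rows.
    fn = lambda obj: obj.loc[np.random.choice(obj.index, size, replace), :]
    df.groupby('Group_Id', as_index=False).apply(fn)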
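For a reproducible run of the same idea, here is a minimal sketch. It assumes the example dataframe from the question, an arbitrarily chosen seed, and a hypothetical helper named `draw`; it loops over the groups instead of calling `apply`, which is a substitution for illustration rather than the answer's own method.

```python
import numpy as np
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "CCC", "XYZ", "DEF", "YYH"],
    "Group_Id": [1, 1, 2, 2, 3, 3],
})

# Seeded generator so repeated runs give the same draw (seed is arbitrary).
rng = np.random.default_rng(0)

def draw(group, size=2, replace=True):
    """Hypothetical helper: pick `size` index labels from one group."""
    picked = rng.choice(group.index.to_numpy(), size=size, replace=replace)
    return group.loc[picked]

# Sample each group separately and stack the results.
sampled = pd.concat([draw(g) for _, g in df.groupby("Group_Id")])
print(sampled)
```

With `replace=True` the same row can appear more than once within a group; set `replace=False` to forbid that, provided `size` does not exceed the group size.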
From 0.16.x onwards, pd.DataFrame.sample provides a way to return a random sample of items from an axis of an object.
    In [664]: df.groupby('Group_Id').apply(lambda x: x.sample(1)).reset_index(drop=True)
    Out[664]:
      Name  Group_Id
    0  ABC         1
    1  XYZ         2
    2  DEF         3
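As a possible shortcut, groupby objects have their own `sample` method in more recent pandas releases (1.1 or later, as far as I know), so the `apply` step can be skipped. The sketch below assumes such a version and the example dataframe from the question; the `random_state` value is arbitrary and only there to make the draw repeatable.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "Name": ["AAA", "ABC", "CCC", "XYZ", "DEF", "YYH"],
    "Group_Id": [1, 1, 2, 2, 3, 3],
})

# One random row per group; random_state makes the result repeatable.
one_per_group = (
    df.groupby("Group_Id")
      .sample(n=1, random_state=0)
      .reset_index(drop=True)
)

# Two rows per group, drawn with replacement.
two_with_replacement = (
    df.groupby("Group_Id")
      .sample(n=2, replace=True, random_state=0)
      .reset_index(drop=True)
)

print(one_per_group)
print(two_with_replacement)
```

Passing `frac=` instead of `n=` should draw a fraction of each group rather than a fixed number of rows.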