I have a very large DataFrame that looks like this example df:
df =
col1 col2 col3
apple red 2.99
apple red 2.99
apple red 1.99
apple pink 1.99
apple pink 1.99
apple pink 2.99
... .... ...
pear green .99
pear green .99
pear green 1.29
I am grouping by 2 columns like this:
g = df.groupby(['col1', 'col2'])
Now I want to select, say, 3 random groups. So my expected output is this:
col1 col2 col3
apple red 2.99
apple red 2.99
apple red 1.99
pear green .99
pear green .99
pear green 1.29
lemon yellow .99
lemon yellow .99
lemon yellow 1.99
(Let's pretend the three groups above were picked at random from df.) How can I achieve this? I have tried using this, but it did not help in my case.
You can do this with shuffle and ngroup:
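import numpy as np
g = df.groupby(['col1', 'col2'])
a = np.arange(g.ngroups)  # one integer label per group
np.random.shuffle(a)  # shuffle the labels in place
df[g.ngroup().isin(a[:3])]  # keep rows from the first 3 shuffled groups; change 3 to what you need :-)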
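For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. The helper name `sample_groups` and its `seed` parameter are my own additions for illustration, not part of pandas. It uses `np.random.default_rng` and `choice(..., replace=False)` in place of shuffling, which should pick distinct groups in the same way. The sample frame only contains the four groups shown in the question.

```python
import numpy as np
import pandas as pd


def sample_groups(df, keys, n, seed=None):
    """Return the rows of df that belong to n randomly chosen groups.

    Hypothetical helper: the name and the seed parameter are illustrative only.
    """
    g = df.groupby(keys)
    rng = np.random.default_rng(seed)
    # Draw n distinct group numbers from 0 .. ngroups-1.
    chosen = rng.choice(g.ngroups, size=n, replace=False)
    # ngroup() labels every row with its group number; keep the chosen ones.
    return df[g.ngroup().isin(chosen)]


# Sample data built from the four groups shown in the question.
df = pd.DataFrame({
    "col1": ["apple"] * 6 + ["pear"] * 3 + ["lemon"] * 3,
    "col2": ["red"] * 3 + ["pink"] * 3 + ["green"] * 3 + ["yellow"] * 3,
    "col3": [2.99, 2.99, 1.99, 1.99, 1.99, 2.99,
             0.99, 0.99, 1.29, 0.99, 0.99, 1.99],
})

picked = sample_groups(df, ["col1", "col2"], n=3, seed=0)
print(picked)
```

The rows keep their original order and every row of each chosen group is returned. Drop the `seed` argument if you want a different selection on each run.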