I have a very large DataFrame that looks like this example df:
df = 
col1    col2     col3 
apple   red      2.99 
apple   red      2.99 
apple   red      1.99 
apple   pink     1.99 
apple   pink     1.99 
apple   pink     2.99 
...     ....      ...
pear    green     .99 
pear    green     .99 
pear    green    1.29
I am grouping by 2 columns like this:
g = df.groupby(['col1', 'col2'])
Now I want to select say 3 random groups. So my expected output is this:
col1    col2     col3 
apple   red      2.99 
apple   red      2.99 
apple   red      1.99 
pear    green     .99 
pear    green     .99 
pear    green    1.29
lemon   yellow    .99 
lemon   yellow    .99 
lemon   yellow   1.99 
(Let's pretend those above three groups are random groups from df). How can I achieve this? I have using this. But this did not help me in my case.
You can do with shuffle and ngroup
g = df.groupby(['col1', 'col2'])
a=np.arange(g.ngroups)
np.random.shuffle(a)
df[g.ngroup().isin(a[:2])]# change 2 to what you need :-) 
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With