Select a random sample of groups after groupby in pandas?

Tags: python, pandas

I have a very large DataFrame that looks like this example df:

df = 

col1    col2     col3 
apple   red      2.99 
apple   red      2.99 
apple   red      1.99 
apple   pink     1.99 
apple   pink     1.99 
apple   pink     2.99 
...     ....      ...
pear    green     .99 
pear    green     .99 
pear    green    1.29

I am grouping by 2 columns like this:

g = df.groupby(['col1', 'col2'])
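To see what the two-column groupby produces, here is a minimal sketch that builds a small frame shaped like the example. The lemon rows are assumed from the expected output, and the prices are illustrative only. Each distinct (col1, col2) pair becomes one group, and GroupBy.ngroup() labels every row with its group's number, counting in sorted key order by default.

```python
import pandas as pd

# Hypothetical small frame shaped like the question's df; the lemon rows
# are assumed from the expected output, not copied from real data.
df = pd.DataFrame({
    "col1": ["apple"] * 6 + ["pear"] * 3 + ["lemon"] * 3,
    "col2": ["red"] * 3 + ["pink"] * 3 + ["green"] * 3 + ["yellow"] * 3,
    "col3": [2.99, 2.99, 1.99, 1.99, 1.99, 2.99,
             0.99, 0.99, 1.29, 0.99, 0.99, 1.99],
})

g = df.groupby(["col1", "col2"])

# One group per distinct (col1, col2) pair
print(g.ngroups)  # 4

# ngroup() gives each row its group number; with the default sort=True the
# numbering follows sorted keys: (apple, pink)=0, (apple, red)=1,
# (lemon, yellow)=2, (pear, green)=3
print(g.ngroup().tolist())  # [1, 1, 1, 0, 0, 0, 3, 3, 3, 2, 2, 2]
```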

Now I want to select, say, 3 random groups, so my expected output is:

col1    col2     col3 
apple   red      2.99 
apple   red      2.99 
apple   red      1.99 
pear    green     .99 
pear    green     .99 
pear    green    1.29
lemon   yellow    .99 
lemon   yellow    .99 
lemon   yellow   1.99 

(Let's pretend the three groups above are random groups from df.) How can I achieve this? I have tried using this, but it did not help in my case.

asked Dec 05 '22 at 12:12 by Hana



1 Answer

You can do this with np.random.shuffle and GroupBy.ngroup:

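import numpy as np

g = df.groupby(['col1', 'col2'])

# Shuffle the group numbers 0 .. ngroups-1
a = np.arange(g.ngroups)
np.random.shuffle(a)

# Keep every row whose group number is among the first n shuffled ones
n = 3  # number of groups to keep; change as needed
df[g.ngroup().isin(a[:n])]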
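A self-contained sketch of the same shuffle-and-ngroup idea, wrapped in a hypothetical helper sample_groups (the name and its seed parameter are illustrative, not a pandas API). It swaps np.random.shuffle for a seeded NumPy Generator so results are reproducible; otherwise the logic follows the answer: permute the group numbers, keep the first n, and filter rows with isin.

```python
import numpy as np
import pandas as pd


def sample_groups(df, by, n, seed=None):
    """Return every row of n randomly chosen groups.

    Hypothetical helper for illustration; uses the ngroup + shuffle idea.
    """
    g = df.groupby(by)
    rng = np.random.default_rng(seed)
    # Random permutation of the group numbers 0 .. ngroups-1
    order = rng.permutation(g.ngroups)
    # First n shuffled numbers (all of them if n >= ngroups)
    chosen = order[:n]
    # Boolean mask: rows whose group number was drawn
    return df[g.ngroup().isin(chosen)]


# Hypothetical sample frame shaped like the question's df
df = pd.DataFrame({
    "col1": ["apple"] * 6 + ["pear"] * 3 + ["lemon"] * 3,
    "col2": ["red"] * 3 + ["pink"] * 3 + ["green"] * 3 + ["yellow"] * 3,
    "col3": [2.99, 2.99, 1.99, 1.99, 1.99, 2.99,
             0.99, 0.99, 1.29, 0.99, 0.99, 1.99],
})

out = sample_groups(df, ["col1", "col2"], n=3, seed=0)
print(out)
```

Because the filter is a boolean mask over df, the kept rows stay in their original order rather than being regrouped, and each chosen group comes back whole. If n exceeds the number of groups, the slice simply keeps every group.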
answered Feb 06 '23 at 09:02 by BENY

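A different technique, not the one from the answer above: sample the group keys themselves with the standard library's random.sample, then reassemble the chosen groups with get_group and pd.concat. This is a sketch on a hypothetical sample frame shaped like the question's df; seeding random is only for reproducibility.

```python
import random

import pandas as pd

# Hypothetical sample frame shaped like the question's df
df = pd.DataFrame({
    "col1": ["apple"] * 6 + ["pear"] * 3 + ["lemon"] * 3,
    "col2": ["red"] * 3 + ["pink"] * 3 + ["green"] * 3 + ["yellow"] * 3,
    "col3": [2.99, 2.99, 1.99, 1.99, 1.99, 2.99,
             0.99, 0.99, 1.29, 0.99, 0.99, 1.99],
})

g = df.groupby(["col1", "col2"])

# With two grouping columns, each key of g.groups is a (col1, col2) tuple
keys = list(g.groups)

random.seed(0)  # seeded only so the example is reproducible
chosen = random.sample(keys, 3)  # 3 distinct keys, drawn without replacement

# Stack the chosen groups; rows come out group by group, in `chosen` order
out = pd.concat([g.get_group(k) for k in chosen])
print(out)
```

Unlike the mask-based approach, this returns rows ordered group by group in the order the keys were drawn. Calling get_group once per chosen key is fine for a handful of groups, but a single isin mask does less work when many groups are selected.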