
pandas sample based on criteria

Tags: python, pandas

I would like to use the pandas sample function, but with a criterion, without grouping or filtering the data.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(low=0, high=5, size=(10000, 2)),columns=['a', 'b'])

print(df.sample(n=100))

This samples 100 rows, but what if I want to sample 50 rows containing 0 and 50 rows containing 1 in df['a']?

destinychoice asked Nov 19 '25 16:11


1 Answer

You can use the == operator to build a list* of boolean values. When that list is passed to the indexing operator ([]), it keeps only the rows where the value is True. You can then call sample(n=50) on the result to draw 50 rows.

New code

df[df['a']==1].sample(n=50)

Full code

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(low=0, high=5, size=(10000, 2)),columns=['a', 'b'])

print(df[df['a']==1].sample(n=50))

*It isn't literally a list in this context, but the word is handy for explaining how it works. Technically it is a boolean Series that maps each row to a True/False value.
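
To make the footnote concrete, here is a minimal sketch that inspects the object produced by the comparison. The names mask, filtered and rng are only illustrative, and a seeded generator is assumed so the data is repeatable.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption, used only to make the example repeatable)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(low=0, high=5, size=(10000, 2)), columns=['a', 'b'])

# The comparison yields one True/False value per row
mask = df['a'] == 1
print(type(mask).__name__)  # Series
print(mask.dtype)           # bool

# Indexing with the mask keeps only the rows where it is True
filtered = df[mask]
print(len(filtered) == mask.sum())  # True
```

Because the mask shares its index with df, pandas aligns the two row by row when filtering.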

More obscure DataFrame sampling

If you want to sample 50 rows in total from those where a is 1 or 0:

print(df[(df['a']==1) | (df['a']==0)].sample(n=50))

And if you want to sample 50 of each:

df1 = df[df['a']==1].sample(n=50)
df0 = df[df['a']==0].sample(n=50)
print(pd.concat([df1,df0]))
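
As a possible alternative to building one DataFrame per value, the sketch below draws 50 rows per value in a single call. It does use grouping, which the question wanted to avoid, so treat it as an option rather than a replacement. It assumes pandas 1.1 or newer (where GroupBy.sample is available) and that every selected value has at least 50 rows. The names subset and balanced, and the random_state value, are only illustrative.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption, used only to make the example repeatable)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(low=0, high=5, size=(10000, 2)), columns=['a', 'b'])

# Keep only the rows whose 'a' value is one of the values of interest
subset = df[df['a'].isin([0, 1])]

# Draw 50 rows from each group; random_state makes the draw repeatable
balanced = subset.groupby('a').sample(n=50, random_state=0)

print(balanced['a'].value_counts())
```

This scales to more values by extending the list passed to isin, without adding one line per value.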
Neil answered Nov 21 '25 07:11