I would like to use pandas sample function but with a criteria without grouping or filtering data.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(low=0, high=5, size=(10000, 2)),columns=['a', 'b'])
print(df.sample(n=100))
This will sample 100 rows, but what if i want to sample 50 rows containing 0 to 50 rows containing 1 in df['a'].
You can use the == operator to make a list* of boolean values. And when said list is put into the getter ([]) it will filter the values. If you want to, you can use n=50 to create a sample size of 50 rows.
df[df['a']==1].sample(n=50)
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(low=0, high=5, size=(10000, 2)),columns=['a', 'b'])
print(df[df['a']==1].sample(n=50))
*List isn't literally a list in this context, but it is a great word for explaining how it works. It's a technically a DataFrame that maps rows to a true/false value.
If you want to sample all 50 where a is 1 or 0:
print(df[(df['a']==1) | (df['a']==0)].sample(n=50))
And if you want to sample 50 of each:
df1 = df[df['a']==1].sample(n=50)
df0 = df[df['a']==0].sample(n=50)
print(pd.concat([df1,df0]))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With