
How to randomly select some pandas dataframe rows?

I have a pandas dataframe df which contains a column amount. For many rows, the amount is zero. I want to randomly remove 50% of the rows where the amount is zero, keeping all rows where amount is nonzero. How can I do this?

royco asked Feb 16 '26 16:02


1 Answer

pandas

Using query + sample

df.drop(df.query('amount == 0').sample(frac=.5).index)

Consider the dataframe df

import pandas as pd

df = pd.DataFrame(dict(amount=[0, 1] * 10))

df.drop(df.query('amount == 0').sample(frac=.5).index)
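To check that the one-liner does what the question asks, here is a minimal sketch on the same toy frame. The random_state=0 argument and the names to_drop and result are my additions for repeatability and are not part of the answer above.

```python
import pandas as pd

# Same toy frame as above: 10 zeros and 10 ones, alternating.
df = pd.DataFrame(dict(amount=[0, 1] * 10))

# random_state is an assumption added here so the sample is repeatable.
to_drop = df.query('amount == 0').sample(frac=.5, random_state=0).index
result = df.drop(to_drop)

print(len(to_drop))                # 5 of the 10 zero rows are dropped
print((result.amount != 0).sum())  # 10, every nonzero row survives
print((result.amount == 0).sum())  # 5 zero rows remain
```

Which five zero rows get dropped depends on the seed; only the counts are fixed.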

numpy

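import numpy as np

# Boolean mask of the rows where amount is zero
iszero = df.amount.values == 0
count_zeros = iszero.sum()
idx = np.arange(iszero.shape[0])
# Pick half of the zero-row positions to keep, without repeats
keep_these = np.random.choice(idx[iszero], int(count_zeros * .5), replace=False)

# Keep every nonzero row plus the chosen zero rows, in original order
df.iloc[np.sort(np.concatenate([idx[~iszero], keep_these]))]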

    amount
1        1
2        0
3        1
5        1
6        0
7        1
8        0
9        1
10       0
11       1
12       0
13       1
15       1
17       1
19       1
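If you want the numpy version to be repeatable, one option is NumPy's Generator API in place of np.random.choice. This is a sketch of the same idea, not the code that produced the output above; the seed and the names rng, zero_pos, nonzero_pos, keep_zero, and result_np are assumptions of mine.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(amount=[0, 1] * 10))

# Seeded generator (the seed value is an arbitrary assumption).
rng = np.random.default_rng(0)

iszero = df.amount.to_numpy() == 0
zero_pos = np.flatnonzero(iszero)      # positions of the zero rows
nonzero_pos = np.flatnonzero(~iszero)  # positions of the nonzero rows

# Keep half of the zero rows, sampled without replacement.
keep_zero = rng.choice(zero_pos, size=len(zero_pos) // 2, replace=False)

# Recombine and sort so the rows stay in their original order.
result_np = df.iloc[np.sort(np.concatenate([nonzero_pos, keep_zero]))]
print(len(result_np))  # 15
```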

time test

[Image: timing comparison of the two approaches]
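To time the two approaches yourself, something like the following should work. It is only a sketch: the frame size, the repeat count, and the names big, pandas_way, and numpy_way are my own choices, and the numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Larger frame so the timings are measurable (size is an arbitrary choice).
big = pd.DataFrame(dict(amount=[0, 1] * 50_000))

def pandas_way(df):
    # query + sample + drop, as in the pandas section
    return df.drop(df.query('amount == 0').sample(frac=.5).index)

def numpy_way(df):
    # positional selection, as in the numpy section
    iszero = df.amount.values == 0
    idx = np.arange(iszero.shape[0])
    keep_these = np.random.choice(idx[iszero], int(iszero.sum() * .5), replace=False)
    return df.iloc[np.sort(np.concatenate([idx[~iszero], keep_these]))]

for fn in (pandas_way, numpy_way):
    seconds = timeit.timeit(lambda: fn(big), number=10)
    print(f"{fn.__name__}: {seconds / 10:.4f} s per call")
```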

Per the comment from @tomcy, you can pass inplace=True to remove the rows from df directly, without having to reassign df:

df.drop(df.query('amount == 0').sample(frac=.5).index, inplace=True)
df

    amount
1        1
2        0
3        1
5        1
6        0
7        1
8        0
9        1
10       0
11       1
12       0
13       1
15       1
17       1
19       1
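One thing to watch with inplace=True: drop then returns None, so don't assign the result back to df. A small sketch of that behavior (the name returned is just for illustration):

```python
import pandas as pd

df = pd.DataFrame(dict(amount=[0, 1] * 10))

# With inplace=True the frame is modified and the call returns None.
returned = df.drop(df.query('amount == 0').sample(frac=.5).index, inplace=True)

print(returned)  # None
print(len(df))   # 15
```

Writing df = df.drop(..., inplace=True) would therefore replace df with None.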

piRSquared answered Feb 19 '26 05:02


