randomly remove rows from dataframe based on condition

Question

Given a dataframe with numerical values in a specific column, I want to randomly remove a certain percentage of the rows for which the value in that column lies within a certain range.

For example, given the following dataframe:

import pandas as pd

df = pd.DataFrame({'col1': [1,2,3,4,5,6,7,8,9,10]})
df
   col1
0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10

2/5 (40%) of the rows where col1 is below 6 should be removed at random.

What's the most concise way to do that?

piRSquared · Accepted Answer

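Use sample + drop: select the matching rows with query, take a random 40% of them with sample, and drop them by index. Because the choice is random, the output shown is just one possible result.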

df.drop(df.query('col1 < 6').sample(frac=.4).index)

   col1
1     2
3     4
4     5
5     6
6     7
7     8
8     9
9    10
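The one-liner above can be unpacked into three steps, which makes it easier to see what each call contributes. This is a minimal sketch rather than part of the original answer; the names candidates, to_drop and result are illustrative, and random_state=0 is an assumption added only so that repeated runs pick the same rows.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Step 1: select the rows that are candidates for removal.
candidates = df.query('col1 < 6')

# Step 2: randomly pick 40% of the candidates (2 of the 5 rows here).
# random_state is optional; it is fixed only to make the pick repeatable.
to_drop = candidates.sample(frac=0.4, random_state=0)

# Step 3: drop the picked rows by index label.
# Rows with col1 >= 6 were never candidates, so they are all kept.
result = df.drop(to_drop.index)

print(result)
```

Note that drop removes rows by index label, so this approach assumes the index is unique. With duplicate labels, every row sharing a picked label would be dropped.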

For a range, chain the comparisons inside query:

df.drop(df.query('2 < col1 < 8').sample(frac=.4).index)

   col1
0     1
1     2
3     4
4     5
5     6
7     8
8     9
9    10
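If the condition is built in code rather than written as a query string, the same idea works with a boolean mask. The sketch below wraps it in a small helper; drop_random_fraction is a hypothetical name, not a pandas function, and the random_state parameter is an optional addition for reproducibility.

```python
import pandas as pd


def drop_random_fraction(df, mask, frac, random_state=None):
    """Return a copy of df with a random `frac` of the rows where mask is True removed.

    Hypothetical helper for illustration; assumes df has a unique index.
    """
    # Sample only among the rows that satisfy the condition.
    to_drop = df[mask].sample(frac=frac, random_state=random_state).index
    # Drop the sampled rows by index label; all other rows are kept.
    return df.drop(to_drop)


df = pd.DataFrame({'col1': range(1, 11)})

# Same condition as '2 < col1 < 8', written as a boolean mask.
in_range = (df['col1'] > 2) & (df['col1'] < 8)

result = drop_random_fraction(df, in_range, frac=0.4, random_state=42)
print(result)
```

The number of rows removed is frac times the number of matching rows, rounded to a whole number, so with small groups the effective percentage can differ from the requested one.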

Tags: python, pandas

user1934212
