Given a dataframe with numerical values in a specific column, I want to randomly remove a certain percentage of the rows whose value in that column lies within a certain range.
For example, given the following dataframe:
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
df
   col1
0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
2/5 of the rows where col1 is below 6 should be removed at random.
What's the most concise way to do that?
Use sample to pick a random fraction of the matching rows, then drop their index labels:
df.drop(df.query('col1 < 6').sample(frac=.4).index)
One possible result (which rows are removed varies from run to run):
   col1
1     2
3     4
4     5
5     6
6     7
7     8
8     9
9    10
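If you need the same rows removed on every run (for tests or reproducible analyses), sample accepts a random_state seed. The sketch below is the same query + sample + drop approach, only seeded; the seed value 0 is an arbitrary choice for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Rows eligible for removal: col1 below 6.
candidates = df.query('col1 < 6')

# Randomly pick 40% of the eligible rows; random_state (arbitrary seed 0)
# makes the choice reproducible across runs.
to_drop = candidates.sample(frac=0.4, random_state=0).index

# Drop the sampled rows by index label; every other row is kept.
result = df.drop(to_drop)
print(result)
```

Because drop works on index labels, this assumes the index is unique; with duplicate labels, drop would also remove every other row that shares a sampled label.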
For a range, use a chained comparison in the query:
df.drop(df.query('2 < col1 < 8').sample(frac=.4).index)
   col1
0     1
1     2
3     4
4     5
5     6
7     8
8     9
9    10
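When the column name, bounds or fraction come from variables, a boolean mask can be easier to build than a query string. Below is a minimal sketch wrapping that idea in a small helper; the name drop_fraction_in_range and its parameters are hypothetical, and the range is treated as exclusive on both ends to match '2 < col1 < 8' above.

```python
import pandas as pd


def drop_fraction_in_range(df, col, low, high, frac, random_state=None):
    """Hypothetical helper: randomly drop `frac` of the rows whose `col`
    value lies strictly between `low` and `high`; other rows are kept."""
    # Boolean mask for the open interval (low, high), like 'low < col < high'.
    in_range = (df[col] > low) & (df[col] < high)
    # Sample the rows to remove from the in-range subset only.
    to_drop = df[in_range].sample(frac=frac, random_state=random_state).index
    # Remove the sampled rows by index label.
    return df.drop(to_drop)


df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
result = drop_fraction_in_range(df, 'col1', 2, 8, frac=0.4, random_state=1)
print(result)
```

Here 5 rows (values 3 to 7) fall in the range, so 2 of them are dropped and the rows with values 1, 2, 8, 9 and 10 always survive.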