"Drop random rows" from pandas dataframe

In a pandas dataframe, how can I drop a random subset of rows that obey a condition?

In other words, if I have a Pandas dataframe with a Label column, I'd like to drop 50% (or some other percentage) of rows where Label == 1, but keep all of the rest:

Label A     ->    Label A
0     1           0     1
0     2           0     2
0     3           0     3
1     10          1     11
1     11          1     12
1     12
1     13

I'd love to know the simplest and most pythonic/panda-ish way of doing this!


Edit: This question provides part of an answer, but it only talks about dropping rows by index, disregarding the row values. I'd still like to know how to drop only from rows that are labeled a certain way.

NcAdams asked Dec 03 '22 11:12


1 Answer

Use the frac argument of sample, which returns a random fraction of the rows:

df.sample(frac=.5)

If you define the fraction you want to drop in a variable n, sample the complement:

n = .5
df.sample(frac=1 - n)
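
As a rough illustration of the two snippets above, here is a minimal sketch on the frame from the question. The name drop_frac and the random_state seed are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Example frame matching the question
df = pd.DataFrame({"Label": [0, 0, 0, 1, 1, 1, 1],
                   "A": [1, 2, 3, 10, 11, 12, 13]})

drop_frac = 0.5  # hypothetical name: fraction of rows to drop

# sample() returns the rows that are KEPT, so keep 1 - drop_frac of them.
# random_state is optional; it only makes the draw repeatable.
kept = df.sample(frac=1 - drop_frac, random_state=0)

print(len(df), len(kept))
print(kept.sort_index())  # sample() shuffles rows; sort_index() restores the order
```

Note that this samples from the whole frame, so rows with either label can be removed.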

To apply the condition, sample only the matching rows and pass their index to drop:

df.drop(df.query('Label == 1').sample(frac=.5).index)

   Label   A
0      0   1
1      0   2
2      0   3
4      1  11
6      1  13
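
The conditional drop can be sketched end to end as follows. This version uses a boolean mask in place of query, which selects the same rows here; the random_state seed is an assumption added only to make the result repeatable.

```python
import pandas as pd

# Example frame matching the question
df = pd.DataFrame({"Label": [0, 0, 0, 1, 1, 1, 1],
                   "A": [1, 2, 3, 10, 11, 12, 13]})

# Pick a random half of the Label == 1 rows and collect their index labels.
to_drop = df[df["Label"] == 1].sample(frac=0.5, random_state=0).index

# Drop exactly those rows; every Label == 0 row is left untouched.
result = df.drop(to_drop)

print(result)
```

Because drop works on index labels, this assumes the index is unique. With duplicate labels, call df.reset_index(drop=True) first so that only the sampled rows are removed.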
piRSquared answered Dec 21 '22 23:12