"Drop random rows" from pandas dataframe

Question

In a pandas dataframe, how can I drop a random subset of rows that obey a condition?

In other words, if I have a Pandas dataframe with a Label column, I'd like to drop 50% (or some other percentage) of rows where Label == 1, but keep all of the rest:

Label A     ->    Label A
0     1           0     1
0     2           0     2
0     3           0     3
1     10          1     11
1     11          1     12
1     12
1     13

I'd love to know the simplest and most pythonic/panda-ish way of doing this!

Edit: This question provides part of an answer, but it only talks about dropping rows by index, disregarding the row values. I'd still like to know how to drop only from rows that are labeled a certain way.

piRSquared · Accepted Answer

Use the frac argument of sample, which returns a random fraction of the rows (here, the half that is kept):

df.sample(frac=.5)

If you store the fraction you want to drop in a variable n, sample the complement:

n = .5
df.sample(frac=1 - n)
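
As a rough, self-contained sketch of the snippet above: the frame below is rebuilt from the question's example data, and `random_state` and `sort_index()` are additions of mine (not part of the answer) used only to make the draw repeatable and to restore the original row order.

```python
import pandas as pd

# Example frame rebuilt from the question's data
df = pd.DataFrame({"Label": [0, 0, 0, 1, 1, 1, 1],
                   "A": [1, 2, 3, 10, 11, 12, 13]})

n = 0.5  # fraction of rows to drop

# sample() returns the rows that are KEPT, so ask for the complement of n.
# random_state is an optional seed (an addition here) for a repeatable draw.
kept = df.sample(frac=1 - n, random_state=0)

# sample() returns rows in shuffled order; sort_index() puts them back in order
kept = kept.sort_index()
print(kept)
```

Note that this samples from the whole frame, regardless of Label.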

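To apply the condition, sample only the matching rows and pass their index to drop

df.drop(df.query('Label == 1').sample(frac=.5).index)

One possible result (the rows dropped are random, so yours may differ):
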
   Label   A
0      0   1
1      0   2
2      0   3
4      1  11
6      1  13
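
For reuse, the same idea can be wrapped in a small helper. This is only a sketch: the name `drop_random_where` and its parameters are hypothetical, and it uses a boolean mask instead of `query`, which should select the same rows.

```python
import pandas as pd

def drop_random_where(df, mask, frac, random_state=None):
    """Drop a random fraction of the rows where mask is True; keep all others.

    Hypothetical helper, not part of pandas.
    """
    # Pick the index labels of the rows to remove from the matching subset only
    to_drop = df[mask].sample(frac=frac, random_state=random_state).index
    return df.drop(to_drop)

# Example frame rebuilt from the question's data
df = pd.DataFrame({"Label": [0, 0, 0, 1, 1, 1, 1],
                   "A": [1, 2, 3, 10, 11, 12, 13]})

# Drop half of the Label == 1 rows; the seed is optional
result = drop_random_where(df, df["Label"] == 1, frac=0.5, random_state=0)
print(result)
```

Because drop works on index labels, this assumes the index is unique; with duplicate labels, every row sharing a dropped label would be removed.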

Tags: python, pandas, dataframe

NcAdams

1 Answers

piRSquared
