How to delete specific number of random rows in Pandas dataframe based on condition?

Tags: python, pandas
I want to delete a specific number 'n' of rows from a dataframe, with the rows to delete chosen at random. The rows must also be selected based on a condition on a particular column's values.

For example, my dataframe is as below:

C1    C2    C3
1     0     a
2     1     b
3     0     c
4     0     d
5     0     e
6     1     f
7     1     g
8     1     h
9     0     i

Now, I want to remove n=2 rows at random from those where C2==1.

The resultant frame can be as below:

C1    C2    C3
1     0     a
3     0     c
4     0     d
5     0     e
6     1     f
8     1     h
9     0     i

or

C1    C2    C3
1     0     a
2     1     b
3     0     c
4     0     d
5     0     e
7     1     g
9     0     i

or any other possible result. The question here does show how to remove 'n' rows randomly, but it doesn't cover providing a condition.

Ashwin Geet D'Sa asked Jan 20 '26 06:01


1 Answer

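Select the rows that match the condition with boolean indexing, pick N of them at random with DataFrame.sample, and pass their index to DataFrame.drop:

N = 2
# sample N rows where C2 == 1, then drop them by index label
df1 = df.drop(df[df['C2'].eq(1)].sample(N).index)
print(df1)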
   C1  C2 C3
0   1   0  a
1   2   1  b
2   3   0  c
3   4   0  d
4   5   0  e
6   7   1  g
8   9   0  i
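
For reference, the same idea can be written as a small self-contained script. This is only a sketch: the helper name drop_random_matching and its seed parameter are hypothetical, not part of pandas, and it assumes the index labels are unique (with duplicate labels, drop would remove every row sharing a sampled label). It also caps n, since sample raises an error when asked for more rows than match the condition.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "C1": range(1, 10),
    "C2": [0, 1, 0, 0, 0, 1, 1, 1, 0],
    "C3": list("abcdefghi"),
})


def drop_random_matching(frame, mask, n, seed=None):
    """Drop up to n randomly chosen rows of `frame` where `mask` is True.

    Hypothetical helper for illustration; assumes unique index labels.
    """
    candidates = frame[mask]
    # Cap n so sample() does not raise when fewer than n rows match.
    n = min(n, len(candidates))
    # random_state makes the random choice reproducible when a seed is given.
    to_drop = candidates.sample(n=n, random_state=seed).index
    return frame.drop(to_drop)


df1 = drop_random_matching(df, df["C2"].eq(1), n=2, seed=0)
print(df1)
```

Passing a seed is optional; leaving it as None gives a different pair of rows on each run, which matches the behaviour of the one-liner above.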

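Or use np.random.choice to pick random index values. Pass replace=False, because the default samples with replacement and can return the same index twice, which would drop fewer than N rows:

import numpy as np
df = df.drop(np.random.choice(df.index[df['C2'].eq(1)], N, replace=False))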
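
A variant of the NumPy approach using the newer Generator API might look like the sketch below. The seed value 0 and the names rng, candidates, to_drop and df2 are arbitrary choices for illustration. Sampling without replacement guarantees that exactly two distinct labels are dropped.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "C1": range(1, 10),
    "C2": [0, 1, 0, 0, 0, 1, 1, 1, 0],
    "C3": list("abcdefghi"),
})

# Seeded generator so the example is reproducible; the seed is arbitrary.
rng = np.random.default_rng(0)

# Index labels of the rows that satisfy the condition C2 == 1.
candidates = df.index[df["C2"].eq(1)].to_numpy()

# replace=False ensures the two chosen labels are distinct.
to_drop = rng.choice(candidates, size=2, replace=False)

df2 = df.drop(to_drop)
print(df2)
```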
jezrael answered Jan 23 '26 20:01