 

Pandas drop rows with value less than a given value

Tags:

python

pandas

I would like to delete rows in which every value lies between 10 and 25, that is, rows that contain no value less than 10 and no value greater than 25. My sample DataFrame looks like this:

a   b   c  
1   2   3  
4   5   16  
11  24  22  
26  50  65  

Expected Output:

a   b   c  
1   2   3  
4   5   16   
26  50  65  

In other words, if a row contains any value less than 10 or greater than 25, the row stays in the DataFrame; otherwise, it needs to be dropped.

Is there any way I can achieve this with Pandas instead of iterating through all the rows?

Jaswanth Kumar asked Dec 24 '22 17:12



1 Answer

You can call apply row by row (axis=1) and store the results in a new column called 'keep'. You can then use this column to drop the rows you don't need.

import pandas as pd

# Set up the sample DataFrame
l = [[1, 2, 3], [4, 5, 6], [11, 24, 22], [26, 50, 65]]
df = pd.DataFrame(l, columns=['a', 'b', 'c'])

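# 1 if any value in the row is < 10 or > 25, else 0
df['keep'] = df.apply(lambda row: int(any((x < 10) or (x > 25) for x in row)), axis=1)

any() returns True if at least one value in the row is less than 10 or greater than 25, and False otherwise. Wrapping it in int() turns that boolean into 1 or 0, so the 'keep' column holds integers.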

See the Python documentation for how any() works. apply still iterates over all the rows like a for loop, but the code looks cleaner this way. I cannot think of a way to do this without iterating over all the rows.

Output:

    a   b   c  keep
0   1   2   3     1
1   4   5   6     1
2  11  24  22     0
3  26  50  65     1


df = df[df['keep'] == 1].drop(columns='keep')  # Drop unwanted rows, then the helper column
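A vectorized alternative that avoids apply and the helper column is sketched below. It uses the question's sample data and assumes every column is numeric; the names keep_mask, drop_mask and result are only illustrative. Comparison operators on a DataFrame work element-wise and return boolean DataFrames, so the row test becomes a single boolean mask.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame([[1, 2, 3], [4, 5, 16], [11, 24, 22], [26, 50, 65]],
                  columns=['a', 'b', 'c'])

# (df < 10) and (df > 25) are element-wise boolean DataFrames; | combines them,
# and .any(axis=1) is True for a row if at least one of its values matched.
keep_mask = ((df < 10) | (df > 25)).any(axis=1)

# Same condition phrased as the drop rule: every value in the row lies in [10, 25].
drop_mask = (df.ge(10) & df.le(25)).all(axis=1)

# Keep only the rows that have at least one value outside [10, 25]
result = df[keep_mask]
print(result)
```

Here result keeps rows 0, 1 and 3, matching the expected output in the question; df[~drop_mask] selects the same rows. Append .reset_index(drop=True) if you want a fresh 0..n-1 index.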
Rakesh Adhikesavan answered Dec 30 '22 13:12
