I would like to delete rows whose values all lie between 10 and 25 (inclusive). My sample dataframe looks like this:
a b c
1 2 3
4 5 16
11 24 22
26 50 65
Expected Output:
a b c
1 2 3
4 5 16
26 50 65
So if a row contains any value less than 10 or greater than 25, the row stays in the dataframe; otherwise, it should be dropped.
Is there any way I can achieve this with Pandas instead of iterating through all the rows?
You can call apply and store the results in a new column called 'keep', then use this column to drop the rows you don't need.
import pandas as pd
l = [[1, 2, 3], [4, 5, 6], [11, 24, 22], [26, 50, 65]]
df = pd.DataFrame(l, columns=['a', 'b', 'c'])  # set up sample DataFrame
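df['keep'] = df.apply(lambda row: int(any((x < 10) or (x > 25) for x in row)), axis=1)
The built-in any() function returns True if at least one element of the iterable it receives is true, and False otherwise. Here it receives a generator of per-value comparisons, so it stops as soon as it finds an out-of-range value. Wrapping the result in int() turns True/False into 1/0 for the 'keep' column.
See the Python documentation of the built-in any() for details on how it works.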
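To see what the per-row test does, here is a small check in plain Python (no pandas) on rows from the sample data. The helper name keep_row is hypothetical; it just wraps the same expression the lambda uses.

```python
# Hypothetical helper: the same per-row test the lambda applies.
# Returns True if any value is < 10 or > 25 (any() stops at the first hit).
def keep_row(row):
    return any((x < 10) or (x > 25) for x in row)

print(keep_row([4, 5, 16]))          # True: 4 and 5 are below 10
print(keep_row([11, 24, 22]))        # False: every value is within 10..25
print(int(keep_row([26, 50, 65])))   # 1: int() turns True into 1
```

Note that the boundary values 10 and 25 count as "in range", so a row like [10, 25, 20] would be dropped.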
The apply function still iterates over all the rows internally, much like a for loop, but the code looks cleaner this way. I cannot think of a way to do this without iterating over all the rows.
Output:
a b c keep
0 1 2 3 1
1 4 5 6 1
2 11 24 22 0
3 26 50 65 1
df = df[df['keep'] == 1].drop(columns='keep')  # drop unwanted rows, then the helper column
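For comparison, here is a sketch of a vectorized alternative that avoids apply: compare the whole DataFrame at once to get boolean frames, combine them with |, and use DataFrame.any(axis=1) to mark rows that have at least one out-of-range value. This swaps in boolean masking rather than the row-wise approach above, and it uses the sample data from the question (with the row 4 5 16).

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 16], [11, 24, 22], [26, 50, 65]],
                  columns=['a', 'b', 'c'])

# Element-wise comparisons return boolean DataFrames; | combines them cell by cell.
out_of_range = (df < 10) | (df > 25)

# any(axis=1) is True for each row with at least one out-of-range value.
mask = out_of_range.any(axis=1)

# Boolean indexing keeps only the rows where mask is True.
result = df[mask]
print(result)
```

This keeps rows 0, 1 and 3 and drops row 2 (11 24 22), matching the expected output. Because the comparisons run column-wise inside pandas rather than through a Python function called per row, this is usually faster than apply on large frames, and no helper column is needed.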