 

Delete rows based on a condition in pandas

Tags:

python

pandas

I have the following DataFrame:

In [62]: df
Out[62]:
            coverage   name  reports  year
Cochice           45  Jason        4  2012
Pima             214  Molly       24  2012
Santa Cruz       212   Tina       31  2013
Maricopa          72   Jake        2  2014
Yuma              85    Amy        3  2014

I can filter rows like this:

df[df["coverage"] > 30]

and I can drop rows by their index labels like this:

df.drop(['Cochice', 'Pima'])
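To make the examples reproducible, here is a minimal sketch that rebuilds the DataFrame from the printed output. It assumes the county names are the index (that is how they appear in the output) and that the columns hold plain integers and strings. It then drops rows by label with DataFrame.drop, which returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Rebuild the example data; county names are assumed to be the index
df = pd.DataFrame(
    {
        "coverage": [45, 214, 212, 72, 85],
        "name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
        "reports": [4, 24, 31, 2, 3],
        "year": [2012, 2012, 2013, 2014, 2014],
    },
    index=["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"],
)

# drop() removes rows by index label and returns a new DataFrame
dropped = df.drop(["Cochice", "Pima"])
print(dropped)
```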

But I want to delete all rows that match a condition. How can I do that?

asked Jan 24 '17 16:01 by Shiva Krishna Bavandla


2 Answers

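The best way is boolean indexing with the condition inverted: instead of deleting the rows you don't want, select the rows you want to keep. For example, to delete rows with coverage below 72, keep all rows with coverage greater than or equal to 72: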

print (df[df["coverage"] >= 72])
            coverage   name  reports  year
Pima             214  Molly       24  2012
Santa Cruz       212   Tina       31  2013
Maricopa          72   Jake        2  2014
Yuma              85    Amy        3  2014

This is the same as using the ge method:

print (df[df["coverage"].ge(72)])
            coverage   name  reports  year
Pima             214  Molly       24  2012
Santa Cruz       212   Tina       31  2013
Maricopa          72   Jake        2  2014
Yuma              85    Amy        3  2014
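A small self-contained check that the comparison operator and the ge method select the same rows. It assumes the same county index as the question and uses only the coverage and name columns for brevity.

```python
import pandas as pd

df = pd.DataFrame(
    {"coverage": [45, 214, 212, 72, 85],
     "name": ["Jason", "Molly", "Tina", "Jake", "Amy"]},
    index=["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"],
)

# Rows to delete have coverage below 72, so keep the complement instead
kept_op = df[df["coverage"] >= 72]   # comparison operator
kept_ge = df[df["coverage"].ge(72)]  # equivalent method form

print(kept_op.equals(kept_ge))  # True
```

The method form is handy when chaining calls, e.g. combining several conditions with & without extra parentheses around each comparison.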

Another option is to invert the mask with ~:

print (df["coverage"] < 72)
Cochice        True
Pima          False
Santa Cruz    False
Maricopa      False
Yuma          False
Name: coverage, dtype: bool

print (~(df["coverage"] < 72))
Cochice       False
Pima           True
Santa Cruz     True
Maricopa       True
Yuma           True
Name: coverage, dtype: bool


print (df[~(df["coverage"] < 72)])
            coverage   name  reports  year
Pima             214  Molly       24  2012
Santa Cruz       212   Tina       31  2013
Maricopa          72   Jake        2  2014
Yuma              85    Amy        3  2014
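A minimal sketch, assuming the same county index, that builds a mask of the rows to delete and removes them in two equivalent ways: selecting the inverted mask, or passing the labels of the matching rows to DataFrame.drop (which ties the answer back to the drop call from the question).

```python
import pandas as pd

df = pd.DataFrame(
    {"coverage": [45, 214, 212, 72, 85]},
    index=["Cochice", "Pima", "Santa Cruz", "Maricopa", "Yuma"],
)

# Boolean mask marking the rows we want to delete
to_delete = df["coverage"] < 72

# Option 1: keep everything the mask does not mark
kept = df[~to_delete]

# Option 2: pass the index labels of the marked rows to drop()
dropped = df.drop(df[to_delete].index)

print(kept.equals(dropped))  # True
```

The drop variant relies on unique index labels: with duplicate labels it removes every row sharing a matched label, even ones that do not meet the condition, so the ~ mask is the safer default. Also note that ~(x < 72) keeps rows where coverage is NaN, while x >= 72 drops them, because any comparison with NaN is False.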
answered Dec 14 '22 23:12 by jezrael


You can also use DataFrame.query():

import pandas as pd

dict_ = {'coverage': [45, 214, 212, 72, 85],
         'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy']}
df = pd.DataFrame(dict_)

# keep only rows with coverage above 72; all other rows are dropped
print(df.query('coverage > 72'))
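If the cutoff lives in a Python variable, query() can reference it with the @ prefix instead of formatting it into the string. This is a sketch; the variable name threshold is hypothetical, and >= is used here so the result matches the boolean-indexing answer above.

```python
import pandas as pd

df = pd.DataFrame(
    {"coverage": [45, 214, 212, 72, 85],
     "name": ["Jason", "Molly", "Tina", "Jake", "Amy"]}
)

threshold = 72  # hypothetical cutoff; rows below it are deleted

# '@' refers to a local Python variable inside the query string
kept = df.query("coverage >= @threshold")
print(kept["name"].tolist())  # ['Molly', 'Tina', 'Jake', 'Amy']
```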


answered Dec 15 '22 01:12 by qaiser