 

Querying a pandas DataFrame to filter rows where a column is not NaN [duplicate]

I am new to Python and am using pandas.

I want to query a dataframe and filter the rows where one of the columns is not NaN.

I have tried:

a=dictionarydf.label.isnull()

but the result a is only a Series of True/False values. I also tried this:

dictionarydf.query(dictionarydf.label.isnull())

but it raised an error, as I expected.
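For context: the True/False Series returned by isnull() is already the right tool. It is a boolean mask, and a DataFrame can be indexed with it directly. DataFrame.query, by contrast, expects a string expression rather than a Series, which is why passing the mask to it fails. A minimal sketch on a four-row excerpt of the sample data, inverting the mask with ~ so that only the non-NaN rows are kept:

```python
import numpy as np
import pandas as pd

# Four rows taken from the question's sample data (a small excerpt, not the full frame)
dictionarydf = pd.DataFrame({
    "reference_word": ["account", "airport", "airport", "ais"],
    "all_matching_words": ["fees - account", "biz - airport", "cfo - airport", "admin - ais"],
    "label": [np.nan, "travel", "travel", np.nan],
    "review": ["N", "N", "N", "N"],
})

# isnull() gives a boolean Series: True where label is NaN
a = dictionarydf.label.isnull()

# ~ flips the mask, so indexing with it keeps the rows where label is NOT NaN
not_nan_rows = dictionarydf[~a]
print(not_nan_rows)
```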

Sample data:

      reference_word         all_matching_words  label review
0           account             fees - account    NaN      N
1           account           mobile - account    NaN      N
2           account          monthly - account    NaN      N
3    administration  delivery - administration    NaN      N
4    administration      fund - administration    NaN      N
5           advisor             fees - advisor    NaN      N
6           advisor          optimum - advisor    NaN      N
7           advisor              sub - advisor    NaN      N
8             aichi           delivery - aichi    NaN      N
9             aichi               pref - aichi    NaN      N
10          airport              biz - airport    travel      N
11          airport              cfo - airport    travel      N
12          airport           cfomtg - airport    travel      N
13          airport          meeting - airport    travel      N
14          airport           summit - airport    travel      N
15          airport             taxi - airport    travel      N
16          airport            train - airport    travel      N
17          airport         transfer - airport    travel      N
18          airport             trip - airport    travel      N
19              ais                admin - ais    NaN      N
20              ais               alpine - ais    NaN      N
21              ais                 fund - ais    NaN      N
22       allegiance       custody - allegiance    NaN      N
23       allegiance          fees - allegiance    NaN      N
24            alpha               late - alpha    NaN      N
25            alpha               meal - alpha    NaN      N
26            alpha               taxi - alpha    NaN      N
27           alpine             admin - alpine    NaN      N
28           alpine               ais - alpine    NaN      N
29           alpine              fund - alpine    NaN      N

I want to keep only the rows where label is not NaN.

Expected output:

     reference_word         all_matching_words   label    review
0          airport              biz - airport    travel      N
1          airport              cfo - airport    travel      N
2          airport           cfomtg - airport    travel      N
3          airport          meeting - airport    travel      N
4          airport           summit - airport    travel      N
5          airport             taxi - airport    travel      N
6          airport            train - airport    travel      N
7          airport         transfer - airport    travel      N
8          airport             trip - airport    travel      N
DileepGogula asked Sep 26 '16 05:09




1 Answer

You can use dropna with the subset parameter, which drops only the rows that have NaN in the listed columns:

df = df.dropna(subset=['label'])

print(df)
   reference_word  all_matching_words   label review
10        airport       biz - airport  travel      N
11        airport       cfo - airport  travel      N
12        airport    cfomtg - airport  travel      N
13        airport   meeting - airport  travel      N
14        airport    summit - airport  travel      N
15        airport      taxi - airport  travel      N
16        airport     train - airport  travel      N
17        airport  transfer - airport  travel      N
18        airport      trip - airport  travel      N
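Note that the filtered frame keeps the original index labels (10 to 18), while the expected output in the question is numbered from 0. A minimal sketch, using a small excerpt of the sample data rather than the full frame, that chains reset_index(drop=True) after dropna to renumber the rows:

```python
import numpy as np
import pandas as pd

# Small excerpt of the question's sample data
df = pd.DataFrame({
    "reference_word": ["account", "airport", "airport", "ais"],
    "label": [np.nan, "travel", "travel", np.nan],
    "review": ["N", "N", "N", "N"],
})

# dropna keeps the surviving rows' original index labels (here 1 and 2)
filtered = df.dropna(subset=["label"])

# reset_index(drop=True) renumbers from 0; drop=True discards the old labels
# instead of inserting them as a new "index" column
filtered = filtered.reset_index(drop=True)
print(filtered)
```

The same reset_index(drop=True) call works after the boolean-indexing approach as well.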

Another solution is boolean indexing with notnull:

df = df[df.label.notnull()]

print(df)
   reference_word  all_matching_words   label review
10        airport       biz - airport  travel      N
11        airport       cfo - airport  travel      N
12        airport    cfomtg - airport  travel      N
13        airport   meeting - airport  travel      N
14        airport    summit - airport  travel      N
15        airport      taxi - airport  travel      N
16        airport     train - airport  travel      N
17        airport  transfer - airport  travel      N
18        airport      trip - airport  travel      N
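Since the question started from DataFrame.query, the same filter can also be written as a query string. This is a sketch with two assumptions worth checking against your pandas version: that engine="python" accepts a method call such as label.notnull() inside the expression, and that notna() (the newer alias of notnull()) is available. The "label == label" form relies on NaN never comparing equal to itself.

```python
import numpy as np
import pandas as pd

# Small excerpt of the question's sample data
df = pd.DataFrame({
    "reference_word": ["account", "airport", "airport", "ais"],
    "label": [np.nan, "travel", "travel", np.nan],
})

# query() takes a string; NaN != NaN, so this comparison is False only on NaN rows
via_self_compare = df.query("label == label")

# Assumption: the python engine allows calling Series methods in the expression
via_method = df.query("label.notnull()", engine="python")

# notna() is an alias of notnull() and produces the same boolean mask
via_notna = df[df["label"].notna()]

print(via_self_compare)
```

All three keep the same rows as the two approaches above; which one to use is mostly a matter of readability.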
jezrael answered Oct 12 '22 11:10