I am new to Python and am using pandas.
I want to query a DataFrame and keep only the rows where one of the columns is not NaN.
I have tried:
a=dictionarydf.label.isnull()
but a is populated with True or False values instead of the filtered rows.
I also tried this:
dictionarydf.query(dictionarydf.label.isnull())
but it gave an error, as I expected.
sample data:
reference_word all_matching_words label review
0 account fees - account NaN N
1 account mobile - account NaN N
2 account monthly - account NaN N
3 administration delivery - administration NaN N
4 administration fund - administration NaN N
5 advisor fees - advisor NaN N
6 advisor optimum - advisor NaN N
7 advisor sub - advisor NaN N
8 aichi delivery - aichi NaN N
9 aichi pref - aichi NaN N
10 airport biz - airport travel N
11 airport cfo - airport travel N
12 airport cfomtg - airport travel N
13 airport meeting - airport travel N
14 airport summit - airport travel N
15 airport taxi - airport travel N
16 airport train - airport travel N
17 airport transfer - airport travel N
18 airport trip - airport travel N
19 ais admin - ais NaN N
20 ais alpine - ais NaN N
21 ais fund - ais NaN N
22 allegiance custody - allegiance NaN N
23 allegiance fees - allegiance NaN N
24 alpha late - alpha NaN N
25 alpha meal - alpha NaN N
26 alpha taxi - alpha NaN N
27 alpine admin - alpine NaN N
28 alpine ais - alpine NaN N
29 alpine fund - alpine NaN N
I want to filter the data where label is not NaN
expected output:
reference_word all_matching_words label review
0 airport biz - airport travel N
1 airport cfo - airport travel N
2 airport cfomtg - airport travel N
3 airport meeting - airport travel N
4 airport summit - airport travel N
5 airport taxi - airport travel N
6 airport train - airport travel N
7 airport transfer - airport travel N
8 airport trip - airport travel N
You can filter out rows with NaN values in a pandas DataFrame column (string, float, datetime, etc.) by using the DataFrame.dropna() and DataFrame.notnull() methods. Python has no null keyword, so pandas represents missing data as None or NaN.
To display only the non-null rows of a DataFrame you can use dropna(), notnull(), or loc[] with a boolean mask. dropna() removes the rows (or, with axis=1, the columns) that contain missing values.
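As a small illustration of the point about None and NaN, here is a minimal sketch. The Series and DataFrame below are made up for the example and are not the asker's data. It shows that notna() (notnull() is an alias for it) treats both None and NaN as missing, and that the resulting mask can be passed to loc[].

```python
import numpy as np
import pandas as pd

# Hypothetical values: one real label, one None, one NaN.
labels = pd.Series(["travel", None, np.nan], dtype="object")

# notna() returns a boolean mask; None and NaN both count as missing.
mask = labels.notna()
print(mask.tolist())  # [True, False, False]

# The same mask can select rows of a DataFrame through loc[].
frame = pd.DataFrame({"label": labels, "review": ["N", "N", "N"]})
kept = frame.loc[frame["label"].notna()]
print(kept)
```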
You can use dropna:
df = df.dropna(subset=['label'])
print (df)
reference_word all_matching_words label review
10 airport biz - airport travel N
11 airport cfo - airport travel N
12 airport cfomtg - airport travel N
13 airport meeting - airport travel N
14 airport summit - airport travel N
15 airport taxi - airport travel N
16 airport train - airport travel N
17 airport transfer - airport travel N
18 airport trip - airport travel N
Another solution is boolean indexing with notnull:
df = df[df.label.notnull()]
print (df)
reference_word all_matching_words label review
10 airport biz - airport travel N
11 airport cfo - airport travel N
12 airport cfomtg - airport travel N
13 airport meeting - airport travel N
14 airport summit - airport travel N
15 airport taxi - airport travel N
16 airport train - airport travel N
17 airport transfer - airport travel N
18 airport trip - airport travel N
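Two follow-ups that may help, sketched on a cut-down stand-in for the sample data (only four rows, typed in by hand). First, query() expects a string expression rather than a boolean Series, which is most likely why the attempt in the question raised an error. Second, the expected output is numbered from 0, while both solutions above keep the original index (10 to 18); reset_index(drop=True) renumbers the rows. The engine="python" argument is there as a precaution, since method calls inside the expression are not always supported by the default engine.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the question's data (assumed values, four rows only).
df = pd.DataFrame({
    "reference_word": ["account", "airport", "airport", "ais"],
    "all_matching_words": ["fees - account", "biz - airport",
                           "cfo - airport", "admin - ais"],
    "label": [np.nan, "travel", "travel", np.nan],
    "review": ["N", "N", "N", "N"],
})

# query() takes the condition as a string; the column is referenced by name.
filtered = df.query("label.notnull()", engine="python")

# Drop the old index so the rows are numbered 0, 1, ... as in the expected output.
result = filtered.reset_index(drop=True)
print(result)
```

The same reset_index(drop=True) call can be chained after dropna(subset=['label']) or after the boolean indexing version.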