Value filter in pandas dataframe keeping NaN

I am trying to filter a dataframe so that it keeps only the rows whose value is less than a certain threshold. When there are no NaN values this works fine, but when there is a NaN, that row is dropped. I want the NaN rows to always be included, regardless of whether they would be less than or greater than the threshold.

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        'index': [1, 2, 3,  4,  5,  6,   7,  8, 9],
        'value': [5, 6, 7, np.nan, 9, 3, 11, 34, 78]
    }
)

df_chunked = df[(df['index'] >= 1) & (df['index'] <= 5)]

print('df_chunked')
print(df_chunked)

df_result = df_chunked[(df_chunked['value'] < 10)]
# df_result = df_chunked[(df_chunked['value'] < 10) | (df_chunked['value'] == np.isnan(df_chunked['value']))]

print('df_result')
print(df_result)

The result above shows 5, 6, 7 and 9, but I also want the NaN row to be there. I tried:

df_result = df_chunked[(df_chunked['value'] < 10) | (df_chunked['value'] == np.isnan(df_chunked['value']))]

But it is not working.

How can I do this?

Asked by BC Smith, Feb 06 '20 09:02


2 Answers

Use the NOT operator, ~, to invert the opposite comparison:

df_chunked[~(df_chunked['value'].ge(10))]
# df_chunked[~(df_chunked['value'] >= 10)]  # the same, written with the >= operator

   index  value
0      1    5.0
1      2    6.0
2      3    7.0
3      4    NaN
4      5    9.0

Why?

Because an ordering or equality comparison (<, <=, >, >=, ==) against NaN always evaluates to False, as you can see in the following data frame. So if you want to avoid the extra Series.isna call and keep the code short, test the opposite condition and invert it with ~.

print(df.assign(greater_than_5 = df['value'].gt(5),
          not_greater_than_5 = df['value'].le(5)))


   index  value  greater_than_5  not_greater_than_5
0      1    5.0           False                True
1      2    6.0            True               False
2      3    7.0            True               False
3      4    NaN           False               False
4      5    9.0            True               False
5      6    3.0           False                True
6      7   11.0            True               False
7      8   34.0            True               False
8      9   78.0            True               False
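
As a quick check of the point above, here is a minimal sketch that compares the inverted mask with the explicit isna() version on the chunk from the question. The series `value` and the names `below`, `at_or_above`, `keep_inverted` and `keep_explicit` are made up for illustration and are not part of the original answer. It assumes a plain float column where NaN is the only kind of missing value; nullable dtypes such as Int64 may behave differently.

```python
import numpy as np
import pandas as pd

# Hypothetical series holding the same values as df_chunked['value'].
value = pd.Series([5, 6, 7, np.nan, 9], name='value')

# Ordering comparisons with NaN are False, so NaN fails both tests.
below = value < 10
at_or_above = value >= 10

# Negating ">= 10" turns NaN's False into True, so the NaN row is kept.
keep_inverted = ~at_or_above

# The explicit form: below the limit, or missing.
keep_explicit = below | value.isna()

print(pd.DataFrame({
    'value': value,
    'below': below,
    'keep_inverted': keep_inverted,
    'keep_explicit': keep_explicit,
}))
```

On this data the two masks agree row by row. The inverted form is shorter, while the explicit form states the intent (keep missing values) more directly.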
Answered by ansev, Oct 06 '22 23:10


Try:

df_result = df_chunked[(df_chunked['value'] < 10) | (df_chunked['value'].isna())]
df_result 
   index  value
0      1    5.0
1      2    6.0
2      3    7.0
3      4    NaN
4      5    9.0
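
For context, here is a small sketch of why the attempt in the question selects nothing extra, next to the isna() version. The names `nan_mask`, `attempt` and `mask` are illustrative only, and `df_chunked` is rebuilt by hand under the assumption that it holds the first five rows of the question's data.

```python
import numpy as np
import pandas as pd

# Rebuilt by hand: the first five rows of the question's dataframe.
df_chunked = pd.DataFrame({
    'index': [1, 2, 3, 4, 5],
    'value': [5, 6, 7, np.nan, 9],
})

# np.isnan returns a boolean Series: True only at the NaN row.
nan_mask = np.isnan(df_chunked['value'])

# The question's second condition compares each float with that boolean.
# 5.0 == False is False, and NaN == True is False, so nothing matches here.
attempt = df_chunked['value'] == nan_mask

# Using the boolean mask directly in the OR keeps the NaN row.
mask = (df_chunked['value'] < 10) | df_chunked['value'].isna()
df_result = df_chunked[mask]

print(attempt.any())
print(df_result)
```

In other words, the boolean Series from np.isnan (or .isna()) is already the mask you need, so it should be combined with | rather than compared against the values with ==.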
Answered by luigigi, Oct 06 '22 22:10