I am trying to filter a DataFrame for rows whose value is less than a certain threshold. This works fine when there are no NaN values, but when there is a NaN, that row is dropped from the result. I want NaN rows to always be included, regardless of how they compare with the threshold.
import pandas as pd
import numpy as np
df = pd.DataFrame(
{
'index': [1, 2, 3, 4, 5, 6, 7, 8, 9],
'value': [5, 6, 7, np.nan, 9, 3, 11, 34, 78]
}
)
df_chunked = df[(df['index'] >= 1) & (df['index'] <= 5)]
print('df_chunked')
print(df_chunked)
df_result = df_chunked[(df_chunked['value'] < 10)]
# df_result = df_chunked[(df_chunked['value'] < 10) | (df_chunked['value'] == np.isnan(df_chunked['value']))]
print('df_result')
print(df_result)
The result above shows 5, 6, 7 and 9, but I also want the NaN row included. I tried:
df_result = df_chunked[(df_chunked['value'] < 10) | (df_chunked['value'] == np.isnan(df_chunked['value']))]
But it is not working.
How can I do this?
Filter out NaN rows using DataFrame.dropna(). By default, df.dropna() drops every row that contains at least one NaN. With df.dropna(thresh=2), only rows that have at least two non-NaN values are kept; all other rows are dropped.
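As a minimal sketch of the dropna() behaviour described above (the column names a, b and c are made up for illustration and do not come from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 has 3 non-NaN values, row 1 has 2, row 2 has 1.
df_demo = pd.DataFrame({
    'a': [1.0, np.nan, np.nan],
    'b': [2.0, 5.0, np.nan],
    'c': [3.0, 6.0, 9.0],
})

# Default: drop every row that contains at least one NaN.
dropped_any = df_demo.dropna()

# thresh=2: keep only rows with at least two non-NaN values.
kept_thresh = df_demo.dropna(thresh=2)

print(dropped_any)
print(kept_thresh)
```

Note that dropna() removes NaN rows, which is the opposite of what the question asks for; it is shown here only to make the thresh semantics concrete.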
In applied data science, you will usually have missing data. For example, an industrial application with sensors will have sensor data that is missing on certain days. You have a couple of alternatives to work with missing data.
To check for missing values in a pandas DataFrame, use isnull() and notnull(). Both functions test whether each value is NaN. They can also be called on a pandas Series to find the null values in that Series.
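A short sketch of these checks on a Series, using the first five values from the question. isna() and notna() are the same methods under their newer names:

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 6, 7, np.nan, 9])

missing = s.isnull()   # True where the value is NaN
present = s.notnull()  # True where the value is not NaN

print(missing)
print(present)
print(int(missing.sum()), "missing value(s)")
```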
Use the NOT operator, ~:
df_chunked[~(df_chunked['value'].ge(10))]
#df_chunked[~(df_chunked['value']>=10)] #greater or equal(the same)
   index  value
0      1    5.0
1      2    6.0
2      3    7.0
3      4    NaN
4      5    9.0
Why? Because any comparison involving NaN evaluates to False, as you can see in the following DataFrame. So if you want to avoid series.isna (and the extra code it requires), simply use the inverse condition with ~:
print(df.assign(greater_than_5 = df['value'].gt(5),
not_greater_than_5 = df['value'].le(5)))
   index  value  greater_than_5  not_greater_than_5
0      1    5.0           False                True
1      2    6.0            True               False
2      3    7.0            True               False
3      4    NaN           False               False
4      5    9.0            True               False
5      6    3.0           False                True
6      7   11.0            True               False
7      8   34.0            True               False
8      9   78.0            True               False
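To make the difference concrete, here is a small sketch comparing the direct condition (< 10) with the negated inverse (~ >= 10) on the values from the question. The two masks agree everywhere except on the NaN row:

```python
import numpy as np
import pandas as pd

value = pd.Series([5, 6, 7, np.nan, 9, 3, 11, 34, 78])

direct = value.lt(10)     # NaN < 10 is False, so the NaN row is excluded
inverse = ~value.ge(10)   # NaN >= 10 is False, and ~False is True, so it is kept

# Rows where the two masks disagree: only the NaN row.
differs = value[direct != inverse]
print(differs)
```

One design note: the ~ form keeps NaN rows implicitly, so a reader who does not know about NaN comparison rules may miss that. Whether that brevity is worth it over an explicit isna() is a matter of taste.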
Try:
df_result = df_chunked[(df_chunked['value'] < 10) | (df_chunked['value'].isna())]
df_result
   index  value
0      1    5.0
1      2    6.0
2      3    7.0
3      4    NaN
4      5    9.0
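For reference, a minimal sketch of why the attempt from the question (comparing the column to np.isnan(...) with ==) does not pick up the NaN row. np.isnan returns a boolean mask, so the == compares each number with True or False instead of testing for NaN:

```python
import numpy as np
import pandas as pd

value = pd.Series([5, 6, 7, np.nan, 9])

# The attempt from the question: compares each value with a boolean.
# 5 == False, 6 == False, ... are all False, and NaN == True is False too.
# (A value of 0 would match by accident, because 0 == False is True.)
attempt = value == np.isnan(value)

# Using the mask directly is what was intended.
fixed = (value < 10) | value.isna()

print(attempt.tolist())
print(fixed.tolist())
```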