Please explain how NaNs are treated in pandas, because the following logic seems "broken" to me. I tried various ways (shown below) to drop the empty values.
My dataframe, which I load from a CSV file using read_csv, has a column comments, which is empty most of the time.
The column marked_results.comments looks like this; the rest of the column is NaN, so pandas loads empty entries as NaNs. So far so good:
0 VP
1 VP
2 VP
3 TEST
4 NaN
5 NaN
....
Now I try to drop those entries. Only this works:
marked_results.comments.isnull()
All of these don't work:
marked_results.comments.dropna()
only gives the same column, nothing gets dropped, confusing.
marked_results.comments == NaN
only gives a series of all Falses. Nothing was NaN... confusing.
marked_results.comments == nan
I also tried:
comments_values = marked_results.comments.unique()
array(['VP', 'TEST', nan], dtype=object)
# Ah, gotcha! So now I've tried:
marked_results.comments == comments_values[2]
# but still all the results are False!!!
Use the dropna() function to drop rows with NaN / None values from a pandas DataFrame. Python has no null keyword, so missing data is represented as None or NaN. NaN stands for Not a Number and is one of the common ways to represent a missing value in the data.
By default, the pandas.DataFrame.dropna() method drops rows that contain NaN (Not a Number) or None values; pass axis=1 to drop columns instead. Note that by default it returns a copy of the DataFrame with those rows removed rather than modifying the original.
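A minimal sketch of the row versus column behaviour of dropna(). The column names and values here are made up for illustration and are not from the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "comments" has one NaN and one None.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "comments": ["VP", np.nan, None],
    "score": [1.0, 2.0, 3.0],
})

# Default (axis=0): drop every row that contains a missing value.
rows_dropped = df.dropna()

# axis=1: drop every column that contains a missing value.
cols_dropped = df.dropna(axis=1)

print(rows_dropped)
print(cols_dropped)
print(df.shape)  # the original frame is left untouched
```

Both calls return new objects, so the result has to be assigned back (or used directly) for the drop to have any visible effect.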
By default, pandas treats strings such as #N/A, -NaN, n/a, N/A, and NULL as NaN when reading a file.
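To illustrate, here is a small sketch that reads an in-memory CSV (the contents are invented for this example) and counts how many entries read_csv turns into NaN. It also shows keep_default_na=False, which switches that default parsing off:

```python
import io

import pandas as pd

# Invented CSV text: one empty field plus three of the default NA strings.
csv_text = "id,comments\n0,VP\n1,TEST\n2,\n3,N/A\n4,NULL\n5,#N/A\n"

# Default parsing: the empty field and the NA-like strings become NaN.
marked_results = pd.read_csv(io.StringIO(csv_text))
n_missing = marked_results["comments"].isnull().sum()
print(n_missing)

# keep_default_na=False keeps the raw strings instead.
raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(raw["comments"].tolist())
```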
You should use isnull and notnull to test for NaN (these are more robust using pandas dtypes than numpy), see "values considered missing" in the docs.
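As a rough sketch of that advice, assuming a Series shaped like the comments column from the question (rebuilt by hand here), isnull gives a boolean mask and notnull can be used to filter:

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for marked_results.comments.
comments = pd.Series(["VP", "VP", "VP", "TEST", np.nan, np.nan], name="comments")

# Boolean mask: True where the value is missing.
missing = comments.isnull()

# Keep only the entries that are present.
present = comments[comments.notnull()]

print(missing)
print(present)
```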
Using the Series method dropna on a column won't affect the original dataframe, but does what you want:
In [11]: df
Out[11]:
comments
0 VP
1 VP
2 VP
3 TEST
4 NaN
5 NaN
In [12]: df.comments.dropna()
Out[12]:
0 VP
1 VP
2 VP
3 TEST
Name: comments, dtype: object
The dropna DataFrame method has a subset argument (to drop rows which have NaNs in specific columns):
In [13]: df.dropna(subset=['comments'])
Out[13]:
comments
0 VP
1 VP
2 VP
3 TEST
In [14]: df = df.dropna(subset=['comments'])
You need to test for NaN with the math.isnan() function (or numpy.isnan). NaNs cannot be checked with the equality operator.
>>> from math import isnan
>>> a = float('NaN')
>>> a
nan
>>> a == 'NaN'
False
>>> isnan(a)
True
>>> a == float('NaN')
False
Help Function ->
isnan(...)
isnan(x) -> bool
Check if float x is not a number (NaN).
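One caveat worth sketching: math.isnan only accepts numbers, so it raises on the strings in an object column like comments. The example below (with a hand-built Series, not the asker's data) shows why the == comparisons return all False and how pd.isna handles the mixed column:

```python
import math

import numpy as np
import pandas as pd

# NaN never compares equal to anything, including itself.
nan = float("nan")
print(nan == nan)       # False
print(math.isnan(nan))  # True

# Hand-built object column mixing strings and NaN.
comments = pd.Series(["VP", "TEST", np.nan], dtype=object)

# Equality against NaN is False everywhere, even on the NaN entry.
eq_mask = comments == np.nan

# pd.isna works element-wise on mixed string/NaN data.
na_mask = pd.isna(comments)

# math.isnan raises TypeError when given a string.
try:
    math.isnan("VP")
except TypeError:
    string_raises = True
else:
    string_raises = False

print(eq_mask.tolist(), na_mask.tolist(), string_raises)
```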