Can't drop NAN with dropna in pandas

Question

I import pandas as pd, run the code below, and get the following result:

Code:

import pandas as pd

traindataset = pd.read_csv('/Users/train.csv')
print(traindataset.dtypes)
print(traindataset.shape)
print(traindataset.iloc[25,3])
traindataset.dropna(how='any')
print(traindataset.iloc[25,3])
print(traindataset.shape)

Output

TripType                   int64  
VisitNumber                int64  
Weekday                   object  
Upc                      float64  
ScanCount                  int64  
DepartmentDescription     object  
FinelineNumber           float64  
dtype: object

(647054, 7)

nan  
nan

(647054, 7) 
[Finished in 2.2s]

Judging by the result, the dropna line has no effect: the number of rows doesn't change and there is still a NaN in the dataframe. How come? This is driving me crazy.

Robert Forderer · Accepted Answer

This is my first post. I just spent a few hours debugging this exact issue and would like to share how I fixed it.

I was converting every value in my dataframe to a string and then writing that value back into the dataframe, using code similar to what is shown below (note that the code below only converts the value to a string):

for ind, row in dataf.iterrows():
    # str() turns a missing value (NaN) into the string 'nan'
    cell_value = str(row['column_header'])
    # write back by index label so this also works with a non-default index
    dataf.loc[ind, 'column_header'] = cell_value

Only after converting the entire dataframe to strings did I call dropna(). By then, the values that were previously NaN (which pandas considers null) had been converted to the string 'nan', which pandas treats as ordinary text, so dropna() found nothing to drop.

In conclusion, drop blank values FIRST, before you start manipulating data in the CSV and converting its data type.
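
To illustrate the effect described above, here is a minimal sketch. The three-row frame is made up for the example (it is not the asker's data), and the conversion uses a plain list comprehension rather than the loop shown earlier; the point is only that str() produces a 'nan' string that dropna() no longer recognises as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value in the middle row.
dataf = pd.DataFrame({"column_header": [1.5, np.nan, 3.0]})

# str() turns the missing value into the ordinary string 'nan'.
as_text = pd.Series([str(v) for v in dataf["column_header"]], index=dataf.index)

# Dropping AFTER the conversion removes nothing: 'nan' is just text now.
rows_after_late_drop = len(as_text.dropna())

# Dropping BEFORE the conversion removes the row with the real NaN.
rows_after_early_drop = len(dataf.dropna())

print(rows_after_late_drop, rows_after_early_drop)  # 3 2
```

If the data has already been converted, one possible way back is to replace the 'nan' strings with real missing values (for example with replace('nan', np.nan)) before calling dropna(), but dropping first is simpler.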

BrenBarn · Answer

You need to read the documentation (emphasis added):

Return object with labels on given axis omitted

dropna returns a new DataFrame. If you want it to modify the existing DataFrame, all you have to do is read further in the documentation:

inplace : boolean, default False

If True, do operation inplace and return None.

So to modify it in place, do traindataset.dropna(how='any', inplace=True).
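
As a small sketch of the difference, assuming a made-up three-row frame in place of train.csv (the column names are borrowed from the question, the values are invented):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data.
traindataset = pd.DataFrame({"Upc": [1.0, np.nan, 3.0], "ScanCount": [1, 2, 3]})

# The returned frame is thrown away, so the original is unchanged.
traindataset.dropna(how="any")
shape_unchanged = traindataset.shape

# Option 1: keep the new DataFrame that dropna returns.
cleaned = traindataset.dropna(how="any")

# Option 2: modify the existing DataFrame in place (the call returns None).
traindataset.dropna(how="any", inplace=True)

print(shape_unchanged, cleaned.shape, traindataset.shape)  # (3, 2) (2, 2) (2, 2)
```

Either option works; assigning the result back (traindataset = traindataset.dropna(how='any')) is the other common spelling of option 1.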

Tags:

python

pandas

dataframe

missing-data

fangh
