I have a data set in which there is a column known as 'Native Country' which contain around 30000 records. Some are missing represented by NaN
so I thought to fill it with mode()
value. I wrote something like this:
data['Native Country'].fillna(data['Native Country'].mode(), inplace=True)
However when I do a count of missing values:
for col_name in data.columns: print ("column:",col_name,".Missing:",sum(data[col_name].isnull()))
It is still coming up with the same number of NaN
values for the column Native Country.
The fillna() method replaces the NULL values with a specified value. The fillna() method returns a new DataFrame object unless the inplace parameter is set to True , in that case the fillna() method does the replacing in the original DataFrame instead.
You can use central tendency measures such as mean, median or mode of the numeric feature column to replace or impute missing values. You can use mean value to replace the missing values in case the data distribution is symmetric. Consider using median or mode with skewed data distribution.
Just call first element of series:
data['Native Country'].fillna(data['Native Country'].mode()[0], inplace=True)
or you can do the same with assisgnment:
data['Native Country'] = data['Native Country'].fillna(data['Native Country'].mode()[0])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With