
Python Pandas Fillna Median not working

I am trying to fill all the NaNs in a DataFrame that has multiple columns and several rows. I am using it to train a multivariate ML model, so I want to fill the NaNs in each column with that column's median. Just to test the median function, I did this:

training_df.loc[[0]] = np.nan # Sets first row to nan
print(training_df.isnull().values.any()) # Prints true because we just inserted nans
test = training_df.fillna(training_df.median()) # Fillna with median
print(test.isnull().values.any()) # Check afterwards

But when I do this, nothing happens: the last print still outputs True. If I instead call fillna in place, like this:

training_df.fillna(training_df.median(), inplace=True)

Nothing happens either. If I do this:

training_df = training_df.fillna(training_df.median(), inplace=True)

training_df becomes None. How can I solve this?

danielo asked Mar 06 '18 09:03


1 Answer

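As @thesilkworm suggested, convert your columns to a numeric dtype first. df.median() does not produce a usable value for columns stored as dtype object, which is likely why fillna leaves the NaNs in place. Below is a minimal example: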

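import numpy as np
import pandas as pd

# Object-dtype frame: the first row is all NaN and one cell holds a string
df = pd.DataFrame([[np.nan, np.nan, np.nan, np.nan],
                   [5, 1, 2, 'hello'],
                   [1, 4, 3, 4],
                   [9, 8, 7, 6]], dtype=object)

# fails: the columns are object dtype, so no numeric medians are available
# (older pandas leaves the NaNs in place; newer versions may raise TypeError)
df = df.fillna(df.median())

# Convert every column to numeric; values that cannot be parsed ('hello') become NaN
df[df.columns] = df[df.columns].apply(pd.to_numeric, errors='coerce')

df = df.fillna(df.median())  # works: each NaN is replaced by its column median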
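
To check whether this applies to your own data, it can help to look at the dtypes before computing the median. The sketch below uses a small made-up frame as a stand-in for training_df (the column names a, b, c and the values are assumptions for illustration, not taken from the question), converts it, and keeps the DataFrame that fillna returns:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for training_df: numbers stored in object-dtype columns.
training_df = pd.DataFrame(
    {
        "a": [np.nan, 5, 1, 9],
        "b": [np.nan, 1, 4, 8],
        "c": [np.nan, "hello", 4, 6],
    },
    dtype=object,
)

# Diagnosis: every column reports dtype "object" rather than a numeric dtype.
print(training_df.dtypes)

# Coerce each column to numeric; unparseable values such as 'hello' become NaN.
numeric_df = training_df.apply(pd.to_numeric, errors="coerce")

# Without inplace=True, fillna returns a new DataFrame, so keep the result.
filled = numeric_df.fillna(numeric_df.median())

print(filled.isnull().values.any())  # no NaNs remain after the fill
```

On the third attempt in the question: as observed there, the call with inplace=True returned None, so assigning that return value back to training_df discarded the frame. Either assign the result of the non-inplace call, as above, or call fillna with inplace=True without assigning anything.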

jpp answered Oct 12 '22 22:10