I am trying to fill all the NaNs in a DataFrame with multiple columns and several rows. I am using it to train a multivariate ML model, so I want to fill the NaNs in each column with that column's median. Just to test the median function I did this:
training_df.loc[[0]] = np.nan # Sets first row to nan
print(training_df.isnull().values.any()) # Prints true because we just inserted nans
test = training_df.fillna(training_df.median()) # Fillna with median
print(test.isnull().values.any()) # Check afterwards
But when I do this nothing happens: the last print still returns True. If I instead call fillna in place like this:
training_df.fillna(training_df.median(), inplace=True)
Nothing happens either. If I do this:
training_df = training_df.fillna(training_df.median(), inplace=True)
training_df becomes None. How can I solve this?
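As @thesilkworm suggested, convert your columns to numeric first. The likely cause is that the columns have object dtype, in which case median() cannot produce a usable value for them and fillna has nothing to fill with. Below is a minimal example: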
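import numpy as np
import pandas as pd

# Mixed values stored as object dtype; the first row is all NaN
df = pd.DataFrame([[np.nan, np.nan, np.nan, np.nan],
                   [5, 1, 2, 'hello'],
                   [1, 4, 3, 4],
                   [9, 8, 7, 6]], dtype=object)

# df = df.fillna(df.median())  # fails while the columns are object dtype

# Convert every column to numeric; values that cannot be parsed become NaN
df[df.columns] = df[df.columns].apply(pd.to_numeric, errors='coerce')

df = df.fillna(df.median())  # works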
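To check whether this applies to your own data, it can help to look at the dtypes before filling. The following is only a sketch: the small `training_df` built here is a made-up stand-in for the frame in the question, and the column names `a`, `b`, `c` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for training_df: numeric-looking data stored as object dtype
training_df = pd.DataFrame(
    {
        "a": [np.nan, 5, 1, 9],
        "b": [np.nan, 1, 4, 8],
        "c": [np.nan, "hello", 4, 6],
    },
    dtype=object,
)

# Step 1: find the columns stored as object dtype (these are the suspects)
object_cols = [col for col in training_df.columns if training_df[col].dtype == object]

# Step 2: coerce every column to numeric; unparseable values such as 'hello' become NaN
numeric_df = training_df.apply(pd.to_numeric, errors="coerce")

# Step 3: fill each column's NaNs with that column's median and keep the returned frame
filled = numeric_df.fillna(numeric_df.median())

print(object_cols)
print(filled.isnull().values.any())
```

Note that the result of fillna is assigned to a new name here. As the question shows, combining an assignment with inplace=True leaves you with None, so use one or the other, not both.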