Pandas fillna: Output still has NaN values

Tags:

python

pandas

I am having a strange problem in Pandas. I have a DataFrame with several NaN values, and I thought I could fill each NaN with the mean of its column, but when I try the following

  col_means = mydf.apply(np.mean, 0)
  mydf = mydf.fillna(value=col_means)

I still see some NaN values. Why?

Is it because my original DataFrame has more NaN values than col_means has entries? And what exactly is the difference between filling by column and filling by row?

Amelio Vazquez-Reina asked Aug 08 '13 at 13:08



1 Answer

You can pass the df.mean() Series (which is dict-like, keyed by column label) straight to fillna:

In [11]: df = pd.DataFrame([[1, np.nan], [np.nan, 4], [5, 6]])

In [12]: df
Out[12]:
    0   1
0   1 NaN
1 NaN   4
2   5   6

In [13]: df.fillna(df.mean())
Out[13]:
   0  1
0  1  5
1  3  4
2  5  6
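
To make "dict-like" concrete: fillna matches the Series against column labels, so a plain dict behaves the same way, and any column the mapping does not mention is left untouched. A label mismatch between the means and the frame is therefore one way NaN can survive (an illustration, not necessarily what happened in the question). A minimal sketch with the same toy frame; the dict values 3.0 and 5.0 are this frame's column means, written out by hand:

```python
import numpy as np
import pandas as pd

# Same toy frame as in the session above.
df = pd.DataFrame([[1, np.nan], [np.nan, 4], [5, 6]])

# fillna treats a Series or a dict as a mapping {column label: fill value}.
by_series = df.fillna(df.mean())
by_dict = df.fillna({0: 3.0, 1: 5.0})
print(by_series.equals(by_dict))  # True

# Only columns named in the mapping are filled; column 1 keeps its NaN.
partial = df.fillna({0: 3.0})
print(int(partial[1].isna().sum()))  # 1
```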

Note: df.mean() computes the mean of each column (axis=0, the default, with NaN skipped), and these are the fill values:

In [14]: df.mean()
Out[14]:
0    3
1    5
dtype: float64
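
To make the column-versus-row distinction from the question concrete: df.mean() gives one value per column, while df.mean(axis=1) gives one per row. Because fillna matches a Series against column labels, filling by row needs a small workaround. The sketch below assumes transposing the frame is acceptable (reasonable for small, all-numeric frames); it is one option among several, not the only idiom:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [np.nan, 4], [5, 6]])

col_means = df.mean()        # axis=0 (default): one mean per column
row_means = df.mean(axis=1)  # axis=1: one mean per row, NaN skipped

# Fill by column: each NaN gets the mean of its own column.
filled_by_col = df.fillna(col_means)

# Fill by row: transpose so rows become columns, fill using the
# row means (now aligned with the column labels), then transpose back.
filled_by_row = df.T.fillna(row_means).T

print(filled_by_col)
print(filled_by_row)
```

Row 0 has only one observed value (1), so its row mean is 1.0 and that is what its NaN receives, whereas filling by column gives it 5.0.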

Note: if df.mean() itself contains NaN (for example, for a column that is entirely NaN, whose mean is NaN), fillna will insert those NaN values, so the gaps remain. In that case you may want to fill the means Series first, i.e.

df.mean().fillna(0)
df.fillna(df.mean().fillna(0))
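
A minimal reproduction of that case, using a hypothetical frame whose column "b" has no observed values at all:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "b" is entirely NaN.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, np.nan]})

means = df.mean()  # a -> 2.0, b -> NaN (nothing to average)
still_nan = df.fillna(means)
print(int(still_nan["b"].isna().sum()))  # 3: column "b" was "filled" with NaN

# Replace NaN means with a fallback (0, as in the answer) before filling.
filled = df.fillna(means.fillna(0))
print(int(filled.isna().sum().sum()))  # 0
```

Whether 0 is a sensible fallback depends on the data; it is used here only because the answer uses it.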
Andy Hayden answered Sep 18 '22 at 21:09