 

Replace invalid values with None in Pandas DataFrame

Is there any method to replace values with None in Pandas in Python?

You can use df.replace('pre', 'post') to replace one value with another, but this doesn't work when the replacement value is None: if you try it, you get a strange result.

So here's an example:

    from pandas import DataFrame

    df = DataFrame(['-', 3, 2, 5, 1, -5, -1, '-', 9])
    df.replace('-', 0)

This works as expected.

But

    df.replace('-', None)

returns the following result:

        0
    0   -    // this isn't replaced
    1   3
    2   2
    3   5
    4   1
    5  -5
    6  -1
    7  -1    // this is changed to `-1`...
    8   9

Why is such a strange result returned?

Since I want to load this DataFrame into a MySQL database, I can't have NaN values in any element and want None instead. Of course, I could first change '-' to NaN and then convert NaN to None, but I want to know why the DataFrame behaves so strangely.

Tested on pandas 0.12.0-dev with Python 2.7 on OS X 10.8. For your information, Python is the version pre-installed on OS X, and I installed pandas using the SciPy Superpack script.

asked Jun 13 '13 at 21:06 by Blaszard





1 Answer

Actually in later versions of pandas this will give a TypeError:

    df.replace('-', None)
    TypeError: If "to_replace" and "value" are both None then regex must be a mapping
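As for why the question's version gave that odd output: it is exactly what forward-filling ("pad") the matched cells would produce. The `Series.replace` documentation in some later pandas releases spells this out: when no replacement value is given for a scalar or list `to_replace`, `replace` falls back to its `method` parameter (default `'pad'`), and `s.replace('a', None)` is described as equivalent to `s.replace(to_replace='a', value=None, method='pad')`. The sketch below reproduces that behaviour with `mask` and `ffill` rather than the now-deprecated `method` argument; it is an emulation for illustration, not pandas' own code path.

```python
import pandas as pd

df = pd.DataFrame(['-', 3, 2, 5, 1, -5, -1, '-', 9])
col = df[0]

# Hide the matched cells (they become NaN), then forward-fill ("pad") them
# with the value from the row above.
filled = col.mask(col == '-').ffill()

# A match in the very first row has nothing above it to copy, so it is still
# NaN after ffill; restore the original '-' there, as in the question's output.
emulated = filled.where(filled.notna(), col)
```

Row 0 keeps `'-'` and row 7 becomes `-1`, matching the output shown in the question.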

You can do it by passing either a list or a dictionary:

    In [11]: df.replace(['-'], [None])  # or .replace('-', {0: None})
    Out[11]:
          0
    0  None
    1     3
    2     2
    3     5
    4     1
    5    -5
    6    -1
    7  None
    8     9

But I recommend using NaNs rather than None:

    In [12]: df.replace('-', np.nan)
    Out[12]:
         0
    0  NaN
    1    3
    2    2
    3    5
    4    1
    5   -5
    6   -1
    7  NaN
    8    9
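If the data ultimately has to go into MySQL, one way to reconcile this recommendation with the question's requirement is to keep NaN inside pandas and map it to None only when building the rows for the database. The sketch below assumes a DB-API driver (such as PyMySQL or mysqlclient) whose cursor `executemany` accepts a list of tuples; the helper `to_db_rows`, the table `my_table` and the column `value` are hypothetical, and no connection is made here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['-', 3, 2, 5, 1, -5, -1, '-', 9])

# Work with NaN inside pandas.
clean = df.replace('-', np.nan)


def to_db_rows(frame):
    """Hypothetical helper: one tuple per row, with every NaN mapped to None (SQL NULL)."""
    return [
        tuple(None if pd.isna(value) else value for value in row)
        for row in frame.itertuples(index=False, name=None)
    ]


rows = to_db_rows(clean)

# With a connected DB-API cursor (hypothetical table and column names):
# cursor.executemany("INSERT INTO my_table (value) VALUES (%s)", rows)
```

Depending on the pandas version, the remaining numbers may come back as floats if the column is converted to float64 after the replace, so cast them if the target column is an integer type.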
answered Sep 20 '22 at 23:09 by Andy Hayden
