
Replacing empty strings with NaN in pandas

I have a pandas DataFrame that was created by importing a CSV file. I want to replace blank values with NaN. Some of these blank values are empty strings and some contain a variable number of spaces: '', ' ', '  ', etc.

Using the suggestion from this thread I have

df.replace(r'\s+', np.nan, regex=True, inplace = True)

which does replace all the strings that only contain spaces, but also replaces every string that has a space in it, which is not what I want.

How do I replace only strings with just spaces and empty strings?

asked Nov 21 '16 02:11 by doctorer




2 Answers

Anchor the pattern with ^ and $ so that it matches only cells that are empty or consist entirely of whitespace:

df.replace(r'^\s*$', np.nan, regex=True, inplace = True)
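
As a rough illustration of the difference between the two patterns, here is a minimal sketch. The column names and values are made up for the example, and the object dtype is an assumption chosen to mimic text columns loaded from a CSV file.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a mix of real text, empty strings and
# whitespace-only strings (object dtype is assumed for this sketch).
df = pd.DataFrame(
    {
        "name": ["alice", "", "bob smith", "   "],
        "city": [" ", "new york", "", "paris"],
    },
    dtype=object,
)

# Anchored pattern: the whole cell must be zero or more whitespace characters.
cleaned = df.replace(r"^\s*$", np.nan, regex=True)

# Unanchored pattern from the question, for comparison: any cell that
# contains whitespace anywhere is replaced.
too_greedy = df.replace(r"\s+", np.nan, regex=True)

print(cleaned)
print(too_greedy)
```

With the anchored pattern, values such as "bob smith" and "new york" are left alone, while the unanchored pattern turns them into NaN as well.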
answered Oct 19 '22 14:10 by Zeugma


If you are reading a CSV file and want blank fields (empty or containing only spaces) converted to NaN while the file is being read, you can use the option

skipinitialspace=True

Example code

pd.read_csv('Sample.csv', skipinitialspace=True)

This removes any whitespace that appears directly after a delimiter, so fields that contain only spaces become empty and are read as NaN.

The option is described in the documentation: http://pandas.pydata.org/pandas-docs/stable/io.html

Note: This option also strips leading whitespace from valid data. If you need to keep leading whitespace for any reason, this option is not a good choice.
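
A small sketch of the effect, using an in-memory CSV instead of a file. The CSV content and column names here are invented for illustration, not taken from the question.

```python
import io

import pandas as pd

# Hypothetical CSV text: the first row has a whitespace-only "city" field,
# the second row has leading spaces before a real value.
csv_text = "name,city,age\nalice,   ,30\nbob,  new york,25\n"

# Default parsing keeps the spaces, so the blank field stays a string.
plain = pd.read_csv(io.StringIO(csv_text))

# skipinitialspace=True drops spaces after each delimiter, so the
# whitespace-only field becomes empty and is read as NaN.
skipped = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(plain)
print(skipped)
```

Note that the leading spaces in front of "new york" are dropped too, which is the side effect mentioned in the note above.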

answered Oct 19 '22 13:10 by Rajshekar Reddy