
How to convert all columns in Pandas DataFrame to 'object' while ignoring NaN?

Tags: python, pandas

I have a dataframe for which I want every column to be in string format. So I do this:

 df = df.astype(str)

The problem is that this converts every NaN entry to the string 'nan', so isnull returns False for those entries. Is there a way to convert to string but keep the empty entries as they are?

Catiger3331 asked Oct 09 '18 19:10



2 Answers

When you do astype(str), the resulting dtype is object, which is a dtype that can hold mixed values. So one thing you can do is convert with astype(str), as you were doing, and then replace the string 'nan' with an actual NaN (which is inherently a float), so that methods such as isnull detect it again:

df.astype(str).replace('nan',np.nan)

Example:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'col1': ['x', 2, np.nan, 'z']})
>>> df
  col1
0    x
1    2
2  NaN
3    z

# Note the mixed str, int and null values:
>>> df.values
array([['x'],
       [2],
       [nan],
       ['z']], dtype=object)

>>> df2 = df.astype(str).replace('nan', np.nan)

# Note that now you have only strings and null values:
>>> df2.values
array([['x'],
       ['2'],
       [nan],
       ['z']], dtype=object)
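
One caveat with the replace step: it also turns any genuine string 'nan' in the data into NaN. If that matters, a possible variant is to mask by the original frame's null positions instead of by the string value. The following is only a sketch; the sample frame and the name out are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one real missing value and one literal string 'nan'.
df = pd.DataFrame({'col1': ['x', 2, np.nan, 'nan']})

# Cast everything to str, then keep the cast values only where the
# original frame was not null; the remaining positions become NaN.
out = df.astype(str).where(df.notna())

print(out['col1'].isnull().tolist())
```

Here only the entry that was missing in the original frame is reported by isnull, and the literal 'nan' string is left untouched.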
sacuL answered Nov 01 '22 06:11



Convert your null values to empty strings, then cast the dataframe as string type.

df.replace(np.nan, '').astype(str)

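Note that the missing values are now empty strings, so isnull no longer detects them. Assuming df holds the converted frame, you could test for these 'nulls' via:

df.apply(lambda s: s.str.len() == 0)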
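
Put together, the approach from this answer might look like the sketch below. The sample frame and the names as_str and is_empty are assumptions for illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with one missing value.
df = pd.DataFrame({'col1': ['x', 2, np.nan, 'z']})

# Replace missing values with empty strings first, then cast to str.
as_str = df.replace(np.nan, '').astype(str)

# Former nulls are now zero-length strings, so test on string length.
is_empty = as_str.apply(lambda s: s.str.len() == 0)

print(is_empty['col1'].tolist())
```

The trade-off is that an entry which was already an empty string becomes indistinguishable from one that was originally missing.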
Alexander answered Nov 01 '22 05:11
