I have a dataframe for which I want every column to be in string format. So I do this:
 df = df.astype(str)
The problem is that this converts every NaN entry to the string 'nan', so isnull returns False for those entries. Is there a way to convert to string but keep missing entries as they are?
When you call astype(str), the resulting dtype is object, which can hold values of mixed types. So one thing you can do is convert with astype(str), as you were doing, and then replace the string 'nan' with an actual NaN (which is inherently a float), so that methods such as isnull detect it again:
df.astype(str).replace('nan',np.nan)
Example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': ['x', 2, np.nan, 'z']})
>>> df
  col1
0    x
1    2
2  NaN
3    z
# Note the mixed str, int and null values:
>>> df.values
array([['x'],
       [2],
       [nan],
       ['z']], dtype=object)
df2 = df.astype(str).replace('nan',np.nan)
# Note that now you have only strings and null values:
>>> df2.values
array([['x'],
       ['2'],
       [nan],
       ['z']], dtype=object)
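One caveat with matching on the text 'nan': a cell that genuinely contains the string 'nan' would be turned into a null as well. A minimal sketch of one way around that, assuming you still have the original dataframe at hand, is to take the null positions from the original instead of from the converted text. The sample data and the names df_str and df_replaced are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: row 2 is genuinely missing, row 3 is the literal text 'nan'.
df = pd.DataFrame({'col1': ['x', 2, np.nan, 'nan']})

# Cast everything to str, then null out only the cells that were missing in the original.
df_str = df.astype(str).mask(df.isna())

# For comparison: the string-matching approach also nulls the literal 'nan' in row 3.
df_replaced = df.astype(str).replace('nan', np.nan)

print(df_str['col1'].isna().tolist())
print(df_replaced['col1'].isna().tolist())
```

If your data can never contain the literal text 'nan', the simpler replace call should behave the same way.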
Convert your null values to empty strings, then cast the dataframe to string type.
df.replace(np.nan, '').astype(str)
Note that the 'nulls' are now empty strings, so isnull will not find them, but you can test for them via:
df.apply(lambda s: s.str.len() == 0) 
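Putting the empty-string approach together, here is a small sketch of what it looks like end to end. It uses fillna('') in place of replace(np.nan, ''), which should be equivalent for NaN cells, and eq('') as a shorter alternative to the str.len() check. The variable names are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['x', 2, np.nan, 'z']})

# Swap missing values for empty strings first, then cast every cell to str.
# (fillna('') is an assumption standing in for replace(np.nan, '').)
df_str = df.fillna('').astype(str)

# The former nulls are ordinary strings now, so isnull() reports nothing...
null_mask = df_str.isnull()

# ...and an explicit comparison against '' is needed to locate them.
empty_mask = df_str.eq('')

print(df_str['col1'].tolist())
print(empty_mask['col1'].tolist())
```

The trade-off is that a cell that was an empty string to begin with can no longer be told apart from one that was missing.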