I have a dataframe for which I want every column to be in string format. So I do this:
df = df.astype(str)
The problem is that this converts all NaN entries to the string 'nan', so isnull returns False for them. Is there a way to convert to string but keep missing entries as they are?
When you call astype(str), the resulting dtype is object, a dtype that can hold mixed values. So one thing you can do is convert with astype(str), as you were doing, and then replace the string 'nan' with an actual NaN (which is inherently a float), so that methods such as isnull can detect it again:
df.astype(str).replace('nan',np.nan)
Example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': ['x', 2, np.nan, 'z']})
>>> df
  col1
0    x
1    2
2  NaN
3    z
# Note the mixed str, int and null values:
>>> df.values
array([['x'],
       [2],
       [nan],
       ['z']], dtype=object)
df2 = df.astype(str).replace('nan',np.nan)
# Note that now you have only strings and null values:
>>> df2.values
array([['x'],
       ['2'],
       [nan],
       ['z']], dtype=object)
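One caveat with the replace approach is that a cell that genuinely holds the text 'nan' would also be turned into NaN. A minimal sketch of a variant, assuming you want to restore nulls by position rather than by value, masks the converted frame with the null positions of the original frame. The names df_replace and df_mask are only illustrative, and the sample data is made up to show the difference:

```python
import numpy as np
import pandas as pd

# Illustrative data: one real null and one cell holding the literal text 'nan'
df = pd.DataFrame({'col1': ['x', 2, np.nan, 'nan']})

# Value-based approach: every 'nan' string becomes NaN, including the literal one
df_replace = df.astype(str).replace('nan', np.nan)

# Position-based variant: blank out only the cells that were null originally
df_mask = df.astype(str).mask(df.isna())

print(df_replace['col1'].isna().tolist())
print(df_mask['col1'].isna().tolist())
```

With this data, the value-based version should report the last two cells as null, while the masked version reports only the cell that was NaN to begin with.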
Convert your null values to empty strings, then cast the dataframe as string type.
df.replace(np.nan, '').astype(str)
Note that you could then test for 'nulls' (now empty strings) on the converted dataframe, assuming it has been assigned back to df, via:
df.apply(lambda s: s.str.len() == 0)
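To make the empty-string approach concrete, here is a small sketch, assuming the same single-column example frame used in the first answer. The names out and is_empty are only illustrative. Keep in mind that after this conversion a string that was already empty and a former NaN look the same.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['x', 2, np.nan, 'z']})

# Replace nulls with empty strings first, then cast everything to str
out = df.replace(np.nan, '').astype(str)

# Flag the cells that are empty strings (the former nulls)
is_empty = out.apply(lambda s: s.str.len() == 0)

print(out['col1'].tolist())       # ['x', '2', '', 'z']
print(is_empty['col1'].tolist())  # [False, False, True, False]
```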