Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert column to string, retaining NaN (as None or blank)

Tags:

python

pandas

I would like to format a bunch of numbers in a list. The easiest way to do this is to convert it first to a bunch of strings. Here's an example of how I'm doing this:

df[col_name].astype('str').tolist()

However, the issue with this is I get values such as:

['12.19', '13.99', '1.00', 'nan', '9.00']

Is there a way I can return the 'nan' values as either None or an empty string, for example:

['12.19', '13.99', '1.00', None, '9.00']

Or:

['12.19', '13.99', '1.00', '', '9.00']

How would I do these two?

like image 262
David542 Avatar asked Dec 23 '18 05:12

David542


2 Answers

You can try like this.

1st way:

>>> df[col_name].apply(lambda v: str(v) if str(v) != 'nan' else None).tolist()
['12.19', '13.99', '1.00', None, '9.00']
>>>
>>> df[col_name].apply(lambda v: str(v) if str(v) != 'nan' else '').tolist()
['12.19', '13.99', '1.00', '', '9.00']
>>>

2nd way:

>>> df[col_name].apply(lambda v: str(v) if not pd.isnull(v) else None).tolist()
['12.19', '13.99', '1.00', None, '9.00']
>>>
>>> df[col_name].apply(lambda v: str(v) if not pd.isnull(v) else '').tolist()
['12.19', '13.99', '1.00', '', '9.00']
>>>

Here is the detailed explanation.

>>> import pandas as pd
>>> import numpy as np
>>>
>>> df = pd.DataFrame({
... "fullname": ['P Y', 'P T', 'T Y', 'N A', 'P Z'],
... "age": [36, 80, 25, 8, 34],
... "salary": ['12.19', '13.99', '1.00', np.nan, '9.00']
... })
>>>
>>> df
  fullname  age salary
0      P Y   36  12.19
1      P T   80  13.99
2      T Y   25   1.00
3      N A    8    NaN
4      P Z   34   9.00
>>>
>>> # PROBLEM
...
>>> col_name = "salary"
>>> df[col_name].astype("str").tolist()
['12.19', '13.99', '1.00', 'nan', '9.00']
>>>
>>> # SOLUTION
...
>>> df[col_name].apply(lambda v: str(v) if str(v) != 'nan' else None)
0    12.19
1    13.99
2     1.00
3     None
4     9.00
Name: salary, dtype: object
>>>
>>> df[col_name].apply(lambda v: str(v) if str(v) != 'nan' else '')
0    12.19
1    13.99
2     1.00
3
4     9.00
Name: salary, dtype: object
>>>
>>> df[col_name].apply(lambda v: str(v) if str(v) != 'nan' else None).tolist()
['12.19', '13.99', '1.00', None, '9.00']
>>>
>>> df[col_name].apply(lambda v: str(v) if str(v) != 'nan' else '').tolist()
['12.19', '13.99', '1.00', '', '9.00']
>>>
>>> df[col_name].apply(lambda v: str(v) if not pd.isnull(v) else None).tolist()
['12.19', '13.99', '1.00', None, '9.00']
>>>
>>> df[col_name].apply(lambda v: str(v) if not pd.isnull(v) else '').tolist()
['12.19', '13.99', '1.00', '', '9.00']
>>>
like image 165
hygull Avatar answered Sep 28 '22 15:09

hygull


try use fillna()

df[col_name].fillna('').astype('str').tolist()
like image 26
Hainan Zhao Avatar answered Sep 28 '22 14:09

Hainan Zhao