I'm trying to replace some NaN values in my data with an empty list []. However, the list ends up represented as a str, so len() does not return the right result. Is there any way to replace a NaN value with an actual empty list in pandas?
In [28]: d = pd.DataFrame({'x' : [[1,2,3], [1,2], np.NaN, np.NaN], 'y' : [1,2,3,4]})

In [29]: d
Out[29]:
           x  y
0  [1, 2, 3]  1
1     [1, 2]  2
2        NaN  3
3        NaN  4

In [32]: d.x.replace(np.NaN, '[]', inplace=True)

In [33]: d
Out[33]:
           x  y
0  [1, 2, 3]  1
1     [1, 2]  2
2         []  3
3         []  4

In [34]: d.x.apply(len)
Out[34]:
0    3
1    2
2    2
3    2
Name: x, dtype: int64
Convert NaN to an empty string in pandas: use df.replace(np.nan, '', regex=True) to replace all NaN values with an empty string in the pandas DataFrame column.
Just use [[]] * s.isna().sum() and you'll be back in business.
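One caveat with the [[]] * n idiom, shown here in plain Python: the multiplication repeats a reference to the same list object, so every slot shares one list. The names n, shared and independent are only illustrative. If the cells are ever mutated in place, a list comprehension may be the safer choice.

```python
n = 2

# [[]] * n repeats a reference to ONE list object n times.
shared = [[]] * n
shared[0].append(1)
print(shared)  # [[1], [1]]

# A comprehension builds a separate list for every slot.
independent = [[] for _ in range(n)]
independent[0].append(1)
print(independent)  # [[1], []]
```

If the empty lists are only ever read (for example with len), the sharing is harmless.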
This works using isnull and loc to mask the series:
In [90]:
d.loc[d.isnull()] = d.loc[d.isnull()].apply(lambda x: [])
d

Out[90]:
0    [1, 2, 3]
1       [1, 2]
2           []
3           []
dtype: object

In [91]:
d.apply(len)

Out[91]:
0    3
1    2
2    0
3    0
dtype: int64
You have to do this using apply so that the list object is not interpreted as an array to assign back to the df, in which case pandas would try to align its shape with the original series.
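The array interpretation can be illustrated with NumPy alone. This is only a sketch of the likely cause, under the assumption that list-like values on the right-hand side of an assignment get coerced to an array; it does not reproduce pandas internals, and the variable names are illustrative.

```python
import numpy as np

# A list of two empty lists is read as a 2-D array with zero columns,
# so the two lists no longer exist as individual objects.
nested_shape = np.asarray([[], []]).shape
print(nested_shape)  # (2, 0)

# A single empty list is read as an empty 1-D array, not as one value.
single_shape = np.asarray([]).shape
print(single_shape)  # (0,)
```

In both cases there is no element left that could be stored in a cell, which is consistent with building the replacement values one at a time via apply.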
EDIT
Using your updated sample the following works:
In [100]:
d.loc[d['x'].isnull(),['x']] = d.loc[d['x'].isnull(),'x'].apply(lambda x: [])
d

Out[100]:
           x  y
0  [1, 2, 3]  1
1     [1, 2]  2
2         []  3
3         []  4

In [102]:
d['x'].apply(len)

Out[102]:
0    3
1    2
2    0
3    0
Name: x, dtype: int64
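As a possible alternative (not the method used above), the whole column can be rebuilt with a single apply, which sidesteps the masked assignment entirely. This is a minimal sketch assuming every non-missing cell in x is already a list; it uses np.nan, since the np.NaN alias is not available in newer NumPy releases.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({"x": [[1, 2, 3], [1, 2], np.nan, np.nan], "y": [1, 2, 3, 4]})

# Keep existing lists; replace anything else (here, NaN) with a fresh empty list.
# Assumption: every valid cell in "x" is a list.
d["x"] = d["x"].apply(lambda v: v if isinstance(v, list) else [])

lengths = d["x"].apply(len).tolist()
print(lengths)  # [3, 2, 0, 0]
```

Because the lambda creates a new [] on every call, each replaced cell gets its own list object.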