Replace empty list values in Pandas DataFrame with NaN

I know that similar questions have been asked before, but I have literally tried every solution listed here and none of them worked.

I have a DataFrame that consists of dates, strings, empty values, and empty lists. It is very large: 8 million rows.

I want to replace all of the empty list values with NaN, meaning only the cells that contain exactly [] and nothing else. Nothing seems to work.

I tried this:

df = df.apply(lambda y: np.nan if (type(y) == list and len(y) == 0) else y)

as suggested in a similar question (replace empty list with NaN in pandas dataframe), but it doesn't change anything in my DataFrame.

Any help would be appreciated.
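The attempt above changes nothing because DataFrame.apply calls the function once per column and passes the whole column as a Series, so type(y) == list is never true and every column comes back unchanged. A minimal sketch on a small hypothetical frame, showing that the same test works when it is run per cell instead (DataFrame.map in pandas 2.1 and later, DataFrame.applymap in earlier versions). It assumes one Python call per cell is acceptable for your data size, which may be slow on 8 million rows.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame: one real empty list in each column.
df = pd.DataFrame([["x", []], [[], 1]], columns=["a", "b"])


def replace_empty(y):
    """The test from the question: an empty list becomes NaN, anything else is kept."""
    return np.nan if (type(y) == list and len(y) == 0) else y


# DataFrame.apply hands the function whole columns (Series), never single cells.
seen_types = []


def record(col):
    seen_types.append(type(col))
    return replace_empty(col)


unchanged = df.apply(record)  # the empty lists are still there

# Run the same function cell by cell instead. DataFrame.map exists from
# pandas 2.1; older versions call the same operation applymap.
cell_map = df.map if hasattr(pd.DataFrame, "map") else df.applymap
result = cell_map(replace_empty)
print(result)
```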

asked Sep 18 '25 03:09 by SLack A


2 Answers

Assuming the OP wants to convert both actual empty lists and the string '[]' to NaN, here is a solution.

Setup

import numpy as np
import pandas as pd

# Borrowed from piRSquared's answer.
df = pd.DataFrame([
        [1, 'hello', np.nan, None, 3.14],
        ['2017-06-30', 2, 'a', 'b', []],
        [pd.to_datetime('2016-08-14'), 'x', '[]', 'z', 'w']
    ])

df
Out[1062]: 
                     0      1    2     3     4
0                    1  hello  NaN  None  3.14
1           2017-06-30      2    a     b    []
2  2016-08-14 00:00:00      x   []     z     w

Solution:

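# Convert every cell to a string, compare it with '[]', then use mask() to set the
# matching cells (real empty lists and the string '[]') to NaN. mask() returns a new
# DataFrame; pandas >= 2.1 renames applymap to DataFrame.map.
df.mask(df.applymap(str).eq('[]'))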
Out[1063]: 
                     0      1    2     3     4
0                    1  hello  NaN  None  3.14
1           2017-06-30      2    a     b   NaN
2  2016-08-14 00:00:00      x  NaN     z     w
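Converting every cell to a string can be costly on a frame as large as the OP's 8 million rows. Since only object-dtype columns can hold lists or strings, one variation is to run the same str / eq('[]') / mask steps on those columns only. A minimal sketch on a hypothetical two-column frame (the column names n and s are made up for illustration); it also writes the result back, because mask() returns a new object instead of changing df in place.

```python
import pandas as pd

# Hypothetical frame: "n" is float64; "s" is object and holds a real empty
# list, the string "[]" and an ordinary string.
df = pd.DataFrame([[1.5, []], [2.5, "[]"], [3.5, "x"]], columns=["n", "s"])

# Only object columns can contain lists or the string "[]".
obj_cols = df.select_dtypes(include="object").columns
subset = df[obj_cols]

# DataFrame.map is the pandas >= 2.1 name for applymap.
as_str = subset.map(str) if hasattr(pd.DataFrame, "map") else subset.applymap(str)

# mask() returns a new frame, so assign the masked columns back into df.
df[obj_cols] = subset.mask(as_str.eq("[]"))
print(df)
```

Numeric columns such as n are never converted to strings, which is where the saving comes from on wide frames with many numeric columns.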
answered Sep 21 '25 04:09 by Allen


I'm going to make the assumption that you want to mask actual empty lists.

  • pd.DataFrame.mask turns every cell whose corresponding condition value is True into np.nan
  • I want to find actual list values, so I'll use df.applymap(type) to get the type of every cell and check whether it is equal to list
  • I know that [] evaluates to False in a boolean context, so I'll use df.astype(bool) to check for that.
  • I'll end up masking those cells that are both list type and evaluate to False
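Both conditions are needed because truthiness alone is too broad: None and empty strings are falsy as well. A small check on a hypothetical object Series, using Python's bool() per cell through Series.map to mirror the truthiness test the answer gets from df.astype(bool):

```python
import pandas as pd

# Hypothetical cells: empty list, non-empty list, empty string, None, string "[]".
s = pd.Series([[], [0], "", None, "[]"], dtype=object)

# Falsy cells: [], "" and None all qualify.
falsy = ~s.map(bool)

# Restricting to real list objects leaves only the empty list.
is_list = s.map(lambda v: isinstance(v, list))
empty_list = is_list & falsy

print(falsy.tolist())       # [True, False, True, True, False]
print(empty_list.tolist())  # [True, False, False, False, False]
```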

Consider the dataframe df

import numpy as np
import pandas as pd

df = pd.DataFrame([
        [1, 'hello', np.nan, None, 3.14],
        ['2017-06-30', 2, 'a', 'b', []],
        [pd.to_datetime('2016-08-14'), 'x', '[]', 'z', 'w']
    ])

df

                     0      1    2     3     4
0                    1  hello  NaN  None  3.14
1           2017-06-30      2    a     b    []
2  2016-08-14 00:00:00      x   []     z     w

Solution

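# Mask cells that are lists AND falsy (i.e. empty). On pandas >= 2.1, df.map(type)
# replaces the deprecated df.applymap(type).
df.mask(df.applymap(type).eq(list) & ~df.astype(bool))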

                     0      1    2     3     4
0                    1  hello  NaN  None  3.14
1           2017-06-30      2    a     b   NaN
2  2016-08-14 00:00:00      x   []     z     w
answered Sep 21 '25 04:09 by piRSquared