Concatenating columns of lists containing NaNs in a dataframe

Question

I have a pandas DataFrame with two columns that contain either lists or NaN values. No row has NaN in both columns. I want to create a third column that merges the values of the other two columns as follows:

if row df.a is NaN -> df.c = df.b

if row df.b is NaN -> df.c = df.a

else df.c = df.a + df.b

Input:-

df
                                 a                                    b
0   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                  NaN   
1   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                  NaN   
2   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                  NaN   
3   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                  NaN   
4   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                  NaN   
5   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                  NaN   
6   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                  NaN   
7   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                  NaN   
8   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]   
9   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]   
10                             NaN  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]   
11                             NaN  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

output:

df.c

0   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
5   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
6   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
7   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                     
8   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]   
9   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]   
10  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
11  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

I tried to use this nested condition with apply:

df['c'] = df.apply(lambda x: x.a if x.b is float else (x.b if x.a is float else (x['a'] + x['b'])), axis = 1)

but it gives me this error:

TypeError: ('can only concatenate list (not "float") to list', u'occurred at index 0').

I am using (and it's actually working)

if x is float

because it is the only way I found to separate a list from a NaN value.

piRSquared · Accepted Answer

When you use pd.DataFrame.stack, null values are dropped by default. We can then group by the first level of the index and concatenate the lists together with sum:

df.stack().groupby(level=0).sum()

0                                        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1                                        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2                                        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3                                        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4                                        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
5                                        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
6                                        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
7                                        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
8     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
9     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
10                                  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
11                                  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
dtype: object
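
To try this end to end, here is a minimal sketch on a reduced three-row frame (the data is shortened from the question, and the variable names are only illustrative). The explicit dropna() is an addition that is not in the one-liner above; it is included as a precaution because newer pandas versions offer a stack implementation that keeps nulls.

```python
import numpy as np
import pandas as pd

# Reduced version of the question's data: one row per case
# (only a, both a and b, only b).
df = pd.DataFrame({
    "a": [[0, 1], [0, 1], np.nan],
    "b": [np.nan, [5, 6], [5, 6]],
})

# stack() turns the frame into one long Series indexed by (row, column).
stacked = df.stack()

# Precaution (assumption: the stack variant in use may keep NaN entries).
stacked = stacked.dropna()

# Group by the original row label and "sum" the lists, which concatenates them.
c = stacked.groupby(level=0).sum()

print(c.tolist())
```

Each row ends up with its non-null lists joined in column order, so a row with only one list keeps that list unchanged.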

We can then add it to a copy of the dataframe with assign

df.assign(c=df.stack().groupby(level=0).sum())

Or add it to a new column in place

df['c'] = df.stack().groupby(level=0).sum()
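
As a side note on the attempt in the question: x.b is float compares the value with the float type object itself, so it is False even when x.b is NaN, and the lambda falls through to x['a'] + x['b'], which raises the TypeError. A possible alternative, sketched below with a hypothetical helper merge_lists and reduced data, is to test with isinstance and treat anything that is not a list as empty.

```python
import numpy as np
import pandas as pd


def merge_lists(a, b):
    """Concatenate a and b, treating any non-list value (such as NaN) as empty.

    merge_lists is a hypothetical helper name, not part of pandas.
    """
    left = a if isinstance(a, list) else []
    right = b if isinstance(b, list) else []
    # "+" builds a new list, so the lists stored in the frame are not modified.
    return left + right


# Reduced version of the question's data.
df = pd.DataFrame({
    "a": [[0, 1], [0, 1], np.nan],
    "b": [np.nan, [5, 6], [5, 6]],
})

# Build the merged lists row by row and align them on the frame's index.
df["c"] = pd.Series(
    [merge_lists(a, b) for a, b in zip(df["a"], df["b"])],
    index=df.index,
)

print(df["c"].tolist())
```

This keeps the row-wise logic of the original apply while avoiding the identity check, at the cost of a Python-level loop over the rows.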
