I have a pandas DataFrame with two columns that contain either lists or NaN values. No row has NaN in both columns. I want to create a third column that merges the values of the other two columns as follows:
if row df.a is NaN -> df.c = df.b
if row df.b is NaN -> df.c = df.a
else df.c = df.a + df.b
Input:
df
a b
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
2 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
4 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
5 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
6 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
7 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] NaN
8 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
9 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
10 NaN [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
11 NaN [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
output:
df.c
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
5 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
6 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
7 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
8 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
9 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
10 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
11 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
I tried to use this nested condition with apply:
df['c'] = df.apply(lambda x: x.a if x.b is float else (x.b if x.a is float else (x['a'] + x['b'])), axis = 1)
but it gives me this error:
TypeError: ('can only concatenate list (not "float") to list', u'occurred at index 0').
I am using (and it's actually working)
if x is float
because it is the only way I found to separate a list from a NaN value.
When you use pd.DataFrame.stack, null values are dropped by default. We can then group by the first level of the index and concatenate the lists together with sum:
df.stack().groupby(level=0).sum()
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
5 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
6 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
7 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
8 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
9 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
10 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
11 [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
dtype: object
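For anyone who wants to reproduce this, here is a minimal, self-contained sketch of the same idea on a smaller frame. The sample data and the variable names (stacked, merged) are made up for illustration, and the explicit dropna() is a defensive addition so the result does not depend on stack's default NaN handling.

```python
import numpy as np
import pandas as pd

# Small made-up frame: each cell holds either a list or NaN
df = pd.DataFrame({
    "a": [[0, 1], [2, 3], np.nan],
    "b": [np.nan, [4, 5], [6, 7]],
})

# stack() moves the columns into an inner index level, giving one entry per
# (row label, column name) pair; dropna() discards any NaN cells
stacked = df.stack().dropna()

# Group by the original row label (index level 0); summing lists concatenates them
merged = stacked.groupby(level=0).sum()

# Assignment aligns on the index, so each row receives its own merged list
df["c"] = merged
print(df["c"].tolist())
```

Within each row the lists are concatenated in column order, so the values from a come before the values from b.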
We can then add it to a copy of the dataframe with assign
df.assign(c=df.stack().groupby(level=0).sum())
Or add it to a new column in place
df['c'] = df.stack().groupby(level=0).sum()
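As for why the original apply attempt fails: x.b is float compares the cell with the type object float itself, which is never true for a NaN value, so every row falls through to x['a'] + x['b'] and the first row with a NaN raises the TypeError. Below is a minimal sketch of the same nested condition using isinstance instead. The helper name merge_cells and the sample data are made up for illustration.

```python
import numpy as np
import pandas as pd


def merge_cells(a, b):
    """Concatenate two cells, treating any non-list cell (such as NaN) as missing."""
    if not isinstance(a, list):
        return b
    if not isinstance(b, list):
        return a
    return a + b


# Small made-up frame: each cell holds either a list or NaN
df = pd.DataFrame({
    "a": [[0, 1], [2, 3], np.nan],
    "b": [np.nan, [4, 5], [6, 7]],
})

# Build the new column row by row; wrapping the result in a Series
# keeps each merged list as a single cell
df["c"] = pd.Series(
    [merge_cells(a, b) for a, b in zip(df["a"], df["b"])],
    index=df.index,
)
print(df["c"].tolist())
```

This keeps the row-wise logic from the question, while the stack approach avoids the explicit branching altogether.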