Concatenating columns of lists containing NaNs in a dataframe

I have a pandas DataFrame with two columns that contain either lists or NaN values. No row has NaN in both columns. I want to create a third column that merges the values of the other two columns as follows:

if df.a is NaN in a row -> df.c = df.b

if df.b is NaN in a row -> df.c = df.a

else df.c = df.a + df.b

Input:

df
                                 a                                    b
0   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                  NaN   
1   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                  NaN   
2   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                  NaN   
3   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                  NaN   
4   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                  NaN   
5   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                  NaN   
6   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                  NaN   
7   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                  NaN   
8   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]   
9   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]   
10                             NaN  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]   
11                             NaN  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14] 

Output:

df.c

0   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
5   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
6   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
7   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]                                     
8   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]   
9   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]   
10  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
11  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

I tried to use this nested condition with apply:

df['c'] = df.apply(lambda x: x.a if x.b is float else (x.b if x.a is float else (x['a'] + x['b'])), axis = 1)

but it gives me this error:

TypeError: ('can only concatenate list (not "float") to list', u'occurred at index 0').

I am using (and it's actually working)

if x is float

because it is the only way I found to tell a list apart from a NaN value.

csbr asked Jan 04 '23 17:01


1 Answer

When you use pd.DataFrame.stack, null values are dropped by default. We can then group by the first level of the index and concatenate the lists with sum:

df.stack().groupby(level=0).sum()

0                                        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1                                        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
2                                        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
3                                        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
4                                        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
5                                        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
6                                        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
7                                        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
8     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
9     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
10                                  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
11                                  [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
dtype: object

We can then add it to a copy of the dataframe with assign:

df.assign(c=df.stack().groupby(level=0).sum())

Or assign it to a new column in place:

df['c'] = df.stack().groupby(level=0).sum()
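
As a side note on the apply attempt in the question: `x.b is float` compares the cell against the type object `float` itself, so it is False even when the cell is NaN. Every row therefore falls through to `x['a'] + x['b']`, which would explain the TypeError at index 0. If you would rather keep an explicit per-row check, `isinstance` is the test that separates a list from a NaN. Below is a minimal sketch on a small made-up frame (not the data from the question); the helper name `as_list` is only an illustration and not part of pandas:

```python
import numpy as np
import pandas as pd

# Made-up miniature of the frame in the question: each cell is a list or NaN.
df = pd.DataFrame({
    "a": [[0, 1, 2], [0, 1, 2], np.nan],
    "b": [np.nan, [5, 6], [5, 6]],
})

# `is float` asks "is this object the type float?", which NaN is not.
print(np.nan is float)            # False
print(isinstance(np.nan, float))  # True

def as_list(value):
    """Hypothetical helper: keep lists as they are, treat anything else (NaN) as empty."""
    return value if isinstance(value, list) else []

# Concatenate row by row; a NaN on either side contributes nothing.
df["c"] = [as_list(a) + as_list(b) for a, b in zip(df["a"], df["b"])]
print(df["c"].tolist())
```

This spells out the three cases from the question in plain Python, at the cost of being more verbose than the stack/groupby one-liner.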
piRSquared answered Jan 23 '23 13:01