Given a dataframe df (the real one has 1000+ rows) in which the elements of ColB are lists of lists:
ColA ColB
0 'A' [['a','b','c'],['d','e','f']]
1 'B' [['f','g','h'],['i','j','k']]
2 'A' [['l','m','n'],['o','p','q']]
How can I efficiently create a ColC that is a string built from the elements in the different columns, like this:
ColC
'A>+a b:c,+d e:f'
'B>+f g:h,+i j:k'
'A>+l m:n,+o p:q'
I tried df.apply, along these lines:
df['ColC'] = df.apply(lambda x:'%s>' % (x['ColA']),axis=1)
This works for the first two elements of the string, but I'm having a hard time with the rest.
Something like this?
df['ColC'] = df.ColA + '>+' + df.ColB.str[0].str[0] + \
' ' + df.ColB.str[0].str[1] + ':' + \
df.ColB.str[0].str[2] + ',+' + \
df.ColB.str[1].str[0] + ' ' + \
df.ColB.str[1].str[1] + ':' + \
df.ColB.str[1].str[2]
Output:
ColA ColB ColC
0 A [[a, b, c], [d, e, f]] A>+a b:c,+d e:f
1 B [[f, g, h], [i, j, k]] B>+f g:h,+i j:k
2 A [[l, m, n], [o, p, q]] A>+l m:n,+o p:q
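The chained .str[i] lookups work because the .str accessor indexes element-wise into lists as well as strings, but the expression is hard-wired to exactly two inner lists of three items each. If the number of inner lists can vary, a plain list comprehension is one alternative. The following is only a sketch under that assumption; build_label is a hypothetical helper name, not something from the question or the answer.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    'ColA': ['A', 'B', 'A'],
    'ColB': [
        [['a', 'b', 'c'], ['d', 'e', 'f']],
        [['f', 'g', 'h'], ['i', 'j', 'k']],
        [['l', 'm', 'n'], ['o', 'p', 'q']],
    ],
})

def build_label(prefix, groups):
    # Hypothetical helper: formats each 3-item inner list as '+x y:z'
    # and joins any number of them with commas after 'prefix>'.
    parts = ['+{} {}:{}'.format(*g) for g in groups]
    return prefix + '>' + ','.join(parts)

# Iterate over the two columns in parallel instead of using df.apply(axis=1).
df['ColC'] = [build_label(a, b) for a, b in zip(df['ColA'], df['ColB'])]
print(df['ColC'].tolist())
```

Each inner list is still assumed to hold exactly three items; only the number of inner lists per row is free to vary.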
Timings on a larger frame:
df = pd.concat([df]*333)
Wen's Method
%%timeit
df[['t1','t2']] = df['ColB'].apply(pd.Series).applymap(lambda x: '{} {}:{}'.format(x[0], x[1], x[2]))
df.ColA + '>+' + df.t1 + ',+' + df.t2
1 loop, best of 3: 363 ms per loop
miradulo Method
%%timeit
df.apply(lambda r: '{}>+{} {}:{},+{} {}:{}'.format(*flatten(r)), axis=1)
10 loops, best of 3: 74.9 ms per loop
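The flatten helper used in this method is not defined anywhere in the post. A minimal sketch of what it might look like, assuming it yields the row's ColA value followed by every string in ColB's nested lists (this definition is a guess, not the original author's code):

```python
import pandas as pd

def flatten(row):
    # Hypothetical stand-in for the undefined helper: yield ColA first,
    # then each string from the nested lists in ColB, in order.
    yield row['ColA']
    for inner in row['ColB']:
        yield from inner

df = pd.DataFrame({
    'ColA': ['A', 'B', 'A'],
    'ColB': [
        [['a', 'b', 'c'], ['d', 'e', 'f']],
        [['f', 'g', 'h'], ['i', 'j', 'k']],
        [['l', 'm', 'n'], ['o', 'p', 'q']],
    ],
})

# The seven placeholders consume ColA plus the six flattened strings.
result = df.apply(
    lambda r: '{}>+{} {}:{},+{} {}:{}'.format(*flatten(r)), axis=1
)
print(result.tolist())
```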
ScottBoston Method
%%timeit
df.ColA + '>+' + df.ColB.str[0].str[0] + \
' ' + df.ColB.str[0].str[1] + ':' + \
df.ColB.str[0].str[2] + ',+' + \
df.ColB.str[1].str[0] + ' ' + \
df.ColB.str[1].str[1] + ':' + \
df.ColB.str[1].str[2]
100 loops, best of 3: 12.4 ms per loop
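%%timeit only works inside IPython or Jupyter. To repeat a comparison like this from a plain script, the standard timeit module can be used instead. The sketch below times the .str accessor expression against a list comprehension on the same 999-row frame; the function names are made up for illustration, and the numbers will differ by machine and pandas version.

```python
import timeit
import pandas as pd

base = pd.DataFrame({
    'ColA': ['A', 'B', 'A'],
    'ColB': [
        [['a', 'b', 'c'], ['d', 'e', 'f']],
        [['f', 'g', 'h'], ['i', 'j', 'k']],
        [['l', 'm', 'n'], ['o', 'p', 'q']],
    ],
})
# Same scaling as in the timings above: 3 rows * 333 = 999 rows.
big = pd.concat([base] * 333, ignore_index=True)

def with_str_accessor(frame):
    # The chained .str[i] expression from the answer.
    b = frame['ColB']
    return (frame['ColA'] + '>+' + b.str[0].str[0] + ' ' + b.str[0].str[1]
            + ':' + b.str[0].str[2] + ',+' + b.str[1].str[0] + ' '
            + b.str[1].str[1] + ':' + b.str[1].str[2])

def with_comprehension(frame):
    # Illustrative alternative: build each string in a Python loop.
    return pd.Series(
        [a + '>' + ','.join('+{} {}:{}'.format(*g) for g in b)
         for a, b in zip(frame['ColA'], frame['ColB'])],
        index=frame.index,
    )

for fn in (with_str_accessor, with_comprehension):
    # Best of 3 repeats, 10 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(big), number=10, repeat=3)) / 10
    print('{}: {:.2f} ms per loop'.format(fn.__name__, best * 1000))
```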