
How to create strings from dataframe columns elements in Python?

Given a dataframe df (the real one has more than 1,000 rows) in which each element of ColB is a list of lists:

  ColA    ColB
0  'A'    [['a','b','c'],['d','e','f']]
1  'B'    [['f','g','h'],['i','j','k']]
2  'A'    [['l','m','n'],['o','p','q']]

How can I efficiently create a ColC whose values are strings built from the elements of the other columns, like this:

      ColC
'A>+a b:c,+d e:f'
'B>+f g:h,+i j:k'
'A>+l m:n,+o p:q'

I tried df.apply, along these lines:

df['ColC'] = df.apply(lambda x:'%s>' % (x['ColA']),axis=1)

This produces the first two elements of the string (the ColA value and '>'), but I'm having a hard time with the rest.

hernanavella asked Nov 01 '17 18:11



1 Answer

Something like this?

df['ColC']  = df.ColA + '>+' + df.ColB.str[0].str[0] + \
              ' ' + df.ColB.str[0].str[1] + ':' + \
              df.ColB.str[0].str[2] + ',+' + \
              df.ColB.str[1].str[0] + ' ' + \
              df.ColB.str[1].str[1] + ':' + \
              df.ColB.str[1].str[2]

Output:

  ColA                    ColB             ColC
0    A  [[a, b, c], [d, e, f]]  A>+a b:c,+d e:f
1    B  [[f, g, h], [i, j, k]]  B>+f g:h,+i j:k
2    A  [[l, m, n], [o, p, q]]  A>+l m:n,+o p:q
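For anyone who wants to reproduce this from scratch, here is a minimal self-contained sketch. The way df is constructed is my assumption based on the sample in the question (it is not shown in the original post), and the col_b alias is just shorthand. It relies on the fact that the .str accessor can index into list elements of an object column.

```python
import pandas as pd

# Assumed reconstruction of the sample frame from the question.
df = pd.DataFrame({
    'ColA': ['A', 'B', 'A'],
    'ColB': [
        [['a', 'b', 'c'], ['d', 'e', 'f']],
        [['f', 'g', 'h'], ['i', 'j', 'k']],
        [['l', 'm', 'n'], ['o', 'p', 'q']],
    ],
})

col_b = df['ColB']  # shorthand only

# .str[i] picks element i of each list; chaining it twice reaches the inner strings.
df['ColC'] = (
    df['ColA'] + '>+'
    + col_b.str[0].str[0] + ' ' + col_b.str[0].str[1] + ':' + col_b.str[0].str[2]
    + ',+'
    + col_b.str[1].str[0] + ' ' + col_b.str[1].str[1] + ':' + col_b.str[1].str[2]
)

print(df['ColC'].tolist())
```

Wrapping the expression in parentheses avoids the backslash continuations, but it is otherwise the same computation as above.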

Timings

df = pd.concat([df]*333)

Wen's Method

%%timeit
df[['t1','t2']] = df['ColB'].apply(pd.Series).applymap(lambda x: '{} {}:{}'.format(x[0], x[1], x[2]))
df.ColA + '>+' + df.t1 + ',+' + df.t2

1 loop, best of 3: 363 ms per loop

miradulo Method

%%timeit
df.apply(lambda r: '{}>+{} {}:{},+{} {}:{}'.format(*flatten(r)), axis=1)

10 loops, best of 3: 74.9 ms per loop
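Note that flatten is not defined anywhere in this answer; it belongs to the other answer being timed. The sketch below uses a hypothetical stand-in (a small recursive generator that I wrote for illustration, not necessarily the original helper) so the apply-based line can be run on its own. The sample df is again an assumed reconstruction from the question.

```python
import pandas as pd

def flatten(items):
    """Hypothetical stand-in: yield the leaves of arbitrarily nested lists/tuples.

    Strings are treated as leaves, so 'A' is yielded whole.
    """
    for item in items:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item

# Assumed reconstruction of the sample frame from the question.
df = pd.DataFrame({
    'ColA': ['A', 'B', 'A'],
    'ColB': [
        [['a', 'b', 'c'], ['d', 'e', 'f']],
        [['f', 'g', 'h'], ['i', 'j', 'k']],
        [['l', 'm', 'n'], ['o', 'p', 'q']],
    ],
})

# Each row flattens to 7 values (ColA plus 6 letters), matching the 7 placeholders.
result = df.apply(
    lambda r: '{}>+{} {}:{},+{} {}:{}'.format(*flatten(r)),
    axis=1,
)

print(result.tolist())
```

This only works when every row has exactly two inner lists of three items each; any other shape makes format raise an error or silently drop the extra values.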

ScottBoston Method

%%timeit
df.ColA + '>+' + df.ColB.str[0].str[0] + \
' ' + df.ColB.str[0].str[1] + ':' + \
df.ColB.str[0].str[2] + ',+' + \
df.ColB.str[1].str[0] + ' ' + \
df.ColB.str[1].str[1] + ':' + \
df.ColB.str[1].str[2]

100 loops, best of 3: 12.4 ms per loop
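%%timeit only exists inside IPython/Jupyter. If you want to rerun the comparison from a plain script, something like the following sketch with the standard timeit module should do. The function name scott_boston, the variable names, and the use of ignore_index=True are my own choices for illustration, and the numbers you get will depend on your machine and pandas version.

```python
import timeit

import pandas as pd

# Assumed reconstruction of the sample frame from the question.
small = pd.DataFrame({
    'ColA': ['A', 'B', 'A'],
    'ColB': [
        [['a', 'b', 'c'], ['d', 'e', 'f']],
        [['f', 'g', 'h'], ['i', 'j', 'k']],
        [['l', 'm', 'n'], ['o', 'p', 'q']],
    ],
})

# Same scale-up as in the timings above: 3 rows * 333 = 999 rows.
big = pd.concat([small] * 333, ignore_index=True)

def scott_boston(df):
    """The chained .str[...] concatenation from this answer, wrapped in a function."""
    b = df['ColB']
    return (
        df['ColA'] + '>+'
        + b.str[0].str[0] + ' ' + b.str[0].str[1] + ':' + b.str[0].str[2]
        + ',+'
        + b.str[1].str[0] + ' ' + b.str[1].str[1] + ':' + b.str[1].str[2]
    )

runs = 10
seconds = timeit.timeit(lambda: scott_boston(big), number=runs) / runs
print('{:.2f} ms per loop'.format(seconds * 1000))
```

The other two methods can be timed the same way by wrapping each in its own function and passing it to timeit.timeit.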

Scott Boston answered Sep 20 '22 22:09
