python pandas data frame: assign function return tuple to two columns of a data frame

Question

I want to add two columns to a pandas DataFrame using a function that returns a tuple, like this:

import pandas as pd

data = pd.DataFrame({'a':[1,2,3,4,5,6],'b':['ssdfsdf','bbbbbb','cccccccccccc','ddd','eeeeee','ffffff']})

def givetup(string):
    
    result1 = string[0:3]
    # please imagine here a bunch of string functions concatenated.
    # including nlp methods with SpaCy 
    result2 = result1.upper()
    # the same here, imagine a bunch of steps to calculate result2 based on result 1
    
    return (result1,result2)

data['c'] = data['b'].apply(lambda x: givetup(x)[0])
data['d'] = data['b'].apply(lambda x: givetup(x)[1])

This is very inefficient (I am dealing with millions of rows) because I call the same function twice and perform the calculation twice. Since result2 depends on result1, I would rather not split givetup into two functions. How can I assign result1 and result2 to the new columns c and d in one go, with only one call to the function per row? What is the most efficient way to do it?

Please bear in mind that result1 and result2 are very time-consuming string calculations.

EDIT 1: I know about this question: Apply pandas function to column to create multiple new columns?

i.e. applying vectorized functions. In my particular case that is highly undesirable or perhaps even impossible. Imagine that result1 and result2 are calculated with language models, and I need the plain text.

piRSquared · Accepted Answer

`zip`/`map`

data['c'], data['d'] = zip(*map(givetup, data['b']))

data

   a             b    c    d
0  1       ssdfsdf  ssd  SSD
1  2        bbbbbb  bbb  BBB
2  3  cccccccccccc  ccc  CCC
3  4           ddd  ddd  DDD
4  5        eeeeee  eee  EEE
5  6        ffffff  fff  FFF
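To see why this calls the function only once per row, here is a minimal pure-Python sketch of the same pattern without pandas. The `calls` counter and the shortened `values` list are illustrative additions, not part of the original answer; `givetup` is a stand-in for the expensive function from the question.

```python
calls = 0  # illustrative counter, only here to show how often givetup runs


def givetup(string):
    """Stand-in for the expensive function from the question."""
    global calls
    calls += 1
    result1 = string[0:3]
    result2 = result1.upper()
    return (result1, result2)


values = ['ssdfsdf', 'bbbbbb', 'cccccccccccc']

# map yields one (result1, result2) tuple per input value ...
pairs = list(map(givetup, values))

# ... and zip(*pairs) transposes those row-wise tuples into column-wise tuples.
col_c, col_d = zip(*pairs)

print(col_c)  # ('ssd', 'bbb', 'ccc')
print(col_d)  # ('SSD', 'BBB', 'CCC')
print(calls)  # 3
```

One caveat of this pattern: if the input is empty, `zip()` yields nothing and the two-name unpacking raises a `ValueError`, so an empty column may need a separate guard.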

`Series.str` and `assign`

This is specific to the example given in givetup, but if the calculation can be disentangled into vectorized steps, it is likely worth it.

The arguments to the assign method can be callables that reference columns created by an argument just before them (neat).

data.assign(c=lambda d: d.b.str[0:3], d=lambda d: d.c.str.upper())

   a             b    c    d
0  1       ssdfsdf  ssd  SSD
1  2        bbbbbb  bbb  BBB
2  3  cccccccccccc  ccc  CCC
3  4           ddd  ddd  DDD
4  5        eeeeee  eee  EEE
5  6        ffffff  fff  FFF
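Note that `assign` returns a new DataFrame rather than modifying `data` in place, so the result has to be bound to a name to keep the new columns. A small sketch of that behaviour, assuming a pandas version that evaluates `assign` keyword arguments in order (the shortened frame and the name `result` are illustrative):

```python
import pandas as pd

data = pd.DataFrame({'a': [1, 2, 3], 'b': ['ssdfsdf', 'bbbbbb', 'ddd']})

# Each callable receives the frame as built so far, so `d` can read column `c`.
result = data.assign(
    c=lambda d: d['b'].str[0:3],
    d=lambda d: d['c'].str.upper(),
)

print(list(data.columns))    # the original frame is unchanged
print(list(result.columns))  # the new frame carries the extra columns
```

To keep the columns on the original name, rebind it: `data = data.assign(...)`.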

Timings

data = pd.concat([data] * 10_000, ignore_index=True)

%timeit data['c'], data['d'] = zip(*map(givetup, data['b']))
%timeit data[['c','d']] = [givetup(a) for a in data['b']]
%timeit data.assign(c=lambda d: d.b.str[0:3], d=lambda d: d.c.str.upper())

69.7 ms ± 865 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
137 ms ± 937 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
34.6 ms ± 235 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Quang Hoang · Answer

You can try a list comprehension here:

data[['c','d']] = [givetup(a) for a in data['b']]

Output:

   a             b    c    d
0  1       ssdfsdf  ssd  SSD
1  2        bbbbbb  bbb  BBB
2  3  cccccccccccc  ccc  CCC
3  4           ddd  ddd  DDD
4  5        eeeeee  eee  EEE
5  6        ffffff  fff  FFF
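A possible variant of the same idea, sketched here as an assumption rather than as part of the original answer, is to build an intermediate DataFrame from the list of tuples and `join` it. Passing `index=data.index` keeps the row alignment explicit even when the frame does not have a default integer index. The non-default index and the name `results` are illustrative.

```python
import pandas as pd


def givetup(string):
    """Stand-in for the expensive function from the question."""
    result1 = string[0:3]
    result2 = result1.upper()
    return (result1, result2)


data = pd.DataFrame(
    {'a': [1, 2, 3], 'b': ['ssdfsdf', 'bbbbbb', 'ddd']},
    index=[10, 20, 30],  # deliberately non-default index
)

# One call to givetup per row; each tuple becomes one row of `results`.
results = pd.DataFrame(
    [givetup(s) for s in data['b']],
    columns=['c', 'd'],
    index=data.index,  # reuse the original index so join aligns row by row
)

data = data.join(results)
print(data)
```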
