 

How to return multiple columns using apply in Pandas dataframe

Tags:

pandas

apply

I am trying to apply a function to a column of a pandas DataFrame. The function returns a list of tuples. This is my function:

def myfunc(text):
    # api_call (not shown) returns an iterable of 3-tuples, one per section
    values = []
    sections = api_call(text)
    for (part1, part2, part3) in sections:
        value = (part1, part2, part3)
        values.append(value)
    return values

For example,

sections=myfunc("History: Had a fever\n Allergies: No")
print(sections)

output:

[('past_medical_history', 'History:', 'History: Had a fever\n '), ('allergies', 'Allergies:', 'Allergies: No')]
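To experiment without the real service, here is a minimal sketch with a hypothetical `api_call` stand-in (the question does not show the real one). It assumes sections start at the headers `History:` and `Allergies:`, and it reproduces the example output above; the `SECTION_NAMES` mapping is also made up for illustration.

```python
import re

# Hypothetical mapping from header to section name (assumption, not from the question).
SECTION_NAMES = {"History:": "past_medical_history", "Allergies:": "allergies"}


def api_call(text):
    """Hypothetical stand-in: split text at known headers into 3-tuples."""
    matches = list(re.finditer(r"(History:|Allergies:)", text))
    sections = []
    for i, m in enumerate(matches):
        # Each section runs from its header up to the next header (or the end).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        header = m.group(1)
        sections.append((SECTION_NAMES[header], header, text[m.start():end]))
    return sections


def myfunc(text):
    # Same behaviour as the loop in the question: copy each 3-tuple into a list.
    return [(p1, p2, p3) for (p1, p2, p3) in api_call(text)]


print(myfunc("History: Had a fever\n Allergies: No"))
```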

I would like each tuple to become a row of its own, with its three elements in new columns. For example:

The original DataFrame looks like this:

id text
0  History: Had a fever\n Allergies: No
1  text2

and after applying the function, I want the dataframe to look like this (where xxx is various text content):

id text            part1        part2        part3
0  History: Had... past_...     History:     History: ...
0  Allergies: No   allergies    Allergies:   Allergies: No
1  text2           xxx          xxx          xxx
1  text2           xxx          xxx          xxx
1  text2           xxx          xxx          xxx
...

I could loop through the DataFrame and build a new one, but that would be really slow. I tried the following code but received a ValueError. Any suggestions?

df.apply(lambda x: pd.Series(myfunc(x['col']), index=['part1', 'part2', 'part3']), axis=1)
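The `ValueError` most likely comes from a length mismatch: `myfunc` returns one tuple per section (two in the example), so `pd.Series` receives two values but three index labels. A minimal reproduction, using the example output above; `try_build_series` is a helper introduced only for this illustration:

```python
import pandas as pd

# myfunc returns a list with one tuple per section, not three scalars.
sections = [
    ("past_medical_history", "History:", "History: Had a fever\n "),
    ("allergies", "Allergies:", "Allergies: No"),
]


def try_build_series(values):
    """Return the error message if pd.Series rejects the values, else None."""
    try:
        pd.Series(values, index=["part1", "part2", "part3"])
    except ValueError as exc:
        return str(exc)
    return None


print(try_build_series(sections))
```

Even when a text happens to have exactly three sections, the call succeeds but each cell holds a whole tuple, so the shape is wrong either way: the function yields rows, not columns.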

After a bit more research, my question boils down to how to unnest a column that holds a list of tuples. The answer to "Split a list of tuples in a column of dataframe to columns of a dataframe" helped. Here is what I did:

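import pandas as pd

# step 1: sectionize each text into a list of (part1, part2, part3) tuples
df["sections"] = df["text"].apply(myfunc)

# step 2: unnest the sections into parallel lists, one entry per tuple
part1s = []
part2s = []
part3s = []
ids = []

def create_lists(row):
    tuples = row["sections"]
    row_id = row["id"]  # avoid shadowing the built-in id()
    for t in tuples:
        part1s.append(t[0])
        part2s.append(t[1])
        part3s.append(t[2])
        ids.append(row_id)

# apply is used only for its side effect of filling the lists above
df.apply(create_lists, axis=1)

new_df = pd.DataFrame({"part1": part1s, "part2": part2s, "part3": part3s,
                       "id": ids})[["part1", "part2", "part3", "id"]]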

This works, but the performance is not great. Is there a better way?
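One alternative that avoids the row-wise `apply` and the global lists is `DataFrame.explode` (pandas 0.25+), which turns each list element into its own row and repeats the other columns; the tuples can then be split into columns with a single `DataFrame` constructor call. This is a sketch, not from the original thread: it assumes `df` has `id` and `sections` columns as produced by step 1, and the sample data below is made up.

```python
import pandas as pd

# Sample frame shaped like the output of step 1 (values are illustrative only).
df = pd.DataFrame({
    "id": [0, 1, 2],
    "sections": [
        [("past_medical_history", "History:", "History: Had a fever\n "),
         ("allergies", "Allergies:", "Allergies: No")],
        [("a", "b", "c"), ("d", "e", "f"), ("g", "h", "i")],
        [],  # a text with no sections
    ],
})

# One row per tuple; empty lists become NaN, so drop them before splitting.
exploded = (
    df[["id", "sections"]]
    .explode("sections")
    .dropna(subset=["sections"])
    .reset_index(drop=True)
)

# Split each 3-tuple into three columns in one constructor call.
parts = pd.DataFrame(exploded["sections"].tolist(),
                     columns=["part1", "part2", "part3"])

# Both frames share a fresh RangeIndex, so they line up row by row.
new_df = pd.concat([parts, exploded[["id"]]], axis=1)
print(new_df)
```

Whether this beats the list-based version depends on the data, so it is worth timing both on a realistic sample.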

GLP asked Oct 27 '25 15:10


1 Answer

The idea is to set up some sample data and a function that operates on it and returns three items. Splitting comma-separated strings is a quick stand-in that mirrors the function you are after.

import pandas as pd
data = { 'names' : ['x,a,c','y,er,rt','z,1,ere']}
df = pd.DataFrame(data)

gives

     names
0    x,a,c
1  y,er,rt
2  z,1,ere

now

def myfunc(text):
  sections=text.split(',')
  return sections

df[['part1', 'part2', 'part3']] = df['names'].apply(myfunc)

will give

    names   part1   part2   part3
0   x,a,c   x       y       z
1   y,er,rt a       er      1
2   z,1,ere c       rt      ere

which is probably not what you want: the values come out transposed. However,

df['part1'], df['part2'], df['part3'] = zip(*df['names'].apply(myfunc))

gives

     names     part1 part2 part3
0    x,a,c     x     a     c
1  y,er,rt     y     er    rt
2  z,1,ere     z     1     ere

which is probably what you want.
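The `zip(*...)` trick works because unpacking the Series of lists passes each row's list as a separate argument, and `zip` then groups the first elements together, the second elements together, and so on (a transpose). A small sketch with the same sample data; the `tolist()` variant at the end is an alternative not in the original answer, and it assumes every row splits into exactly three parts.

```python
import pandas as pd

df = pd.DataFrame({"names": ["x,a,c", "y,er,rt", "z,1,ere"]})


def myfunc(text):
    # Split a comma-separated string into its parts.
    return text.split(",")


# zip(*rows) transposes: one tuple per column instead of one list per row.
rows = df["names"].apply(myfunc)
part1, part2, part3 = zip(*rows)
print(part1)  # ('x', 'y', 'z')

# Same columns built in one step from the list of row lists.
parts = pd.DataFrame(rows.tolist(), index=df.index,
                     columns=["part1", "part2", "part3"])
df = df.join(parts)
print(df)
```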

Paul Brennan answered Oct 30 '25 12:10


