I am applying a function to the rows of a pandas DataFrame. The function returns four values per row, so the object returned by apply is a Series of tuples. I want to put each of the four values in its own column. I know that I can convert that output to a DataFrame and then concatenate it with the old DataFrame, like so:
import pandas as pd
def some_func(i):
    return i+1, i+2, i+3, i+4
df = pd.DataFrame(range(10), columns=['start'])
res = df.apply(lambda row: some_func(row['start']), axis=1)
# convert to df and add column names
res_df = res.apply(pd.Series)
res_df.columns = ['label_1', 'label_2', 'label_3', 'label_4']
# concatenate with old df
df = pd.concat([df, res_df], axis=1)
print(df)
Is there a better way to do this? The res.apply(pd.Series) step in particular seems redundant, but I don't know a better alternative. Performance is an important factor for me.
As requested, an example input DataFrame could look like this:
   start
0      0
1      1
2      2
3      3
4      4
5      5
6      6
7      7
8      8
9      9
And the expected output, with the four added columns:
   start  label_1  label_2  label_3  label_4
0      0        1        2        3        4
1      1        2        3        4        5
2      2        3        4        5        6
3      3        4        5        6        7
4      4        5        6        7        8
5      5        6        7        8        9
6      6        7        8        9       10
7      7        8        9       10       11
8      8        9       10       11       12
9      9       10       11       12       13
Assigning the values directly to the DataFrame is faster than concatenating.
Here is one way to do the assignment:
df = pd.DataFrame(range(10), columns=['start'])
df['label_1'], df['label_2'], df['label_3'], df['label_4'] = zip(*[some_func(x) for x in df['start']])
This is faster than res.apply(pd.Series).
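To check the speed claim on your own machine, a rough timing sketch along these lines may help. The helper names via_apply_series and via_zip, the 1000-row frame, and the repeat count are assumptions made for illustration; actual timings depend on your pandas version and data, so no numbers are quoted here.

```python
import timeit

import pandas as pd


def some_func(i):
    return i + 1, i + 2, i + 3, i + 4


LABELS = ['label_1', 'label_2', 'label_3', 'label_4']


def via_apply_series(df):
    # Approach from the question: apply, expand with pd.Series, then concat
    res = df.apply(lambda row: some_func(row['start']), axis=1)
    res_df = res.apply(pd.Series)
    res_df.columns = LABELS
    return pd.concat([df, res_df], axis=1)


def via_zip(df):
    # Approach from this answer: transpose the row tuples with zip and assign
    out = df.copy()  # copy so the input frame is left untouched
    out['label_1'], out['label_2'], out['label_3'], out['label_4'] = zip(
        *[some_func(x) for x in out['start']]
    )
    return out


# Hypothetical benchmark size; adjust to resemble your real data
base = pd.DataFrame(range(1000), columns=['start'])

t_apply = timeit.timeit(lambda: via_apply_series(base), number=3)
t_zip = timeit.timeit(lambda: via_zip(base), number=3)
print(f"apply(pd.Series) + concat: {t_apply:.4f} s for 3 runs")
print(f"zip assignment:            {t_zip:.4f} s for 3 runs")
```

Both helpers return a new DataFrame instead of modifying the input, so the two timings are measured against the same starting frame.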
See "adding multiple columns to pandas simultaneously" for more ways to assign multiple columns.
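Two further options, not taken from the answer above, are sketched here as possibilities to benchmark. Option A uses the result_type='expand' argument of DataFrame.apply, which removes the separate res.apply(pd.Series) step. Option B skips apply entirely and builds the new columns from a list of tuples. The names labels, expanded, new_cols, df_a and df_b are illustrative.

```python
import pandas as pd


def some_func(i):
    return i + 1, i + 2, i + 3, i + 4


labels = ['label_1', 'label_2', 'label_3', 'label_4']
df = pd.DataFrame(range(10), columns=['start'])

# Option A: let apply expand the returned tuples into columns itself
expanded = df.apply(
    lambda row: some_func(row['start']), axis=1, result_type='expand'
)
expanded.columns = labels
df_a = df.join(expanded)

# Option B: build the new columns from a list of tuples, reusing df's index
new_cols = pd.DataFrame(
    [some_func(x) for x in df['start']], columns=labels, index=df.index
)
df_b = df.join(new_cols)

print(df_b)
```

Option A still calls the function once per row through apply, so it is mainly a readability improvement; Option B is closer in spirit to the zip approach and is likely the one worth timing if performance matters.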