I am applying a function on the rows of a dataframe in pandas. That function returns four values (meaning, four values per row). In practice, this means that the returned object from the apply function is a Series containing tuples. I want to add these to their own columns. I know that I can convert that output to a DataFrame and then concatenate with the old DataFrame, like so:
import pandas as pd
def some_func(i):
    return i+1, i+2, i+3, i+4
df = pd.DataFrame(range(10), columns=['start'])
res = df.apply(lambda row: some_func(row['start']), axis=1)
# convert to df and add column names
res_df = res.apply(pd.Series)
res_df.columns = ['label_1', 'label_2', 'label_3', 'label_4']
# concatenate with old df
df = pd.concat([df, res_df], axis=1)
print(df)
My question is whether there is a better way to do this. In particular, the res.apply(pd.Series) step seems redundant, but I don't know a better alternative. Performance is an important factor for me.
As requested, an example input DataFrame could look like this:
   start
0      0
1      1
2      2
3      3
4      4
5      5
6      6
7      7
8      8
9      9
And the expected output, with the four added columns:
   start  label_1  label_2  label_3  label_4
0      0        1        2        3        4
1      1        2        3        4        5
2      2        3        4        5        6
3      3        4        5        6        7
4      4        5        6        7        8
5      5        6        7        8        9
6      6        7        8        9       10
7      7        8        9       10       11
8      8        9       10       11       12
9      9       10       11       12       13
Assigning the values directly to the DataFrame is faster than concatenating.
This is one way to do the assignment:
df = pd.DataFrame(range(10), columns=['start'])
df['label_1'], df['label_2'], df['label_3'], df['label_4'] = zip(*[some_func(x) for x in df['start']])
This is faster than res.apply(pd.Series).
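If the four values are wanted as a separate DataFrame before being attached, a minimal sketch that still avoids res.apply(pd.Series) could pass the list of tuples straight to the DataFrame constructor. The names rows, labels, res_df and out are illustrative and not part of the original post.

```python
import pandas as pd

def some_func(i):
    return i + 1, i + 2, i + 3, i + 4

df = pd.DataFrame(range(10), columns=['start'])
labels = ['label_1', 'label_2', 'label_3', 'label_4']  # illustrative name

# Collect one tuple per row with a plain list comprehension.
rows = [some_func(x) for x in df['start']]

# Build the new columns in one step, reusing the original index so the
# rows line up when the two frames are joined.
res_df = pd.DataFrame(rows, index=df.index, columns=labels)
out = df.join(res_df)
print(out)
```

Reusing df.index matters when the original DataFrame does not have a default 0..n-1 index; without it the join would align on mismatched labels.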
See "Adding multiple columns to pandas simultaneously" for more ways to assign multiple columns.
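One further option, if staying with DataFrame.apply is preferred, is its result_type='expand' argument, which turns each returned tuple into columns directly. This is only a sketch: it still calls the function once per row, so it is not expected to beat the list comprehension above on speed. The names labels, expanded and out are illustrative.

```python
import pandas as pd

def some_func(i):
    return i + 1, i + 2, i + 3, i + 4

df = pd.DataFrame(range(10), columns=['start'])
labels = ['label_1', 'label_2', 'label_3', 'label_4']  # illustrative name

# result_type='expand' makes apply return a DataFrame with one column
# per tuple element instead of a Series of tuples.
expanded = df.apply(lambda row: some_func(row['start']),
                    axis=1, result_type='expand')
expanded.columns = labels

out = df.join(expanded)
print(out)
```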