I know how the apply function can be used on a DataFrame to calculate new columns and append them to it. My question: if I have a function that takes several values as parameters (corresponding to columns currently in the DataFrame) and returns a dictionary (corresponding to the columns I want to add), is there a simple, elegant way to apply this function to the DataFrame and generate the new columns?
For example, currently I am doing this:
import pandas as pd
import numpy as np
col1 = [np.random.randn()] * 10
col2 = [np.random.randn()] * 10
col3 = [np.random.randn()] * 10
df = pd.DataFrame({'col1': col1,
                   'col2': col2,
                   'col3': col3})
df['col4'] = df.apply(lambda x: get_col4(x['col1'], x['col2']), axis=1)
df['col5'] = df.apply(lambda x: get_col5(x['col1'], x['col2'], x['col3']),
                      axis=1)
df['col6'] = df.apply(lambda x: get_col6(x['col3'], x['col4'], x['col5']),
                      axis=1)
df['col7'] = df.apply(lambda x: get_col7(x['col4'], x['col6']), axis=1)
where I have an individual function for each calculated column, each of which depends on some combination of the previous columns.
However, because the values of the calculated columns are dependent on each other, I think it would be much more efficient and elegant to use a function like the one below to calculate the new columns all at once:
def get_cols(col1, col2, col3):
    # some calculations...
    return {'col4': col4,
            'col5': col5,
            'col6': col6,
            'col7': col7}
Is there a way to do this using pandas?
Since you want to retain the previous columns, you can make a Series out of the new columns, and then append that new Series object to the original Series. Keep in mind that the input to get_cols is an individual row (and is thus a Series) from the original DataFrame.
import pandas as pd
import numpy as np
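def get_cols(cols):
    # cols is one row of the DataFrame, passed in as a Series
    col4 = cols['col1'] * 2
    col5 = cols['col2'] * 2
    col6 = cols['col3'] * 2
    new_cols = pd.Series([col4, col5, col6], index=['col4', 'col5', 'col6'])
    # Series.append was removed in pandas 2.0; pd.concat does the same job
    return pd.concat([cols, new_cols])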
col1 = [np.random.randn()] * 10
col2 = [np.random.randn()] * 10
col3 = [np.random.randn()] * 10
df = pd.DataFrame({'col1': col1,
                   'col2': col2,
                   'col3': col3})
df = df.apply(get_cols, axis=1)
print(df)
       col1      col2      col3      col4      col5      col6
0 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
1 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
2 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
3 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
4 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
5 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
6 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
7 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
8 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
9 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
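If you would rather keep the dictionary-returning signature from the question, one option is to wrap the returned dict in a Series inside apply and join the result back onto the original frame. This is only a minimal sketch: the arithmetic inside get_cols is placeholder math chosen for illustration (the question does not say what the real calculations are), and the seeded random data is likewise an assumption.

```python
import numpy as np
import pandas as pd


def get_cols(col1, col2, col3):
    # Placeholder calculations (assumed for illustration only); the point is
    # that later values can reuse earlier ones inside a single call.
    col4 = col1 + col2
    col5 = col4 * col3
    col6 = col5 - col3
    col7 = col4 + col6
    return {'col4': col4, 'col5': col5, 'col6': col6, 'col7': col7}


# Hypothetical sample data: 10 rows of random values in three columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)),
                  columns=['col1', 'col2', 'col3'])

# Wrapping the dict in a Series makes apply build a DataFrame whose
# columns are the dict keys, one row per input row.
new_cols = df.apply(
    lambda row: pd.Series(get_cols(row['col1'], row['col2'], row['col3'])),
    axis=1,
)

# Join on the shared index to keep the original columns alongside the new ones.
df = df.join(new_cols)
print(df.head())
```

Building one Series per row is convenient but not fast; for large frames, computing each column directly from whole columns is usually quicker when the calculations allow it.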
This might help you: pandas apply function that returns multiple values to rows in pandas dataframe
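One approach is to have your second function, get_cols, return a list instead of a dictionary, and then use apply. Note that in recent pandas versions apply keeps a returned list as a single object per row unless you pass result_type='expand'.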
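A minimal sketch of the list-returning approach, assuming a pandas version that supports the result_type argument of DataFrame.apply. The doubling logic is borrowed from the other answer, and the small sample frame is made up for illustration.

```python
import pandas as pd


def get_cols(row):
    # Return the new values as a plain list, in a fixed order.
    col4 = row['col1'] * 2
    col5 = row['col2'] * 2
    col6 = row['col3'] * 2
    return [col4, col5, col6]


# Hypothetical sample data.
df = pd.DataFrame({'col1': [1.0, 2.0],
                   'col2': [3.0, 4.0],
                   'col3': [5.0, 6.0]})

# result_type='expand' turns each returned list into separate columns,
# which come back labelled 0, 1, 2.
expanded = df.apply(get_cols, axis=1, result_type='expand')

# Name the new columns, then attach them to the original frame.
expanded.columns = ['col4', 'col5', 'col6']
df = pd.concat([df, expanded], axis=1)
print(df)
```

The list carries no column names, so the order of the returned values has to match the names assigned afterwards.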