 

Python/pandas - Using DataFrame.apply with function returning dictionary

Tags: python, pandas

I know how DataFrame.apply can be used to calculate new columns and add them to a DataFrame. My question: if I have a function that takes several values as parameters (taken from columns already in the DataFrame) and returns a dictionary (whose keys are the columns I want to add), is there a simple, elegant way to apply this function to the DataFrame and generate the new columns?

For example, currently I am doing this:

import pandas as pd
import numpy as np

col1 = [np.random.randn()] * 10
col2 = [np.random.randn()] * 10
col3 = [np.random.randn()] * 10

df = pd.DataFrame({'col1': col1,
                   'col2': col2,
                   'col3': col3 })

df['col4'] = df.apply(lambda x: get_col4(x['col1'], x['col2']), axis=1)
df['col5'] = df.apply(lambda x: get_col5(x['col1'], x['col2'], x['col3']),
                      axis=1)
df['col6'] = df.apply(lambda x: get_col6(x['col3'], x['col4'], x['col5']),
                      axis=1)
df['col7'] = df.apply(lambda x: get_col7(x['col4'], x['col6']), axis=1)

where I have an individual function for each calculated column, each of which depends on some combination of the previous columns.

However, because the values of the calculated columns are dependent on each other, I think it would be much more efficient and elegant to use a function like the one below to calculate the new columns all at once:

def get_cols(col1, col2, col3):
    #some calculations...
    return {'col4': col4,
            'col5': col5,
            'col6': col6,
            'col7': col7}

Is there a way to do this using pandas?
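A minimal sketch of one way to plug a dict-returning function like get_cols into apply: wrap the returned dict in a pd.Series, which makes apply (with axis=1) return a DataFrame whose columns are the dict keys, then join that onto the original frame. The calculations inside get_cols and the small sample DataFrame are made up purely for illustration.

```python
import pandas as pd


def get_cols(col1, col2, col3):
    # Hypothetical calculations: each new value may depend on the ones before it.
    col4 = col1 + col2
    col5 = col4 * col3
    col6 = col3 - col5
    col7 = col4 + col6
    return {'col4': col4, 'col5': col5, 'col6': col6, 'col7': col7}


df = pd.DataFrame({'col1': [1.0, 2.0, 3.0],
                   'col2': [0.5, 0.5, 0.5],
                   'col3': [2.0, 4.0, 6.0]})

# Wrapping the dict in a Series makes apply return a DataFrame whose columns
# are the dict keys; join then attaches them next to the original columns.
new_cols = df.apply(
    lambda row: pd.Series(get_cols(row['col1'], row['col2'], row['col3'])),
    axis=1,
)
df = df.join(new_cols)
print(df)
```

Because all dependent values are computed inside one call, each row's col4 through col7 are produced together instead of through four separate apply passes.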

asked Oct 18 '17 at 12:10 by Rory Devitt



2 Answers

Since you want to keep the existing columns, you can build a Series out of the new values and append it to the original row. Keep in mind that with axis=1, get_cols receives one row of the original DataFrame at a time, as a Series indexed by the column names.

import pandas as pd
import numpy as np

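def get_cols(cols):
    # cols is one row of the DataFrame: a Series indexed by column name
    col4 = cols['col1'] * 2
    col5 = cols['col2'] * 2
    col6 = cols['col3'] * 2
    # Series.append was removed in pandas 2.0; pd.concat does the same job
    return pd.concat([cols, pd.Series([col4, col5, col6], index=['col4', 'col5', 'col6'])])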

col1 = [np.random.randn()] * 10
col2 = [np.random.randn()] * 10
col3 = [np.random.randn()] * 10

df = pd.DataFrame({'col1': col1,
                   'col2': col2,
                   'col3': col3 })

df = df.apply(get_cols, axis=1)
print(df)

       col1      col2      col3      col4      col5      col6
0 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
1 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
2 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
3 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
4 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
5 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
6 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
7 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
8 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
9 -0.809803  0.522547  0.064061 -1.619606  1.045093  0.128122
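apply with axis=1 calls get_cols once per row in Python, which can be slow on large frames. When the calculations are plain arithmetic, as in the doubling above, a dict-returning function can instead be called once with whole columns and its result unpacked into DataFrame.assign. A sketch, assuming the hypothetical get_cols below uses only operations that work on both single values and Series (this is a swapped-in vectorized technique, not part of the answer above):

```python
import numpy as np
import pandas as pd


def get_cols(col1, col2, col3):
    # Hypothetical calculations using only arithmetic, so they work on
    # scalars and on whole columns (Series) alike.
    return {'col4': col1 * 2, 'col5': col2 * 2, 'col6': col3 * 2}


col1 = [np.random.randn()] * 10
col2 = [np.random.randn()] * 10
col3 = [np.random.randn()] * 10
df = pd.DataFrame({'col1': col1, 'col2': col2, 'col3': col3})

# One call on entire columns; the dict keys become the new column names.
df = df.assign(**get_cols(df['col1'], df['col2'], df['col3']))
print(df)
```

This only works if nothing inside the function branches on individual values (e.g. a plain `if x > 0:`); in that case the per-row apply is still needed.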
answered Nov 14 '22 at 22:11 by azizj



This might help you: the question "pandas apply function that returns multiple values to rows in pandas dataframe".

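One way is to have your second function, get_cols, return a list instead of a dictionary and then call apply with axis=1 and result_type='expand', which spreads each returned list across new columns (labeled 0, 1, 2, ...).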
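A minimal sketch of the list-returning approach, assuming made-up calculations in get_cols and a small sample DataFrame. Since the expanded columns come back with integer labels, they are renamed before being joined onto the original frame.

```python
import pandas as pd


def get_cols(col1, col2, col3):
    # Hypothetical calculations; the values are returned in a fixed order.
    col4 = col1 * 2
    col5 = col2 * 2
    col6 = col3 * 2
    return [col4, col5, col6]


df = pd.DataFrame({'col1': [1.0, 2.0], 'col2': [3.0, 4.0], 'col3': [5.0, 6.0]})

# result_type='expand' turns each returned list into one row of a DataFrame
# with integer column labels 0, 1, 2, which are then renamed.
new_cols = df.apply(lambda row: get_cols(row['col1'], row['col2'], row['col3']),
                    axis=1, result_type='expand')
new_cols.columns = ['col4', 'col5', 'col6']
df = df.join(new_cols)
print(df)
```

The trade-off versus a dictionary is that the column names live outside the function, so the list order and the rename must be kept in sync by hand.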

answered Nov 14 '22 at 22:11 by Rockbar
