Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Apply pandas function to column to create multiple new columns?

I usually do this using zip:

>>> df = pd.DataFrame([[i] for i in range(10)], columns=['num'])
>>> df
    num
0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9

>>> def powers(x):
>>>     return x, x**2, x**3, x**4, x**5, x**6

>>> df['p1'], df['p2'], df['p3'], df['p4'], df['p5'], df['p6'] = \
>>>     zip(*df['num'].map(powers))

>>> df
        num     p1      p2      p3      p4      p5      p6
0       0       0       0       0       0       0       0
1       1       1       1       1       1       1       1
2       2       2       4       8       16      32      64
3       3       3       9       27      81      243     729
4       4       4       16      64      256     1024    4096
5       5       5       25      125     625     3125    15625
6       6       6       36      216     1296    7776    46656
7       7       7       49      343     2401    16807   117649
8       8       8       64      512     4096    32768   262144
9       9       9       81      729     6561    59049   531441

Building off of user1827356 's answer, you can do the assignment in one pass using df.merge:

df.merge(df.textcol.apply(lambda s: pd.Series({'feature1':s+1, 'feature2':s-1})), 
    left_index=True, right_index=True)

    textcol  feature1  feature2
0  0.772692  1.772692 -0.227308
1  0.857210  1.857210 -0.142790
2  0.065639  1.065639 -0.934361
3  0.819160  1.819160 -0.180840
4  0.088212  1.088212 -0.911788

EDIT: Please be aware of the huge memory consumption and low speed: https://ys-l.github.io/posts/2015/08/28/how-not-to-use-pandas-apply/ !


In 2020, I use apply() with argument result_type='expand'

>>> appiled_df = df.apply(lambda row: fn(row.text), axis='columns', result_type='expand')
>>> df = pd.concat([df, appiled_df], axis='columns')

This is what I've done in the past

df = pd.DataFrame({'textcol' : np.random.rand(5)})

df
    textcol
0  0.626524
1  0.119967
2  0.803650
3  0.100880
4  0.017859

df.textcol.apply(lambda s: pd.Series({'feature1':s+1, 'feature2':s-1}))
   feature1  feature2
0  1.626524 -0.373476
1  1.119967 -0.880033
2  1.803650 -0.196350
3  1.100880 -0.899120
4  1.017859 -0.982141

Editing for completeness

pd.concat([df, df.textcol.apply(lambda s: pd.Series({'feature1':s+1, 'feature2':s-1}))], axis=1)
    textcol feature1  feature2
0  0.626524 1.626524 -0.373476
1  0.119967 1.119967 -0.880033
2  0.803650 1.803650 -0.196350
3  0.100880 1.100880 -0.899120
4  0.017859 1.017859 -0.982141

This is the correct and easiest way to accomplish this for 95% of use cases:

>>> df = pd.DataFrame(zip(*[range(10)]), columns=['num'])
>>> df
    num
0    0
1    1
2    2
3    3
4    4
5    5

>>> def example(x):
...     x['p1'] = x['num']**2
...     x['p2'] = x['num']**3
...     x['p3'] = x['num']**4
...     return x

>>> df = df.apply(example, axis=1)
>>> df
    num  p1  p2  p3
0    0   0   0    0
1    1   1   1    1
2    2   4   8   16
3    3   9  27   81
4    4  16  64  256

Just use result_type="expand"

df = pd.DataFrame(np.random.randint(0,10,(10,2)), columns=["random", "a"])
df[["sq_a","cube_a"]] = df.apply(lambda x: [x.a**2, x.a**3], axis=1, result_type="expand")