Pandas: apply result_type="expand": wrong dtypes

Question

I want to add multiple columns to a DataFrame:

import pandas as pd

df = pd.DataFrame(
    [
        (0, 1),
        (1, 1),
        (1, 2),
    ],
    columns=['a', 'b']
)


def apply_fn(row) -> (int, float):
    return int(row.a + row.b), float(row.a / row.b)


df[['c', 'd']] = df.apply(apply_fn, result_type='expand', axis=1)

Result:

>>> df
   a  b    c    d
0  0  1  1.0  0.0
1  1  1  2.0  1.0
2  1  2  3.0  0.5

>>> df.dtypes
a      int64
b      int64
c    float64
d    float64
dtype: object

Why is column c not of dtype int? Can I specify this somehow? Something like .apply(..., dtypes=[int, float])?

Admin · Accepted Answer

I believe this happens because result_type='expand' causes each returned tuple to be expanded into a Series: the first row's result becomes its own Series, then the next row's, and so on. Because a Series can only have one dtype, the ints get converted to floats.

For example, look at this:

>>> pd.Series([1, 0.0])
0    1.0
1    0.0
dtype: float64
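To make the contrast concrete, here is a minimal sketch (the names row_wise and col_wise are just for illustration). It puts one tuple into a single Series, then builds a DataFrame from several such tuples, where each column gets its own dtype:

```python
import pandas as pd

# One row's (int, float) result stored in a single Series:
# the Series needs one common dtype, so the int is upcast.
row_wise = pd.Series((1, 0.0))
print(row_wise.dtype)

# The same kind of tuples passed as rows to the DataFrame constructor:
# the dtype is inferred separately for each column.
col_wise = pd.DataFrame([(1, 0.0), (2, 1.0)])
print(col_wise.dtypes)
```

This is why building the frame from a list of tuples, rather than from per-row Series, should keep the integer column intact.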

One workaround is to call apply without result_type='expand', call tolist on the result, and wrap that in a call to DataFrame, which infers a dtype for each column separately:

>>> df[['c', 'd']] = pd.DataFrame(df.apply(apply_fn, axis=1).tolist())
>>> df
   a  b  c    d
0  0  1  1  0.0
1  1  1  2  1.0
2  1  2  3  0.5
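A self-contained sketch of this workaround follows. One assumption goes beyond the original: the frame is given a non-default index (10, 20, 30) to show why passing index=df.index can matter. pd.DataFrame(list) otherwise gets a fresh 0..n-1 index, and column assignment aligns on index labels.

```python
import pandas as pd

# Non-default index, chosen here only to illustrate alignment.
df = pd.DataFrame([(0, 1), (1, 1), (1, 2)], columns=["a", "b"], index=[10, 20, 30])


def apply_fn(row):
    return int(row.a + row.b), float(row.a / row.b)


# Without result_type, apply returns a Series whose values are the tuples.
pairs = df.apply(apply_fn, axis=1)

# Rebuild a frame from the tuples; dtypes are inferred per column.
# Reusing df.index keeps the rows aligned when assigning back.
expanded = pd.DataFrame(pairs.tolist(), index=df.index, columns=["c", "d"])
df[["c", "d"]] = expanded

print(df.dtypes)
```

With the default RangeIndex from the question, the index argument makes no difference, so the one-liner above works as written.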

BENY · Answer

You can chain the apply call with astype to cast each expanded column back to the dtype you want:

df.apply(apply_fn, axis=1, result_type='expand').astype({0:'int', 1:'float'})
Out[147]: 
   0    1
0  1  0.0
1  2  1.0
2  3  0.5
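Assigned back to the original frame, this might look like the sketch below. The expanded columns are labelled 0 and 1, so those are the keys passed to astype. Note the assumption that the values in column 0 are whole numbers with no NaN: the cast goes through float64, so it raises on missing values and can lose precision for very large integers.

```python
import pandas as pd

df = pd.DataFrame([(0, 1), (1, 1), (1, 2)], columns=["a", "b"])


def apply_fn(row):
    return int(row.a + row.b), float(row.a / row.b)


# The expanded result has integer column labels 0 and 1.
expanded = df.apply(apply_fn, axis=1, result_type="expand")

# Cast per column, then assign to the new column names.
df[["c", "d"]] = expanded.astype({0: "int64", 1: "float64"})

print(df.dtypes)
```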

Tags: python, pandas, apply, dtype

MrTomRod
