I want to add multiple columns to a DataFrame:
import pandas as pd
df = pd.DataFrame(
    [
        (0, 1),
        (1, 1),
        (1, 2),
    ],
    columns=['a', 'b']
)
def apply_fn(row) -> (int, float):
    return int(row.a + row.b), float(row.a / row.b)
df[['c', 'd']] = df.apply(apply_fn, result_type='expand', axis=1)
Result:
>>> df
   a  b    c    d
0  0  1  1.0  0.0
1  1  1  2.0  1.0
2  1  2  3.0  0.5
>>> df.dtypes
a      int64
b      int64
c    float64
d    float64
dtype: object
Why is column c not of dtype int? Can I specify this somehow, with something like .apply(..., dtypes=[int, float])?
I believe this happens because result_type='expand' causes each row's returned tuple to be turned into a Series: the first row becomes its own Series, then the next row, and so on. Because a Series can hold only one dtype, the ints in each (int, float) pair are upcast to floats.
For example:
>>> pd.Series([1, 0.0])
0    1.0
1    0.0
dtype: float64
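The upcasting described above can be checked directly. The sketch below rebuilds the frame from the question and inspects the dtypes pandas infers, first for a single (int, float) tuple turned into a Series, then for the expanded apply result. It only looks at dtypes and assumes a reasonably recent pandas.

```python
import pandas as pd

df = pd.DataFrame([(0, 1), (1, 1), (1, 2)], columns=['a', 'b'])

def apply_fn(row):
    return int(row.a + row.b), float(row.a / row.b)

# One (int, float) tuple becomes one Series, which must share a single dtype.
single = pd.Series(apply_fn(df.iloc[0]))
print(single.dtype)  # float64

# The expanded result inherits that upcast: both new columns are float64.
expanded = df.apply(apply_fn, axis=1, result_type='expand')
print(expanded.dtypes.tolist())
```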
One workaround would be to call tolist on the result of the apply call (without result_type='expand') and wrap it in a call to DataFrame:
>>> df[['c', 'd']] = pd.DataFrame(df.apply(apply_fn, axis=1).tolist())
>>> df
   a  b  c    d
0  0  1  1  0.0
1  1  1  2  1.0
2  1  2  3  0.5
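The plain version of this workaround relies on df having the default RangeIndex, because the new DataFrame gets one too and the assignment aligns on the index. A slightly more defensive sketch passes index=df.index and explicit column names; the index labels 10, 20, 30 are made up for illustration.

```python
import pandas as pd

# Same data as the question, but with a non-default index (hypothetical labels).
df = pd.DataFrame([(0, 1), (1, 1), (1, 2)], columns=['a', 'b'], index=[10, 20, 30])

def apply_fn(row):
    return int(row.a + row.b), float(row.a / row.b)

# Without result_type='expand', apply returns a Series of tuples.
rows = df.apply(apply_fn, axis=1).tolist()

# Building a DataFrame from the list of tuples infers one dtype per column,
# so 'c' stays integer and 'd' stays float. Reusing df.index keeps rows aligned.
df[['c', 'd']] = pd.DataFrame(rows, columns=['c', 'd'], index=df.index)

print(df.dtypes)
```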
You can also chain astype onto the expanded result to cast each column back:
df.apply(apply_fn, axis=1, result_type='expand').astype({0:'int', 1:'float'})
Out[147]:
   0    1
0  1  0.0
1  2  1.0
2  3  0.5
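DataFrame.apply has no dtypes= argument like the one wished for in the question (extra keyword arguments are forwarded to the applied function instead), but the astype cast can be wrapped in a small helper that takes the new column names and target dtypes together. apply_with_dtypes below is a hypothetical name, a sketch rather than a pandas API; it also renames the expanded columns so the assignment back into df matches by name.

```python
import pandas as pd

df = pd.DataFrame([(0, 1), (1, 1), (1, 2)], columns=['a', 'b'])

def apply_fn(row):
    return int(row.a + row.b), float(row.a / row.b)

def apply_with_dtypes(frame, fn, columns, dtypes):
    """Hypothetical helper (not part of pandas): apply fn row-wise with
    result_type='expand', name the new columns, then cast each one."""
    out = frame.apply(fn, axis=1, result_type='expand')
    # 'expand' labels the new columns 0, 1, ...; rename them by position.
    out.columns = columns
    # The cast happens after expansion, so values have already been float64.
    return out.astype(dict(zip(columns, dtypes)))

df[['c', 'd']] = apply_with_dtypes(df, apply_fn, ['c', 'd'], ['int64', 'float64'])
print(df.dtypes)
```

Because the cast happens after expansion, the integer column passes through float64 first; integers larger than 2**53 may already have lost precision by then, in which case the tolist workaround is the safer choice.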