I need to use the dtype of a pandas column in a function, but for some reason when I call the function using apply, the dtype is changed to object. Does anyone know what is happening here?
import pandas as pd
df = pd.DataFrame({'stringcol':['a'], 'floatcol': [1.5]})
df.dtypes
Out[1]:
floatcol float64
stringcol object
dtype: object
df.apply(lambda col: col.dtype)
Out[2]:
floatcol object
stringcol object
dtype: object
Note that this problem doesn't happen if the column is passed directly:
f = lambda col: col.dtype
f(df.floatcol)
Out[3]: dtype('float64')
Pandas DataFrame apply() method: apply() calls a function along one of the axes of the DataFrame. With the default axis=0 (the index axis), the function receives each column as a Series; with axis=1, it receives each row.
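A minimal sketch of the two axis settings, using a small made-up integer frame (the column names a and b are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [10, 20]})

# axis=0 (default): the lambda receives each column as a Series -> one result per column
col_sums = df.apply(lambda s: s.sum())

# axis=1: the lambda receives each row as a Series -> one result per row
row_sums = df.apply(lambda s: s.sum(), axis=1)

print(col_sums.tolist())  # [3, 30]
print(row_sums.tolist())  # [11, 22]
```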
As you correctly indicate, apply is not intended to be used to modify a DataFrame. However, since apply accepts an arbitrary function, nothing guarantees that the function is free of side effects and leaves the DataFrame unchanged.
astype(): this method casts a DataFrame column (or a whole DataFrame) to a specified dtype. To convert a column to float, call the Series' astype() method on that column.
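A short sketch of both forms of astype, assuming hypothetical integer columns named intcol and other. Note that astype returns a new object rather than changing the column in place, so the result has to be assigned back:

```python
import pandas as pd

df = pd.DataFrame({'intcol': [1, 2, 3], 'other': [4, 5, 6]})

# Series.astype returns a new Series; assign it back to replace the column
df['intcol'] = df['intcol'].astype(float)

# DataFrame.astype also accepts a {column: dtype} mapping for several columns at once
df = df.astype({'other': 'float64'})

print(df.dtypes)
```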
It appears to be due to an optimization in DataFrame._apply_standard. The "fast path" in that method's code creates an output Series whose dtype is the dtype of df.values, which in your case is object because the DataFrame has mixed types. If you pass reduce=False to your apply call, the result is correct:
>>> df.apply(lambda col: col.dtype, reduce=False)
floatcol float64
stringcol object
dtype: object
(I must say that it is not clear to me how this behavior of reduce jibes with the documentation.)
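To see why a fast path built on df.values loses the per-column dtypes, it helps to check what df.values looks like for the frame in the question. A quick sketch (it does not reproduce the internals of _apply_standard, which vary between pandas versions):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'stringcol': ['a'], 'floatcol': [1.5]})

# A mixed string/float frame can only be represented as one NumPy array of dtype object
print(df.values.dtype)  # object

# Selecting only the float column gives a homogeneous float64 array
print(df[['floatcol']].values.dtype)  # float64

# The per-column dtypes are still available directly, without going through .values
print(df.dtypes)
```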
For pandas versions v0.23+, the answer is:
>>> df.apply(lambda x: x.dtype, result_type='expand')
This works even though the pandas documentation claims that the result_type argument options "only act when axis=1 (columns)".
Credit: @jezrael
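Building on the observation in the question that calling the function on a column directly preserves its dtype, a version-independent alternative is to loop over the columns yourself instead of going through apply. A minimal sketch; the helper name column_results is hypothetical:

```python
import pandas as pd

def column_results(df, func):
    """Hypothetical helper: call func on each column Series directly, bypassing
    DataFrame.apply, and return a dict {column name: func(column)}."""
    return {name: col_func for name, col_func in ((n, func(c)) for n, c in df.items())}

df = pd.DataFrame({'stringcol': ['a'], 'floatcol': [1.5]})
result = column_results(df, lambda col: col.dtype)

# Each column is passed as a Series, so its own dtype is preserved
print(result)
```

Because each column reaches func as its own Series, no intermediate array with a common dtype is ever built.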