I need to use the dtype of a pandas column in a function, but for some reason when I call the function using apply, the dtype is changed to object. Does anyone know what is happening here?
import pandas as pd
df = pd.DataFrame({'stringcol':['a'], 'floatcol': [1.5]})
df.dtypes
Out[1]:
floatcol float64
stringcol object
dtype: object
df.apply(lambda col: col.dtype)
Out[2]:
floatcol object
stringcol object
dtype: object
Note that this problem doesn't happen if the column is passed directly:
f = lambda col: col.dtype
f(df.floatcol)
Out[3]: dtype('float64')
Pandas DataFrame apply() method: apply() lets you apply a function along one axis of the DataFrame. The default is axis=0 (the index axis), which passes each column to the function as a Series.
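A small sketch of the two axes, using a made-up numeric frame (the column names `a` and `b` are just for illustration, not from the question):

```python
import pandas as pd

# Illustrative frame, not the one from the question.
nums = pd.DataFrame({'a': [1, 2], 'b': [10, 20]})

# axis=0 (default): the function receives each column as a Series.
col_sums = nums.apply(lambda col: col.sum())

# axis=1: the function receives each row as a Series.
row_sums = nums.apply(lambda row: row.sum(), axis=1)

print(col_sums)
print(row_sums)
```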
As you correctly indicate, apply is not intended to be used to modify a dataframe. However, since apply takes an arbitrary function, it doesn't guarantee that applying the function will be idempotent and will not change the dataframe.
astype(): this method casts a DataFrame column (or a whole DataFrame) to a specified data type.
To convert a column to float in a pandas DataFrame, call astype() on the Series and assign the result back.
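For instance, something along these lines should do it (the `prices` frame and its `price` column are hypothetical, made up for this example):

```python
import pandas as pd

# Hypothetical frame whose numbers were read in as strings.
prices = pd.DataFrame({'price': ['1.5', '2', '3.25']})

# astype returns a new Series, so assign it back to the column.
prices['price'] = prices['price'].astype(float)

print(prices['price'].dtype)
```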
It appears to be due to an optimization in DataFrame._apply_standard. The "fast path" in the code of that method creates an output Series whose dtype is the dtype of df.values, which in your case is object since the DataFrame is of mixed type. If you pass reduce=False to your apply call, the result is correct:
>>> df.apply(lambda col: col.dtype, reduce=False)
floatcol float64
stringcol object
dtype: object
(I must say that it is not clear to me how this behavior of reduce jibes with the documentation.)
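If you want to see what the fast path is working from, here is a quick check using the frame from the question. It is only a sketch of the symptom, not of the pandas internals: the single array behind a mixed frame has to be object, while each column on its own keeps its dtype.

```python
import pandas as pd

df = pd.DataFrame({'stringcol': ['a'], 'floatcol': [1.5]})

# One array has to hold both strings and floats, so it falls back to object.
values_dtype = df.values.dtype

# Looked at one column at a time, the float column is still float64.
per_column = {name: col.dtype for name, col in df.items()}

print(values_dtype)
print(per_column['floatcol'])
```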
For pandas v0.23 and later, the answer is:
>>> df.apply(lambda x: x.dtype, result_type='expand')
This works even though the pandas documentation claims that the result_type argument "only act when axis=1 (columns)".
credit @jezrael
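If you would rather not depend on how apply behaves in a given version, one option is to skip apply and walk the columns yourself. The sketch below assumes you only need a per-column decision based on dtype; `kind_of` is a made-up helper name, not part of pandas.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'stringcol': ['a'], 'floatcol': [1.5]})

def kind_of(col):
    # Hypothetical helper: branch on the column's own dtype.
    return 'numeric' if is_numeric_dtype(col.dtype) else 'other'

# df.items() yields (name, Series) pairs, so each Series keeps its dtype.
kinds = {name: kind_of(col) for name, col in df.items()}

# If all you need is the dtypes themselves, df.dtypes already has them.
dtypes = df.dtypes

print(kinds)
print(dtypes['floatcol'])
```

Iterating like this never builds a combined array for the whole frame, so a mixed frame does not push everything to object.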