
Data type of pandas column changes to object when it's passed to a function via apply?

Tags: python, pandas

I need to use the dtype of a pandas column in a function, but for some reason when I call the function using apply, the dtype is changed to object. Does anyone know what is happening here?

import pandas as pd

df = pd.DataFrame({'stringcol':['a'], 'floatcol': [1.5]})
df.dtypes
Out[1]: 
floatcol     float64
stringcol     object
dtype: object

df.apply(lambda col: col.dtype)
Out[2]: 
floatcol     object
stringcol    object
dtype: object

Note that this problem doesn't happen if the column is passed directly:

f = lambda col: col.dtype
f(df.floatcol)
Out[3]: dtype('float64')
maxymoo asked Jul 30 '15 04:07



2 Answers

It appears to be due to an optimization in DataFrame._apply_standard. The "fast path" in the code of that method creates an output Series whose dtype is the dtype of df.values, which in your case is object since the DataFrame is of mixed type. If you pass reduce=False to your apply call, the result is correct:

>>> df.apply(lambda col: col.dtype, reduce=False)
floatcol     float64
stringcol     object
dtype: object

(I must say that it is not clear to me how this behavior of reduce jibes with the documentation.)
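
The mixed-type coercion described above can be checked directly. This is only a minimal sketch that reuses the DataFrame from the question; the variable names mixed_dtype and float_only_dtype are mine, not anything from pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'stringcol': ['a'], 'floatcol': [1.5]})

# A string column and a float column share no common numeric type,
# so the combined array falls back to object.
mixed_dtype = df.values.dtype

# A frame holding only the float column keeps float64 as an array.
float_only_dtype = df[['floatcol']].values.dtype

print(mixed_dtype, float_only_dtype)
```

So any code path that builds its result from df.values on a mixed frame will report object, regardless of what each column holds individually.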

BrenBarn answered Oct 21 '22 08:10


For pandas v0.23 and later, where the reduce argument is deprecated, use result_type instead:

>>> df.apply(lambda x: x.dtype, result_type='expand')

This works even though the pandas documentation claims that the result_type argument "only act when axis=1 (columns)".

credit @jezrael
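
If you would rather not depend on how apply assembles its result in a particular pandas version, one alternative is to loop over the columns yourself, so each column reaches your function as its own Series with its own dtype. This is a sketch, not the approach from either answer, and describe_column is a made-up helper for illustration:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'stringcol': ['a'], 'floatcol': [1.5]})

def describe_column(col):
    # Hypothetical helper: branch on the column's actual dtype.
    return 'numeric' if is_numeric_dtype(col.dtype) else 'other'

# df.items() yields (column name, Series) pairs, one column at a time,
# so no cross-column coercion takes place.
kinds = {name: describe_column(col) for name, col in df.items()}
dtypes = {name: col.dtype for name, col in df.items()}

print(kinds)
print(dtypes['floatcol'])
```

If you only need the dtypes themselves, df.dtypes already gives them without calling any function per column.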

johnDanger answered Oct 21 '22 07:10