
Pandas to_numeric is not downcasting integer column

I have a dataframe with a column of dtype('int64') whose values range from 0 to 10. The dataframe has 770K rows and 56 columns of various types. When I run the code below, the result is still dtype('int64'). I expected the column to be downcast to at least int32 or int16. Here's a reproducible example.

import pandas as pd

df = pd.DataFrame(list(range(10)) * 77000, columns=['recommendation'])
df.dtypes
df.recommendation.apply(lambda x: pd.to_numeric(x, downcast='integer')).dtypes
user8992774 asked Oct 26 '18 13:10


1 Answer

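The apply method calls the function on one value at a time, so to_numeric only ever sees a single scalar and cannot tell that the whole column fits in a smaller type. The per-cell results are then reassembled into an int64 column. Call to_numeric on the whole column instead, as Ben pointed out in a comment: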

pd.to_numeric(df.recommendation, downcast='integer')
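Note that to_numeric returns a new Series rather than modifying the dataframe, so the result has to be assigned back to take effect. Below is a minimal sketch using the dataframe from the question; the `before` and `after` variables are only illustrative names for comparing memory use.

```python
import pandas as pd

# Same data as in the question: values 0..9 repeated, 770,000 rows.
df = pd.DataFrame(list(range(10)) * 77000, columns=["recommendation"])

# Bytes used by the column before downcasting (int64).
before = df["recommendation"].memory_usage(index=False)

# Downcast the whole column at once and assign the result back;
# to_numeric does not change df in place.
df["recommendation"] = pd.to_numeric(df["recommendation"], downcast="integer")

# Bytes used by the column after downcasting.
after = df["recommendation"].memory_usage(index=False)

print(df["recommendation"].dtype)
print(before, after)
```

With downcast='integer', pandas picks the smallest signed integer type that can hold every value in the column, so for values this small the result should be int8 rather than int32 or int16.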
sds answered Sep 29 '22 18:09