
Fastest way to cast all dataframe columns to float - pandas astype slow

Is there a faster way to cast all columns of a pandas dataframe to a single type? This seems particularly slow:

df = df.apply(lambda x: x.astype(np.float64), axis=1)

I suspect there's not much I can do about it because of the memory allocation overhead of numpy.ndarray.astype.

I've also tried pd.to_numeric, but it infers a dtype for each column and ends up casting a few of my columns to int types instead of float.
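To illustrate the pd.to_numeric behaviour described above, here is a minimal sketch. The frame `raw` and its column names are hypothetical, not the asker's data; it only assumes object columns holding numeric strings. pd.to_numeric picks a dtype per column from the values it sees, while astype forces the requested one.

```python
import numpy as np
import pandas as pd

# Hypothetical data: object columns holding numeric strings.
raw = pd.DataFrame(
    {"a": ["1", "2", "3"], "b": ["1.5", "2", "3"]},
    dtype=object,
)

# pd.to_numeric works on one column at a time and infers the dtype
# from the values, so a column of whole numbers becomes an integer type.
inferred = raw.apply(pd.to_numeric)

# astype casts every column to the requested dtype.
forced = raw.astype(np.float64)

print(inferred.dtypes)
print(forced.dtypes)
```

Here column `a` contains only whole numbers, so pd.to_numeric gives it an integer dtype, whereas astype(np.float64) makes both columns float.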

elleciel asked Mar 06 '17 14:03



1 Answer

No need for apply, just use DataFrame.astype directly.

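df = df.astype(np.float64)

astype returns a new DataFrame rather than modifying df in place, so assign the result. Using apply also costs a lot of performance, especially with axis=1, where the lambda runs once per row.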

Example

df = pd.DataFrame(np.arange(10**7).reshape(10**4, 10**3))

%timeit df.astype(np.float64)
1 loop, best of 3: 288 ms per loop

%timeit df.apply(lambda x: x.astype(np.float64), axis=0)
1 loop, best of 3: 748 ms per loop

%timeit df.apply(lambda x: x.astype(np.float64), axis=1)
1 loop, best of 3: 2.95 s per loop
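
The %timeit lines above need IPython. As a rough sketch, the same comparison can be run as a plain script with the standard timeit module. The frame here is smaller than the one in the answer so the script finishes quickly, and the names `candidates`, `results` and `timings` are just for illustration. Absolute numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the 10**4 x 10**3 frame above, to keep the run short.
df = pd.DataFrame(np.arange(10**5).reshape(10**3, 10**2))

candidates = {
    "astype": lambda: df.astype(np.float64),
    "apply, axis=0": lambda: df.apply(lambda x: x.astype(np.float64), axis=0),
    "apply, axis=1": lambda: df.apply(lambda x: x.astype(np.float64), axis=1),
}

# Run each variant once to confirm they all produce the same float frame.
results = {name: fn() for name, fn in candidates.items()}

# Best of three single runs per variant, in seconds.
timings = {
    name: min(timeit.repeat(fn, number=1, repeat=3))
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.1f} ms")
```

All three variants return the same values, so the only difference worth measuring is the time they take.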
miradulo answered Oct 16 '22 23:10
