Is there a faster way to cast all columns of a pandas dataframe to a single type? This seems particularly slow:
df = df.apply(lambda x: x.astype(np.float64), axis=1)
I suspect there's not much I can do about it because of the memory allocation overhead of numpy.ndarray.astype.
I've also tried pd.to_numeric, but it casts a few of my columns to int types instead.
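The int columns are expected from pd.to_numeric: without a `downcast` argument it infers the dtype from the values, so whole numbers come back as an integer dtype. Below is a small sketch of that behavior, assuming the columns hold numeric strings (the data is hypothetical). Chaining an explicit astype afterwards forces everything to float64.

```python
import numpy as np
import pandas as pd

# Hypothetical string data: column "a" holds whole numbers, column "b" decimals.
df = pd.DataFrame({"a": ["1", "2", "3"], "b": ["1.5", "2.0", "3.25"]})

# pd.to_numeric works on one Series at a time, so apply it per column.
# It infers the dtype from the parsed values: whole numbers become integers.
inferred = df.apply(pd.to_numeric)
print(inferred.dtypes)

# An explicit astype afterwards forces every column to float64.
forced = df.apply(pd.to_numeric).astype(np.float64)
print(forced.dtypes)
```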
No need for apply; just use DataFrame.astype directly:
df.astype(np.float64)
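A minimal sketch with a hypothetical mixed-dtype frame: a single astype call converts every column at once, and astype also accepts a {column: dtype} mapping when only some columns should change.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with int64, float64 and bool columns.
df = pd.DataFrame({"i": [1, 2, 3], "f": [0.5, 1.5, 2.5], "b": [True, False, True]})

# One call converts every column; no Python-level loop over rows or columns.
out = df.astype(np.float64)
print(out.dtypes)

# A dict limits the conversion to the listed columns; the others keep their dtype.
partial = df.astype({"i": np.float64})
print(partial.dtypes)
```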
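Going through apply also costs you a lot of performance: pandas calls the lambda once per column (axis=0) or once per row (axis=1), and the row-wise version is by far the slowest.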
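The apply variants and a plain astype produce the same result; only the per-call overhead differs. A quick sketch on a small frame to check that:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4))

direct = df.astype(np.float64)
# apply calls the lambda once per column (axis=0) or once per row (axis=1),
# building a new Series each time before pandas reassembles the frame.
by_col = df.apply(lambda x: x.astype(np.float64), axis=0)
by_row = df.apply(lambda x: x.astype(np.float64), axis=1)

# Same values and dtypes in all three cases.
pd.testing.assert_frame_equal(direct, by_col)
pd.testing.assert_frame_equal(direct, by_row)
```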
Example
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(10**7).reshape(10**4, 10**3))
%timeit df.astype(np.float64)
1 loop, best of 3: 288 ms per loop
%timeit df.apply(lambda x: x.astype(np.float64), axis=0)
1 loop, best of 3: 748 ms per loop
%timeit df.apply(lambda x: x.astype(np.float64), axis=1)
1 loop, best of 3: 2.95 s per loop
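If you are not in IPython, here is a rough equivalent of the %timeit comparison using the standard timeit module. It uses a smaller frame than the one above so it finishes quickly (adjust the size as you like), and the absolute numbers will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the 10**4 x 10**3 frame above so the script runs quickly.
df = pd.DataFrame(np.arange(10**5).reshape(10**3, 10**2))

candidates = {
    "astype": lambda: df.astype(np.float64),
    "apply axis=0": lambda: df.apply(lambda x: x.astype(np.float64), axis=0),
    "apply axis=1": lambda: df.apply(lambda x: x.astype(np.float64), axis=1),
}

timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats, each running the call once (like "1 loop, best of 3").
    timings[name] = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{name}: {timings[name] * 1000:.1f} ms")
```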