Why does .dt.days take 100 times longer than .dt.total_seconds()?
```python
import pandas as pd

df = pd.DataFrame({'a': pd.date_range('2011-01-01 00:00:00', periods=1000000, freq='1H')})
df.a = df.a - pd.to_datetime('2011-01-01 00:00:00')  # convert to timedeltas
df.a.dt.days             # 12 sec
df.a.dt.total_seconds()  # 0.14 sec
```
.dt.total_seconds is basically just a multiplication, and can be performed at numpythonic speed:
```python
def total_seconds(self):
    """
    Total duration of each element expressed in seconds.

    .. versionadded:: 0.17.0
    """
    return self._maybe_mask_results(1e-9 * self.asi8)
```
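We can verify that claim from the outside: the result of .dt.total_seconds() should be indistinguishable from multiplying the raw int64 nanosecond values by 1e-9. This is a sketch of that check, not pandas internals (the variable names are mine):

```python
import numpy as np
import pandas as pd

# A small timedelta series to compare against.
s = pd.Series(pd.to_timedelta(['1 days 02:00:00', '0 days 00:00:01.5']))

# The underlying representation is int64 nanoseconds; one multiply
# converts nanoseconds to seconds for the whole array at once.
fast = s.to_numpy().astype('int64') * 1e-9

assert np.allclose(fast, s.dt.total_seconds())
```

No per-element Python objects are created anywhere on this path, which is why it runs at NumPy speed.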
Whereas if we interrupt the days operation mid-run, the traceback shows it spending its time in a slow list comprehension that calls getattr on a freshly constructed Timedelta object for every single element (source):
```python
360         else:
361             result = np.array([getattr(Timedelta(val), m)
--> 362                                for val in values], dtype='int64')
363         return result
364
```
To me this screams "look, let's get it correct, and we'll cross the optimization bridge when we come to it."