
Slow performance of timedelta methods

Why does .dt.days take 100 times longer than .dt.total_seconds()?

import pandas as pd

df = pd.DataFrame({'a': pd.date_range('2011-01-01 00:00:00', periods=1000000, freq='1H')})
df.a = df.a - pd.to_datetime('2011-01-01 00:00:00')  # column is now timedelta64[ns]
df.a.dt.days # 12 sec
df.a.dt.total_seconds() # 0.14 sec
max asked Oct 19 '22 03:10


1 Answer

.dt.total_seconds is basically just a multiplication, and can be performed at numpythonic speed:

def total_seconds(self):
    """
    Total duration of each element expressed in seconds.

    .. versionadded:: 0.17.0
    """
    return self._maybe_mask_results(1e-9 * self.asi8)
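To make the "just a multiplication" point concrete, here is a minimal sketch that reproduces the same arithmetic by hand. It assumes the timedeltas are stored as int64 nanosecond counts (which is what `asi8` exposes in the snippet above); the names `td`, `ns` and `manual` are illustrative only.

```python
import numpy as np
import pandas as pd

# A small timedelta Series: 0 s, 90 s and 1 hour.
td = pd.Series(pd.to_timedelta([0, 90, 3600], unit="s"))

# View the underlying values as int64 nanosecond counts.
ns = td.to_numpy().astype("timedelta64[ns]").astype("int64")

# One vectorized multiplication converts nanoseconds to seconds.
manual = 1e-9 * ns

# The hand-rolled result should agree with the accessor.
print(np.allclose(manual, td.dt.total_seconds()))
```

The whole conversion is a single NumPy operation over the array, with no per-element Python objects involved.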

Whereas if we abort the days operation, we see it's spending its time in a slow listcomp with a getattr and a construction of Timedelta objects (source):

    360         else:
    361             result = np.array([getattr(Timedelta(val), m)
--> 362                                for val in values], dtype='int64')
    363         return result
    364 

To me this screams "look, let's get it correct, and we'll cross the optimization bridge when we come to it."
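Until that bridge is crossed, one possible workaround is to get the day counts with a vectorized floor division instead of `.dt.days`. This is only a sketch: it assumes there are no missing values (NaT) in the column, and the names `td` and `days_fast` are made up for the example. Later pandas releases may well have optimized `.dt.days` itself, so it is worth timing both on your version.

```python
import numpy as np
import pandas as pd

# 100 hourly timedeltas: 0 h, 1 h, ..., 99 h.
td = pd.Series(pd.to_timedelta(np.arange(100), unit="h"))

# Floor-dividing by one day is done on the whole array at once
# and yields integer day counts.
days_fast = td // pd.Timedelta(days=1)

# Should match the accessor for this NaT-free data.
print((days_fast == td.dt.days).all())
```

Floor division is used rather than truncation because the days component of a negative timedelta is also rounded down.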

DSM answered Jan 04 '23 07:01