Slow performance of timedelta methods

Question

Why does .dt.days take roughly 100 times longer than .dt.total_seconds()?

import pandas as pd

df = pd.DataFrame({'a': pd.date_range('2011-01-01 00:00:00', periods=1000000, freq='1H')})
df.a = df.a - pd.to_datetime('2011-01-01 00:00:00')  # timestamps -> timedeltas
df.a.dt.days             # 12 sec
df.a.dt.total_seconds()  # 0.14 sec

DSM · Accepted Answer

.dt.total_seconds is basically just a multiplication of the underlying int64 nanosecond values (asi8) by 1e-9, so it can be performed at numpythonic speed:

def total_seconds(self):
    """
    Total duration of each element expressed in seconds.

    .. versionadded:: 0.17.0
    """
    return self._maybe_mask_results(1e-9 * self.asi8)
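
To make the "just a multiplication" point concrete, here is a minimal sketch that reproduces the result by hand. It is not pandas' internal code: it assumes the values can be viewed as int64 nanoseconds, uses a small 1,000-row series instead of the question's column, and the names s, nanos and manual_seconds are illustrative only.

```python
import numpy as np
import pandas as pd

# Small hourly timedelta series: 0h, 1h, 2h, ... (1,000 rows keeps it quick)
s = pd.Series(pd.to_timedelta(np.arange(1000), unit="h"))

# View the underlying values as int64 nanoseconds
nanos = s.to_numpy().astype("timedelta64[ns]").astype("int64")

# A single vectorized multiplication converts nanoseconds to seconds
manual_seconds = 1e-9 * nanos

# Check agreement with the built-in accessor
print(np.allclose(manual_seconds, s.dt.total_seconds().to_numpy()))
```

The whole conversion is one NumPy operation over the array, with no Python-level loop and no per-element object creation.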

Whereas if we interrupt the .dt.days call while it is running, the traceback shows it's spending its time in a slow listcomp that constructs a Timedelta object for every value and then calls getattr on it:

    else:
        result = np.array([getattr(Timedelta(val), m)
                           for val in values], dtype='int64')  # interrupted here
    return result

To me this screams "look, let's get it correct, and we'll cross the optimization bridge when we come to it."
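
Until that optimization arrives, one possible workaround is to compute the days with integer arithmetic on the nanosecond values, in the same spirit as total_seconds. This is a sketch under assumptions, not an official pandas recipe: it assumes the column has no missing values (NaT is not handled), and NS_PER_DAY, s, nanos and fast_days are names made up for illustration.

```python
import numpy as np
import pandas as pd

NS_PER_DAY = 24 * 60 * 60 * 10**9  # nanoseconds in one day

# Hourly timedeltas, like the column in the question but smaller
s = pd.Series(pd.to_timedelta(np.arange(1000), unit="h"))

# Integer nanoseconds, then floor-divide by the length of a day
nanos = s.to_numpy().astype("timedelta64[ns]").astype("int64")
fast_days = nanos // NS_PER_DAY

# Compare against the (slow) accessor on this small sample
print(np.array_equal(fast_days, s.dt.days.to_numpy()))
```

Floor division is used because the days component of a timedelta rounds toward negative infinity, so the two should also agree for negative durations.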

Tags: python-3.x, pandas

max