I have a DataFrame which has an open time and a close time and I am trying to calculate the difference in milliseconds.
My code currently looks like this:
df = df.assign(Latency=lambda d: d.CloseTimeStamp - d.CreationTimeStamp)
df.Latency = df.apply(lambda d: d.Latency.total_seconds() * 1000., axis=1)
However, I'd like to know why I can't do it as a one-liner, like so:
df = df.assign(Latency=lambda d: (d.CloseTimeStamp - d.CreationTimeStamp).total_seconds() * 1000.)
When I try the latter I get AttributeError: 'Series' object has no attribute 'total_seconds'
total_seconds is inside the .dt accessor, so this should work:
df = df.assign(Latency=lambda d: (d.CloseTimeStamp - d.CreationTimeStamp).dt.total_seconds() * 1000.)
That said, there's no need for a lambda function:
df = df.assign(Latency=(df.CloseTimeStamp - df.CreationTimeStamp).dt.total_seconds() * 1000.)
is much faster.
A further remark on efficiency: df.assign() builds a completely new DataFrame object; if you intend to assign the result back onto df, you're better off modifying df in place:
df['Latency'] = (df.CloseTimeStamp - df.CreationTimeStamp).dt.total_seconds() * 1000.
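As a quick sanity check, here is the in-place version run end to end on a couple of hypothetical timestamps (the sample values below are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data: a creation time and a close time per row
df = pd.DataFrame({
    "CreationTimeStamp": pd.to_datetime(["2023-01-01 00:00:00.000",
                                         "2023-01-01 00:00:01.000"]),
    "CloseTimeStamp":    pd.to_datetime(["2023-01-01 00:00:00.250",
                                         "2023-01-01 00:00:02.500"]),
})

# Subtracting two datetime Series yields a timedelta Series;
# .dt.total_seconds() converts each element to seconds, then * 1000 gives ms.
df["Latency"] = (df.CloseTimeStamp - df.CreationTimeStamp).dt.total_seconds() * 1000.

print(df["Latency"].tolist())  # [250.0, 1500.0]
```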
You need the .dt accessor because you're working with a datetime Series; .dt is omitted when working with a DatetimeIndex:
df = df.assign(Latency=lambda d: (d.CloseTimeStamp - d.CreationTimeStamp).dt.total_seconds() * 1000.)
Solution without lambda:
df = df.assign(Latency=(df.CloseTimeStamp - df.CreationTimeStamp).dt.total_seconds() * 1000.)
...and a solution without assign:
df['Latency'] = (df.CloseTimeStamp - df.CreationTimeStamp).dt.total_seconds() * 1000.
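To illustrate the Series-vs-index distinction above, here is a small sketch (timestamps are invented for the example): a timedelta Series needs .dt, while a TimedeltaIndex, which you get by subtracting DatetimeIndexes, exposes total_seconds() directly.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2023-01-01 00:00:00", "2023-01-01 00:00:05"]))
idx = pd.DatetimeIndex(s)

# A timedelta Series needs the .dt accessor...
deltas = s - s.shift()               # timedelta Series (first element is NaT)
secs = deltas.dt.total_seconds()

# ...but a TimedeltaIndex exposes total_seconds() directly, no .dt needed.
tdi = idx - idx[0]                   # TimedeltaIndex
print(tdi.total_seconds().tolist())  # [0.0, 5.0]
```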