 

Pandas Assign Lambda Function

I have a DataFrame which has an open time and a close time and I am trying to calculate the difference in milliseconds.

My code currently looks like this:

df = df.assign(Latency=lambda d: d.CloseTimeStamp - d.CreationTimeStamp)
df.Latency = df.apply(lambda d: d.Latency.total_seconds() * 1000., axis=1)

However, I'd like to know why I can't do it as a one-liner, like so:

df = df.assign(Latency=lambda d: (d.CloseTimeStamp - d.CreationTimeStamp).total_seconds() * 1000.)

When I try the latter I get AttributeError: 'Series' object has no attribute 'total_seconds'

asked Jul 05 '17 06:07 by aydow



2 Answers

Subtracting the two columns gives a Series of timedeltas, and on a Series total_seconds() is reached through the .dt accessor, so this should work:

df = df.assign(Latency=lambda d: (d.CloseTimeStamp - d.CreationTimeStamp).dt.total_seconds() * 1000.)
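To see why the .dt accessor is needed, here is a minimal sketch. The sample timestamps are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "CreationTimeStamp": pd.to_datetime(
        ["2017-07-05 06:07:00.000", "2017-07-05 06:07:01.000"]
    ),
    "CloseTimeStamp": pd.to_datetime(
        ["2017-07-05 06:07:00.250", "2017-07-05 06:07:02.500"]
    ),
})

# Subtracting two datetime columns yields a Series with a timedelta dtype.
delta = df.CloseTimeStamp - df.CreationTimeStamp
is_timedelta = pd.api.types.is_timedelta64_dtype(delta)

# The Series itself has no total_seconds method, hence the AttributeError.
has_method = hasattr(delta, "total_seconds")

# The vectorised version lives on the .dt accessor.
latency_ms = delta.dt.total_seconds() * 1000.0

print(is_timedelta, has_method)
print(latency_ms.tolist())
```

Inside assign, the lambda receives the whole DataFrame, so d.CloseTimeStamp - d.CreationTimeStamp is this same kind of Series rather than a single Timedelta.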

That said, there's no need for a lambda function:

df = df.assign(Latency=(df.CloseTimeStamp - df.CreationTimeStamp).dt.total_seconds() * 1000.)

This is simpler, and much faster than the row-wise apply in the question.

A further remark on efficiency: df.assign() builds a completely new dataframe object; if you're intending to assign this object back onto df, you're better off modifying df in-place:

df['Latency'] = (df.CloseTimeStamp - df.CreationTimeStamp).dt.total_seconds() * 1000.
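The difference between the two approaches can be checked with a short sketch. The data is hypothetical and the names new_df and had_latency_before are only for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "CreationTimeStamp": pd.to_datetime(["2017-07-05 06:07:00.000"]),
    "CloseTimeStamp": pd.to_datetime(["2017-07-05 06:07:00.250"]),
})

latency_ms = (df.CloseTimeStamp - df.CreationTimeStamp).dt.total_seconds() * 1000.0

# assign returns a new DataFrame and leaves the original untouched.
new_df = df.assign(Latency=latency_ms)
had_latency_before = "Latency" in df.columns

# Item assignment adds the column to the existing object instead.
df["Latency"] = latency_ms

print(new_df is df, had_latency_before, "Latency" in df.columns)
```

If the result is going to be bound back to df anyway, the item assignment skips building the intermediate DataFrame.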
answered Oct 19 '22 15:10 by Ken Wei



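You need the .dt accessor because the difference of two datetime Series is a timedelta Series. The accessor is omitted only when working with an index object such as a DatetimeIndex or TimedeltaIndex: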

df = df.assign(Latency=lambda d: (d.CloseTimeStamp - d.CreationTimeStamp).dt.total_seconds() * 1000.)
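For comparison, a small sketch of the index case, where no .dt is involved. The timestamps and the names creation, close and delta_index are hypothetical.

```python
import pandas as pd

# Hypothetical timestamps held in index objects rather than Series.
creation = pd.DatetimeIndex(["2017-07-05 06:07:00.000", "2017-07-05 06:07:01.000"])
close = pd.DatetimeIndex(["2017-07-05 06:07:00.250", "2017-07-05 06:07:02.500"])

# The difference of two DatetimeIndex objects is a TimedeltaIndex.
delta_index = close - creation

# On an index, total_seconds is called directly, without .dt.
latency_ms = delta_index.total_seconds() * 1000.0

print(type(delta_index).__name__)
print(list(latency_ms))
```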

Solution without lambda:

df = df.assign(Latency=(df.CloseTimeStamp - df.CreationTimeStamp).dt.total_seconds() * 1000.)

...and solution without assign:

df['Latency'] = (df.CloseTimeStamp - df.CreationTimeStamp).dt.total_seconds() * 1000.
answered Oct 19 '22 15:10 by jezrael
