How to apply a function to each group of a grouped dataframe
Given the following dataframe df:
userid trip_id lat long
141.0 1.0 39.979547 116.306813
141.0 1.0 39.979558 116.306823
141.0 1.0 39.979575 116.306835
141.0 1.0 39.979587 116.306847
141.0 2.0 39.979603 116.306852
141.0 2.0 39.979612 116.306867
141.0 2.0 39.979627 116.306877
141.0 2.0 39.979635 116.306888
141.0 3.0 39.979645 116.306903
141.0 3.0 39.979657 116.306913
141.0 3.0 39.979670 116.306920
141.0 3.0 39.979682 116.306920
I want to calculate the Vincenty distance between consecutive points within each group of the dataframe. The dataframe is grouped on two columns, (userid, trip_id).
I can calculate the Vincenty distance for the full dataframe with the following statements:
from geopy.distance import vincenty
df['lat_next'] = df['lat'].shift(-1)
df['long_next'] = df['long'].shift(-1)
df['Vincenty_distance'] = df.dropna().apply(lambda x: vincenty((x['lat'], x['long']), (x['lat_next'], x['long_next'])).meters, axis = 1)
df = df.drop(['lat_next','long_next'], axis=1)
I want to apply this calculation to each group. I tried the following statement but got an error:
df['Vincenty_distance'] = df.dropna().groupby(['userid','trip_id']).apply(lambda x: vincenty((x['lat'], x['long']), (x['lat_next'], x['long_next'])).meters,axis=1)
I am expecting the following result:
userid trip_id lat long Vincenty_distance
141.0 1.0 39.979547 116.306813 2.563812
141.0 1.0 39.979558 116.306823 2.956183
141.0 1.0 39.979575 116.306835 2.332577
141.0 1.0 39.979587 116.306847 NaN
141.0 2.0 39.979603 116.306852 2.334821
141.0 2.0 39.979612 116.306867 2.332577
141.0 2.0 39.979627 116.306877 1.695449
141.0 2.0 39.979635 116.306888 NaN
141.0 3.0 39.979645 116.306903 1.871784
141.0 3.0 39.979657 116.306913 1.982752
141.0 3.0 39.979670 116.306920 2.220685
141.0 3.0 39.979682 116.306920 NaN
I believe you need DataFrameGroupBy.shift to shift the lat and long columns within each group and get the next values first, so wrapping vincenty in a groupby is not necessary:
df = df.join(df.groupby(['userid','trip_id'])[['lat','long']].shift(-1).add_suffix('_next'))
print (df)
userid trip_id lat long lat_next long_next
0 141.0 1.0 39.979547 116.306813 39.979558 116.306823
1 141.0 1.0 39.979558 116.306823 39.979575 116.306835
2 141.0 1.0 39.979575 116.306835 39.979587 116.306847
3 141.0 1.0 39.979587 116.306847 NaN NaN
4 141.0 2.0 39.979603 116.306852 39.979612 116.306867
5 141.0 2.0 39.979612 116.306867 39.979627 116.306877
6 141.0 2.0 39.979627 116.306877 39.979635 116.306888
7 141.0 2.0 39.979635 116.306888 NaN NaN
8 141.0 3.0 39.979645 116.306903 39.979657 116.306913
9 141.0 3.0 39.979657 116.306913 39.979670 116.306920
10 141.0 3.0 39.979670 116.306920 39.979682 116.306920
11 141.0 3.0 39.979682 116.306920 NaN NaN
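To make the difference between a plain shift and a grouped shift concrete, here is a minimal sketch on a made-up frame. The frame `demo` and its values are assumptions for illustration only and are not part of the original data.

```python
import pandas as pd

# Small made-up frame with two trips (values are illustrative only).
demo = pd.DataFrame({
    "trip_id": [1, 1, 2, 2],
    "lat": [10.0, 11.0, 20.0, 21.0],
})

# Plain shift ignores group boundaries: the last row of trip 1
# is paired with the first row of trip 2.
demo["lat_next_plain"] = demo["lat"].shift(-1)

# Grouped shift restarts in every group: the last row of each
# trip gets NaN because it has no successor inside that trip.
demo["lat_next_grouped"] = demo.groupby("trip_id")["lat"].shift(-1)

print(demo)
```

The grouped version is what keeps a distance from being computed between the end of one trip and the start of the next.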
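# Row-wise distance between each point and the next point of the same trip.
# Note: vincenty was removed in geopy 2.0; geopy.distance.geodesic is its replacement there.
f = lambda x: vincenty((x['lat'], x['long']), (x['lat_next'], x['long_next'])).meters
# dropna() skips the last row of each group, which has no next point and stays NaN.
df['Vincenty_distance'] = df.dropna().apply(f, axis = 1)
df = df.drop(['lat_next','long_next'], axis=1)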
print (df)
userid trip_id lat long Vincenty_distance
0 141.0 1.0 39.979547 116.306813 1.490437
1 141.0 1.0 39.979558 116.306823 2.147940
2 141.0 1.0 39.979575 116.306835 1.681071
3 141.0 1.0 39.979587 116.306847 NaN
4 141.0 2.0 39.979603 116.306852 1.624902
5 141.0 2.0 39.979612 116.306867 1.871784
6 141.0 2.0 39.979627 116.306877 1.293017
7 141.0 2.0 39.979635 116.306888 NaN
8 141.0 3.0 39.979645 116.306903 1.582706
9 141.0 3.0 39.979657 116.306913 1.562388
10 141.0 3.0 39.979670 116.306920 1.332411
11 141.0 3.0 39.979682 116.306920 NaN
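If the row-wise apply becomes slow on a large dataframe, the same grouped shift can feed a vectorized distance function. The sketch below is an assumption-laden stand-in: it uses the haversine (spherical) formula instead of Vincenty, so the results will be close to the values above but not identical. The helper `haversine_m`, the constant `EARTH_RADIUS_M`, and the column name `haversine_m` are hypothetical names introduced here, and only the first two rows of trips 1 and 2 are used.

```python
import numpy as np
import pandas as pd

# Mean Earth radius in metres; an assumption of the spherical model.
EARTH_RADIUS_M = 6_371_000.0

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (spherical approximation, not Vincenty)."""
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

# A subset of the data from the question.
df = pd.DataFrame({
    "userid": [141.0] * 4,
    "trip_id": [1.0, 1.0, 2.0, 2.0],
    "lat": [39.979547, 39.979558, 39.979603, 39.979612],
    "long": [116.306813, 116.306823, 116.306852, 116.306867],
})

# Next point within the same (userid, trip_id) group; NaN on each group's last row.
nxt = df.groupby(["userid", "trip_id"])[["lat", "long"]].shift(-1)

# NaN inputs propagate, so the last row of every group stays NaN.
df["haversine_m"] = haversine_m(df["lat"], df["long"], nxt["lat"], nxt["long"])
print(df)
```

The whole column is computed with a handful of array operations, so no temporary lat_next and long_next columns need to be joined and dropped afterwards.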