
How to apply function to each group of dataframe


Given the dataframe df:

userid   trip_id        lat         long
141.0      1.0      39.979547   116.306813
141.0      1.0      39.979558   116.306823
141.0      1.0      39.979575   116.306835
141.0      1.0      39.979587   116.306847
141.0      2.0      39.979603   116.306852
141.0      2.0      39.979612   116.306867
141.0      2.0      39.979627   116.306877
141.0      2.0      39.979635   116.306888
141.0      3.0      39.979645   116.306903
141.0      3.0      39.979657   116.306913
141.0      3.0      39.979670   116.306920
141.0      3.0      39.979682   116.306920

I want to calculate the Vincenty distance between consecutive points within each group of the dataframe. The dataframe is grouped on two columns, (userid, trip_id).

I can calculate the Vincenty distance for the full dataframe with the following statements:

from geopy.distance import vincenty
df['lat_next'] = df['lat'].shift(-1)
df['long_next'] = df['long'].shift(-1)
df['Vincenty_distance'] = df.dropna().apply(lambda x: vincenty((x['lat'], x['long']), (x['lat_next'], x['long_next'])).meters, axis = 1)
df = df.drop(['lat_next','long_next'], axis=1) 

I want to apply this function to each group. I tried the following statement but got an error:

df['Vincenty_distance'] = df.dropna().groupby(['userid','trip_id']).apply(lambda x: vincenty((x['lat'], x['long']), (x['lat_next'], x['long_next'])).meters,axis=1)

I am expecting the following result:

userid  trip_id        lat        long        Vincenty_distance
141.0      1.0      39.979547   116.306813         2.563812
141.0      1.0      39.979558   116.306823         2.956183
141.0      1.0      39.979575   116.306835         2.332577
141.0      1.0      39.979587   116.306847           NaN
141.0      2.0      39.979603   116.306852         2.334821
141.0      2.0      39.979612   116.306867         2.332577
141.0      2.0      39.979627   116.306877         1.695449
141.0      2.0      39.979635   116.306888           NaN
141.0      3.0      39.979645   116.306903         1.871784
141.0      3.0      39.979657   116.306913         1.982752
141.0      3.0      39.979670   116.306920         2.220685
141.0      3.0      39.979682   116.306920           NaN
Asif Khan asked Nov 22 '18 05:11

1 Answer

I believe you need DataFrameGroupBy.shift to build the "next" columns within each group first. After that, the distance can be computed row by row, so wrapping vincenty in a groupby is not necessary:

df = df.join(df.groupby(['userid','trip_id'])[['lat','long']].shift(-1).add_suffix('_next'))
print (df)
    userid  trip_id        lat        long   lat_next   long_next
0    141.0      1.0  39.979547  116.306813  39.979558  116.306823
1    141.0      1.0  39.979558  116.306823  39.979575  116.306835
2    141.0      1.0  39.979575  116.306835  39.979587  116.306847
3    141.0      1.0  39.979587  116.306847        NaN         NaN
4    141.0      2.0  39.979603  116.306852  39.979612  116.306867
5    141.0      2.0  39.979612  116.306867  39.979627  116.306877
6    141.0      2.0  39.979627  116.306877  39.979635  116.306888
7    141.0      2.0  39.979635  116.306888        NaN         NaN
8    141.0      3.0  39.979645  116.306903  39.979657  116.306913
9    141.0      3.0  39.979657  116.306913  39.979670  116.306920
10   141.0      3.0  39.979670  116.306920  39.979682  116.306920
11   141.0      3.0  39.979682  116.306920        NaN         NaN
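
To make the difference between a plain shift and a grouped shift concrete, here is a minimal sketch on made-up toy data (the `toy` frame and its values are assumptions for illustration, not part of the question's data). It only shows where the group boundary falls:

```python
import pandas as pd

# Toy data, invented for illustration: one user with two trips of two points each
toy = pd.DataFrame({
    "userid": [1, 1, 1, 1],
    "trip_id": [1, 1, 2, 2],
    "lat": [10.0, 11.0, 20.0, 21.0],
})

# Plain shift ignores groups: the last row of trip 1 picks up the first row of trip 2
toy["lat_next_plain"] = toy["lat"].shift(-1)

# Grouped shift restarts at every (userid, trip_id) boundary, leaving NaN there
toy["lat_next_grouped"] = toy.groupby(["userid", "trip_id"])["lat"].shift(-1)

print(toy)
```

With the plain shift, a distance would be computed between the end of one trip and the start of the next; the grouped shift leaves NaN in that row instead.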

f = lambda x: vincenty((x['lat'], x['long']), (x['lat_next'], x['long_next'])).meters
df['Vincenty_distance'] = df.dropna().apply(f, axis = 1)
df = df.drop(['lat_next','long_next'], axis=1) 
print (df)
    userid  trip_id        lat        long  Vincenty_distance
0    141.0      1.0  39.979547  116.306813           1.490437
1    141.0      1.0  39.979558  116.306823           2.147940
2    141.0      1.0  39.979575  116.306835           1.681071
3    141.0      1.0  39.979587  116.306847                NaN
4    141.0      2.0  39.979603  116.306852           1.624902
5    141.0      2.0  39.979612  116.306867           1.871784
6    141.0      2.0  39.979627  116.306877           1.293017
7    141.0      2.0  39.979635  116.306888                NaN
8    141.0      3.0  39.979645  116.306903           1.582706
9    141.0      3.0  39.979657  116.306913           1.562388
10   141.0      3.0  39.979670  116.306920           1.332411
11   141.0      3.0  39.979682  116.306920                NaN
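
If the row-wise apply becomes slow on a large dataframe, the same per-group shift can feed a vectorized formula. The sketch below is an assumption-laden variant, not the method above: it swaps Vincenty for the haversine (spherical great-circle) formula, so the numbers will differ slightly from Vincenty's ellipsoidal result. The helper `haversine_m`, the constant `EARTH_RADIUS_M`, and the column name `haversine_distance` are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; NaN inputs propagate to NaN."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

# First two trips from the question's data
df = pd.DataFrame({
    "userid": [141.0] * 8,
    "trip_id": [1.0] * 4 + [2.0] * 4,
    "lat": [39.979547, 39.979558, 39.979575, 39.979587,
            39.979603, 39.979612, 39.979627, 39.979635],
    "long": [116.306813, 116.306823, 116.306835, 116.306847,
             116.306852, 116.306867, 116.306877, 116.306888],
})

# Next point within each (userid, trip_id) group; the last row of each group gets NaN
nxt = df.groupby(["userid", "trip_id"])[["lat", "long"]].shift(-1)

# Whole-column computation, no row-wise apply
df["haversine_distance"] = haversine_m(df["lat"], df["long"], nxt["lat"], nxt["long"])
print(df)
```

Separately, geopy 2.0 removed `vincenty`. On current geopy versions, `geopy.distance.geodesic` accepts the same two (lat, long) tuples and also exposes `.meters`, so it can replace `vincenty` in the lambda.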
jezrael answered Oct 18 '22 03:10
