
Pandas dataframe apply refer to previous row to calculate difference

I have the following pandas dataframe containing 2 columns (simplified). The first column contains player names and the second column contains dates (datetime objects):

  player    date
  A         2010-01-01
  A         2010-01-09
  A         2010-01-11
  A         2010-01-15
  B         2010-02-01
  B         2010-02-10
  B         2010-02-21
  B         2010-02-23

I want to add a column diff which represents the time difference in days per player. The result should look like this:

  player    date            diff
  A         2010-01-01      0
  A         2010-01-09      8
  A         2010-01-11      2
  A         2010-01-15      4
  B         2010-02-01      0
  B         2010-02-10      9
  B         2010-02-21      11
  B         2010-02-23      2

The first row has 0 for diff, because there is no earlier date. The second row shows 8, because the difference between 2010-01-01 and 2010-01-09 is eight days.

Calculating the day difference between two datetime objects is not the problem; I am just not sure how to add the new column. I know that I have to group first (df.groupby('player')) and then use apply (or maybe transform?). However, I am stuck because calculating the difference requires referring to the previous row inside the applied function, and I don't know how to do that, if it is possible at all.

Thank you very much.

UPDATE: Neither of the solutions proposed below worked with my code at first. After much headache, I found out that my data had duplicate indices. A simple df.reset_index() solved the issue, and after that both proposed solutions worked. Since I can only mark one as correct, I will choose the more concise one. Thanks to both of you, though!
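The duplicate-index fix from the update can be sketched as follows. This is a minimal example under stated assumptions: the frame below is hypothetical (the real data is not shown in the question), and drop=True is an added choice that discards the old labels rather than keeping them as a column.

```python
import pandas as pd

# Hypothetical frame whose index contains duplicate labels, as can happen
# after concatenating two frames without ignore_index=True.
df = pd.DataFrame(
    {
        "player": ["A", "A", "B", "B"],
        "date": pd.to_datetime(
            ["2010-01-01", "2010-01-09", "2010-02-01", "2010-02-10"]
        ),
    },
    index=[0, 1, 0, 1],
)

# Check for duplicate labels before resetting.
has_duplicates = not df.index.is_unique

# reset_index returns a new frame, so the result must be assigned back.
# drop=True discards the old labels instead of adding them as a column.
df = df.reset_index(drop=True)
```

Checking df.index.is_unique first is a cheap way to confirm whether duplicate labels are the likely cause before changing the frame.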

asked Nov 01 '15 10:11 by beta



2 Answers

You can simply write:

df['difference'] = df.groupby('player')['date'].diff().fillna(pd.Timedelta(0))

This gives the new timedelta column with the correct values:

  player       date  difference
0      A 2010-01-01      0 days
1      A 2010-01-09      8 days
2      A 2010-01-11      2 days
3      A 2010-01-15      4 days
4      B 2010-02-01      0 days
5      B 2010-02-10      9 days
6      B 2010-02-21     11 days
7      B 2010-02-23      2 days

(I've used the name "difference" instead of "diff" to distinguish the name from the method diff.)
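The question asks for plain day counts rather than timedeltas. A minimal sketch of one way to get them, assuming the sample data from the question and filling the first row of each group with a zero Timedelta before taking the day component:

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "player": ["A"] * 4 + ["B"] * 4,
    "date": pd.to_datetime([
        "2010-01-01", "2010-01-09", "2010-01-11", "2010-01-15",
        "2010-02-01", "2010-02-10", "2010-02-21", "2010-02-23",
    ]),
})

# Per-player difference to the previous row. The first row of each group
# has no predecessor, so its value is NaT.
delta = df.groupby("player")["date"].diff()

# Replace NaT with a zero Timedelta, then keep the whole-day component
# as an integer column.
df["diff"] = delta.fillna(pd.Timedelta(0)).dt.days
```

Filling before calling .dt.days keeps the column as integers; taking .dt.days first would leave NaN in the first row of each group and turn the column into floats.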

answered Sep 21 '22 14:09 by Alex Riley


Another way, if you want to implement it manually, is the following:

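def date_diff(group):
    # Work on a copy so the frame passed in by groupby is not modified.
    group = group.copy()
    # shift() moves each date down one row, so the subtraction compares
    # every row with the previous row of the same player.
    diff = group['date'] - group['date'].shift()
    group['difference'] = diff.fillna(pd.Timedelta(0))
    return group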

In [30]:
df_final = df.groupby('player', group_keys=False).apply(date_diff)
df_final
Out[30]:
player  date        difference
A       2010-01-01  0 days
A       2010-01-09  8 days
A       2010-01-11  2 days
A       2010-01-15  4 days
B       2010-02-01  0 days
B       2010-02-10  9 days
B       2010-02-21  11 days
B       2010-02-23  2 days
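The same shift-based idea can also be written without apply. This is a sketch of an alternative using a grouped shift, assuming the sample data from the question; the variable name previous is introduced here for illustration only.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "player": ["A"] * 4 + ["B"] * 4,
    "date": pd.to_datetime([
        "2010-01-01", "2010-01-09", "2010-01-11", "2010-01-15",
        "2010-02-01", "2010-02-10", "2010-02-21", "2010-02-23",
    ]),
})

# Previous date within each player's group. The first row of every
# group has no predecessor and is NaT.
previous = df.groupby("player")["date"].shift()

# Subtract row by row, then treat the missing first difference as zero.
df["difference"] = (df["date"] - previous).fillna(pd.Timedelta(0))
```

Because the grouped shift returns a Series aligned with the original index, the subtraction needs no custom function and leaves the frame's columns and index unchanged.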

answered Sep 24 '22 14:09 by Nader Hisham