Pandas dataframe apply refer to previous row to calculate difference

I have the following pandas dataframe containing 2 columns (simplified). The first column contains player names and the second column contains dates (datetime objects):

  player    date
  A         2010-01-01
  A         2010-01-09
  A         2010-01-11
  A         2010-01-15
  B         2010-02-01
  B         2010-02-10
  B         2010-02-21
  B         2010-02-23
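
For anyone who wants to reproduce this, a frame like the one above can be built roughly as follows. This is only a sketch: the variable name df and the use of pd.to_datetime are assumptions about the setup, not the real data source.

```python
import pandas as pd

# Sample data copied from the table above; `df` is an assumed variable name.
df = pd.DataFrame({
    "player": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "date": pd.to_datetime([
        "2010-01-01", "2010-01-09", "2010-01-11", "2010-01-15",
        "2010-02-01", "2010-02-10", "2010-02-21", "2010-02-23",
    ]),
})

# The date column should report a datetime64 dtype, not object.
print(df.dtypes)
```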

I want to add a column diff that holds, for each player, the number of days since that player's previous date. The result should look like this:

  player    date            diff
  A         2010-01-01      0
  A         2010-01-09      8
  A         2010-01-11      2
  A         2010-01-15      4
  B         2010-02-01      0
  B         2010-02-10      9
  B         2010-02-21      11
  B         2010-02-23      2

The first row for each player has 0 for diff because there is no earlier date for that player. The second row shows 8 because 2010-01-09 is eight days after 2010-01-01.

Calculating the day difference between two datetime objects is not the problem. I am just not sure how to add the new column. I know that I have to group first (df.groupby('player')) and then use apply (or maybe transform?). However, I am stuck: to calculate the difference I need to refer to the previous row inside the function passed to apply, and I don't know how to do that, or whether it is possible at all.

Thank you very much.

UPDATE: When I first tried the two solutions proposed below, neither worked with my code. After much headache, I found out that my data had duplicate indices. A simple df.reset_index() solved the issue, and both solutions then worked. Since I can only mark one as correct, I will choose the more concise one. Thanks to both of you, though!
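
In case it helps someone with the same symptom, here is a rough sketch of how duplicate index labels can be detected and removed. The frames part_a and part_b are made up for illustration, and drop=True is my addition so the old labels are not kept as an extra column. How exactly duplicates break the solutions probably depends on the pandas version, so this only shows the check and the reset.

```python
import pandas as pd

# Hypothetical frames; concatenating them repeats the index labels 0 and 1.
part_a = pd.DataFrame({
    "player": ["A", "A"],
    "date": pd.to_datetime(["2010-01-01", "2010-01-09"]),
})
part_b = pd.DataFrame({
    "player": ["B", "B"],
    "date": pd.to_datetime(["2010-02-01", "2010-02-10"]),
})
df = pd.concat([part_a, part_b])  # index is now 0, 1, 0, 1

unique_before = df.index.is_unique
print(unique_before)  # False: each label appears twice

# Replace the index with a fresh 0..n-1 range; drop=True discards the
# old labels instead of inserting them as a new column.
df = df.reset_index(drop=True)

unique_after = df.index.is_unique
print(unique_after)  # True
```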

asked Nov 01 '15 10:11

beta

2 Answers

You can simply write:

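# Each player's first row has no predecessor (NaT), so fill it with a zero timedelta; assumes `import pandas as pd`.
df['difference'] = df.groupby('player')['date'].diff().fillna(pd.Timedelta(0))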

This gives the new timedelta column with the correct values:

  player       date  difference
0      A 2010-01-01      0 days
1      A 2010-01-09      8 days
2      A 2010-01-11      2 days
3      A 2010-01-15      4 days
4      B 2010-02-01      0 days
5      B 2010-02-10      9 days
6      B 2010-02-21     11 days
7      B 2010-02-23      2 days

(I've used the name "difference" instead of "diff" to distinguish the name from the method diff.)
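
The question asks for plain day counts rather than timedeltas. One way to get there, as far as I can tell, is the .dt.days accessor. A minimal sketch, assuming the sample data from the question; the column name diff_days is mine, not from the original:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "player": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "date": pd.to_datetime([
        "2010-01-01", "2010-01-09", "2010-01-11", "2010-01-15",
        "2010-02-01", "2010-02-10", "2010-02-21", "2010-02-23",
    ]),
})

# Per-player difference to the previous row; each player's first row is NaT.
gap = df.groupby("player")["date"].diff()

# .dt.days extracts whole days (NaT becomes NaN), so fill and cast afterwards.
df["diff_days"] = gap.dt.days.fillna(0).astype(int)

print(df["diff_days"].tolist())  # [0, 8, 2, 4, 0, 9, 11, 2]
```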

answered Sep 21 '22 14:09

Alex Riley

Another way, if you want to implement it manually, is the following:

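import pandas as pd

def date_diff(group):
    # Work on a copy so the frame passed in by groupby is not modified in place.
    group = group.copy()
    # shift() moves each date down one row, so this is "current minus previous".
    group['difference'] = (group['date'] - group['date'].shift()).fillna(pd.Timedelta(0))
    return group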

In [30]:
df_final = df.groupby('player', group_keys=False).apply(date_diff)
df_final
Out[30]:
  player    date            difference
  A         2010-01-01      0 days
  A         2010-01-09      8 days
  A         2010-01-11      2 days
  A         2010-01-15      4 days
  B         2010-02-01      0 days
  B         2010-02-10      9 days
  B         2010-02-21      11 days
  B         2010-02-23      2 days
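
The same "previous row" idea can also be written without apply, since a grouped column has its own shift(). A sketch, assuming the sample data from the question; the sort step and the helper column prev_date are my additions for illustration:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "player": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "date": pd.to_datetime([
        "2010-01-01", "2010-01-09", "2010-01-11", "2010-01-15",
        "2010-02-01", "2010-02-10", "2010-02-21", "2010-02-23",
    ]),
})

# Assumption: shift() only means "previous date" if each player's rows
# are in chronological order, so sort first.
df = df.sort_values(["player", "date"]).reset_index(drop=True)

# Previous row's date within the same player; NaT on each player's first row.
df["prev_date"] = df.groupby("player")["date"].shift()

# Current minus previous, with the missing first-row value set to zero days.
df["difference"] = (df["date"] - df["prev_date"]).fillna(pd.Timedelta(0))

print(df)
```

Keeping prev_date around makes it easy to eyeball which row each difference was computed against; it can be dropped afterwards.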

answered Sep 24 '22 14:09

Nader Hisham

Tags: python, pandas, dataframe, apply