How to calculate the difference in days between successive pandas DataFrame rows within each group

Tags:

python

pandas

I have a pandas DataFrame like the following:

item_id        date
  101     2016-01-05
  101     2016-01-21
  121     2016-01-08
  121     2016-01-22
  128     2016-01-19
  128     2016-02-17
  131     2016-01-11
  131     2016-01-23
  131     2016-01-24
  131     2016-02-06
  131     2016-02-07
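To reproduce the example, the sample frame above can be built as follows. This is a sketch that assumes the dates start out as strings; pd.to_datetime converts them so that date arithmetic works.

```python
import pandas as pd

# Sample data from the question; the date strings are parsed into datetime64 values
df = pd.DataFrame({
    'item_id': [101, 101, 121, 121, 128, 128, 131, 131, 131, 131, 131],
    'date': pd.to_datetime([
        '2016-01-05', '2016-01-21', '2016-01-08', '2016-01-22',
        '2016-01-19', '2016-02-17', '2016-01-11', '2016-01-23',
        '2016-01-24', '2016-02-06', '2016-02-07',
    ]),
})
print(df.dtypes)
```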

I want to calculate the difference in days between successive values of the date column, but separately for each item_id. First, I want to sort the DataFrame by date within each item_id group. It should look like this:

item_id        date     
  101     2016-01-05
  101     2016-01-21
  121     2016-01-08
  121     2016-01-22
  128     2016-01-19
  128     2016-02-17
  131     2016-01-11
  131     2016-01-23
  131     2016-01-24
  131     2016-02-06
  131     2016-02-07

Then I want to calculate the difference between successive dates, again grouped by item_id, so the output should look like the following:

 item_id        date      day_difference 
  101     2016-01-05          0
  101     2016-01-21          16
  121     2016-01-08          0
  121     2016-01-22          14
  128     2016-01-19          0
  128     2016-02-17          29
  131     2016-01-11          0
  131     2016-01-23          12
  131     2016-01-24          1
  131     2016-02-06          13 
  131     2016-02-07          1
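The rule behind the day_difference column is: the first row of each item_id gets 0, and every later row gets the number of days since the previous row of the same item_id. For a single group (item 131) the definition can be checked with plain Python datetime.date objects; this only illustrates the rule and is not a pandas solution.

```python
from datetime import date

# Dates for item_id 131, already in ascending order
dates_131 = [date(2016, 1, 11), date(2016, 1, 23), date(2016, 1, 24),
             date(2016, 2, 6), date(2016, 2, 7)]

# First row gets 0; each later row is the gap to the previous date, in days
gaps = [0] + [(cur - prev).days for prev, cur in zip(dates_131, dates_131[1:])]
print(gaps)  # [0, 12, 1, 13, 1]
```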

For sorting, I tried something like this:

df.groupby('item_id').apply(lambda x: new_df.sort('date'))

But it didn't work. I am able to calculate the difference between consecutive rows with the following:

(df['date'] - df['date'].shift(1))

but not within each item_id group.
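Both attempts can be adjusted to respect item_id without apply. A minimal sketch, using a small shuffled subset of the question's rows and assuming date is already a datetime column: sorting on two keys orders the dates inside each item_id, and calling shift(1) on the grouped column only shifts within each group, so the first row of every group becomes NaT instead of picking up another item's date.

```python
import pandas as pd

# Shuffled subset of the question's data
df = pd.DataFrame({
    'item_id': [131, 101, 131, 101, 131],
    'date': pd.to_datetime(['2016-01-23', '2016-01-21', '2016-01-11',
                            '2016-01-05', '2016-01-24']),
})

# Sort by item_id first, then by date inside each item_id
df = df.sort_values(['item_id', 'date']).reset_index(drop=True)

# Shift within each group, so a row never takes a date from another item_id
prev_date = df.groupby('item_id')['date'].shift(1)
df['delta'] = df['date'] - prev_date  # NaT on the first row of each group
print(df)
```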

Neil asked Feb 21 '16 08:02




1 Answer

I think you can use:

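import numpy as np
import pandas as pd

df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['item_id', 'date'])  # sort dates within each item_id

# days since the previous row of the same item_id; first row of each group -> 0
df['diff'] = df.groupby('item_id')['date'].diff() / np.timedelta64(1, 'D')
df['diff'] = df['diff'].fillna(0).astype(int)
print(df)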
    item_id       date  diff
0       101 2016-01-05     0
1       101 2016-01-21    16
2       121 2016-01-08     0
3       121 2016-01-22    14
4       128 2016-01-19     0
5       128 2016-02-17    29
6       131 2016-01-11     0
7       131 2016-01-23    12
8       131 2016-01-24     1
9       131 2016-02-06    13
10      131 2016-02-07     1
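A variant of the same idea, sketched as a self-contained script: the .dt.days accessor turns the per-group timedeltas into whole days directly, instead of dividing by np.timedelta64(1, 'D'). Note that .dt.days keeps only the whole-day component, which is fine for date-only values like these.

```python
import pandas as pd

df = pd.DataFrame({
    'item_id': [101, 101, 121, 121, 128, 128, 131, 131, 131, 131, 131],
    'date': pd.to_datetime([
        '2016-01-05', '2016-01-21', '2016-01-08', '2016-01-22',
        '2016-01-19', '2016-02-17', '2016-01-11', '2016-01-23',
        '2016-01-24', '2016-02-06', '2016-02-07',
    ]),
})

df = df.sort_values(['item_id', 'date'])
# .dt.days gives whole days; the first row of each group is NaN, filled with 0
df['diff'] = (df.groupby('item_id')['date'].diff()
                .dt.days.fillna(0).astype(int))
print(df)
```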
jezrael answered Oct 05 '22 23:10