Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to calculate days passed from start date by group?

Tags:

r

aggregate

I need to go from this

 id  |    date
-----------------
  A  | 2000-01-13
  A  | 2000-01-18
  A  | 2000-01-25
  B  | 2012-10-10
  B  | 2012-10-11
  C  | 2005-07-25
  C  | 2005-07-31

to this

 id  |    date     | days from start
---------------------------
  A  | 2000-01-13  |  0
  A  | 2000-01-18  |  5
  A  | 2000-01-25  |  12
  A  | 2000-02-08  |  26
  B  | 2012-10-10  |  0
  B  | 2012-10-11  |  1
  C  | 2005-07-25  |  0
  C  | 2005-07-31  |  6

i.e. creating a variable that holds the number of days passed since the first date, grouped by id.

Any ideas?

like image 661
jenswirf Avatar asked Jan 21 '13 10:01

jenswirf


People also ask

How do I calculate days passed in Excel?

To find the number of days between these two dates, you can enter “=B2-B1” (without the quotes into cell B3). Once you hit enter, Excel will automatically calculate the number of days between the two dates entered.

How do you calculate the number of days from a date?

How do I go about calculating the days between two dates? To calculate the number of days between two dates, you need to subtract the start date from the end date.

How do I extract days between two dates?

Calculate elapsed time between two dates and times To calculate the time between two dates and times, you can simply subtract one from the other.


2 Answers

Using data.table: (I assume the date column is character here. If its date format, then you can remove the as.Date(.) function call.

df <- structure(list(id = c("A", "A", "A", "B", "B", "C", "C"), 
             date = c("2000-01-13", "2000-01-18", "2000-01-25", "2012-10-10", 
                    "2012-10-11", "2005-07-25", "2005-07-31")), 
             .Names = c("id", "date"), row.names = c(NA, -7L), 
             class = "data.frame")
require(data.table)
dt <- data.table(df, key="id")
dt[, days_from_start := cumsum(c(0, diff(as.Date(date)))),by=id]

#    id       date days_from_start
# 1:  A 2000-01-13               0
# 2:  A 2000-01-18               5
# 3:  A 2000-01-25              12
# 4:  B 2012-10-10               0
# 5:  B 2012-10-11               1
# 6:  C 2005-07-25               0
# 7:  C 2005-07-31               6
like image 92
Arun Avatar answered Oct 08 '22 05:10

Arun


You can also use a combination of functions difftime and split:

dat
  id       date
1  A 2000-01-13
2  A 2000-01-18
3  A 2000-01-25
4  B 2012-10-10
5  B 2012-10-11
6  C 2005-07-25
7  C 2005-07-31

dat$date <- as.POSIXct(dat$date)
dat$"Days spent" <- unlist(lapply(split(dat,f=dat$id),
                         function(x){as.numeric(difftime(x$date,x$date[1], units="days"))}))
dat
  id       date Days spent
1  A 2000-01-13          0
2  A 2000-01-18          5
3  A 2000-01-25         12
4  B 2012-10-10          0
5  B 2012-10-11          1
6  C 2005-07-25          0
7  C 2005-07-31          6

Following @agstudy and @Arun suggestions, this can be simplified as follows:

dat$"Days spent" <- unlist(by(dat, dat$id, 
                           function(x)difftime(x$date,x$date[1], units= "days")))
like image 22
plannapus Avatar answered Oct 08 '22 05:10

plannapus