Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find next date in series by group

Tags:

r

data.table

I have some data like this:

sample.data <- rbind(data.table(start.date=seq(from=as.Date("2010-01-01"), to=as.Date("2014-12-01"), by="quarter"),
                 Group=c("A","B","C","D"), rnorm(20, 5)),
                 data.table(start.date=seq(from=as.Date("2010-01-01"), to=as.Date("2014-12-01"), by="quarter"),
                 Group=c("A","B","C","D"), rnorm(20, 3))
                 )

I would like to create an end.date column that equals the next earliest start.date value for each group.

So, for example, the first start.date for Group==A is 2010-01-01. The next earliest start.date for Group==A is 2011-01-01. So the final result should look like this when sorted by Group:

                start.date Group   end.date
                2010-01-01     A 2011-01-01
                2010-01-01     A 2011-01-01
                2011-01-01     A 2012-01-01
                2011-01-01     A 2012-01-01
                2012-01-01     A 2013-01-01
                2012-01-01     A 2013-01-01
                2013-01-01     A 2014-01-01
                2013-01-01     A 2014-01-01
                2014-01-01     A         NA
                2014-01-01     A         NA
                2010-04-01     B 2011-04-01
                2010-04-01     B 2011-04-01
                2011-04-01     B 2012-04-01
                2011-04-01     B 2012-04-01

And so on. Ideally I would like to do this by reference, like

sample.data[, end.date := EXPRESSION]

but am at a loss for where to start. Thanks for any help.

like image 274
moman822 Avatar asked Oct 28 '16 17:10

moman822


Video Answer


3 Answers

Okay so:

events = unique(sample.data[ , .(Group, start.date) ])
events[, next.date := shift(start.date, type="lead"), by=Group]

sample.data[events, on=c("Group", "start.date"), end.date := next.date ]

In my opinion, the OP should have a table like events anyways, in keeping with database design / tidy data. The result looks like

> sample.data[ order(Group, start.date) ]

    start.date Group   end.date
 1: 2010-01-01     A 2011-01-01
 2: 2010-01-01     A 2011-01-01
 3: 2011-01-01     A 2012-01-01
 4: 2011-01-01     A 2012-01-01
 5: 2012-01-01     A 2013-01-01
 6: 2012-01-01     A 2013-01-01
 7: 2013-01-01     A 2014-01-01
 8: 2013-01-01     A 2014-01-01
 9: 2014-01-01     A       <NA>
10: 2014-01-01     A       <NA>
11: 2010-04-01     B 2011-04-01
12: 2010-04-01     B 2011-04-01
...
like image 74
Frank Avatar answered Oct 12 '22 23:10

Frank


Initial approach

One option would be to make use of the dplyr workflow:

require(dplyr); require(magrittr)
sample.data %<>%
    group_by(Group) %>% 
    mutate(end.date = sort(start.date, decreasing = FALSE)[2]) %>%

Comments

You can manipulate sort function and [n] value to get second smallest, highest or any other date from within the group.

Alternative approach

Following the discussion in comments.

sample.data %<>%
    arrange(Group, start.date) %>%
    group_by(Group) %>%
    mutate(end.date2 = sort(start.date, decreasing = FALSE)[row_number(Group) + 2]) %>% 
    arrange(Group)

Preview

    >> head(sample.data, n = 4)
Source: local data frame [4 x 4]
Groups: Group [1]

  start.date Group       V3  end.date2
      <date> <chr>    <dbl>     <date>
1 2010-01-01     A 4.899328 2011-01-01
2 2010-01-01     A 3.451904 2011-01-01
3 2011-01-01     A 5.779825 2012-01-01
4 2011-01-01     A 4.182594 2012-01-01
like image 41
Konrad Avatar answered Oct 12 '22 23:10

Konrad


1) Grouping by Group, for each element of start.date in the current group, find its position in sort(unique(start.date)) and return the value at the next position:

sample.data[, end.date := {u <- sort(unique(start.date)); u[match(start.date, u) + 1]}, 
  by = Group]

2) Using ave, the same approach also works without any packages:

transform(sample.data, end.date = ave(start.date, Group, FUN = 
   function(x) { u <- unique(sort(x)); u[match(x, u) + 1] }))
like image 38
G. Grothendieck Avatar answered Oct 12 '22 22:10

G. Grothendieck