I have some data like this:
sample.data <- rbind(data.table(start.date=seq(from=as.Date("2010-01-01"), to=as.Date("2014-12-01"), by="quarter"),
Group=c("A","B","C","D"), rnorm(20, 5)),
data.table(start.date=seq(from=as.Date("2010-01-01"), to=as.Date("2014-12-01"), by="quarter"),
Group=c("A","B","C","D"), rnorm(20, 3))
)
I would like to create an end.date
column that equals the next earliest start.date
value for each group.
So, for example, the first start.date
for Group==A
is 2010-01-01
. The next earliest start.date
for Group==A
is 2011-01-01
. So the final result should look like this when sorted by Group
:
start.date Group end.date
2010-01-01 A 2011-01-01
2010-01-01 A 2011-01-01
2011-01-01 A 2012-01-01
2011-01-01 A 2012-01-01
2012-01-01 A 2013-01-01
2012-01-01 A 2013-01-01
2013-01-01 A 2014-01-01
2013-01-01 A 2014-01-01
2014-01-01 A NA
2014-01-01 A NA
2010-04-01 B 2011-04-01
2010-04-01 B 2011-04-01
2011-04-01 B 2012-04-01
2011-04-01 B 2012-04-01
And so on. Ideally I would like to do this by reference, like
sample.data[, end.date := EXPRESSION]
but am at a loss for where to start. Thanks for any help.
Okay so:
events = unique(sample.data[ , .(Group, start.date) ])
events[, next.date := shift(start.date, type="lead"), by=Group]
sample.data[events, on=c("Group", "start.date"), end.date := next.date ]
In my opinion, the OP should have a table like events
anyways, in keeping with database design / tidy data. The result looks like
> sample.data[ order(Group, start.date) ]
start.date Group end.date
1: 2010-01-01 A 2011-01-01
2: 2010-01-01 A 2011-01-01
3: 2011-01-01 A 2012-01-01
4: 2011-01-01 A 2012-01-01
5: 2012-01-01 A 2013-01-01
6: 2012-01-01 A 2013-01-01
7: 2013-01-01 A 2014-01-01
8: 2013-01-01 A 2014-01-01
9: 2014-01-01 A <NA>
10: 2014-01-01 A <NA>
11: 2010-04-01 B 2011-04-01
12: 2010-04-01 B 2011-04-01
...
One option would be to make use of the dplyr
workflow:
require(dplyr); require(magrittr)
sample.data %<>%
group_by(Group) %>%
mutate(end.date = sort(start.date, decreasing = FALSE)[2]) %>%
You can manipulate sort
function and [n]
value to get second smallest, highest or any other date from within the group.
Following the discussion in comments.
sample.data %<>%
arrange(Group, start.date) %>%
group_by(Group) %>%
mutate(end.date2 = sort(start.date, decreasing = FALSE)[row_number(Group) + 2]) %>%
arrange(Group)
>> head(sample.data, n = 4)
Source: local data frame [4 x 4]
Groups: Group [1]
start.date Group V3 end.date2
<date> <chr> <dbl> <date>
1 2010-01-01 A 4.899328 2011-01-01
2 2010-01-01 A 3.451904 2011-01-01
3 2011-01-01 A 5.779825 2012-01-01
4 2011-01-01 A 4.182594 2012-01-01
1) Grouping by Group
, for each element of start.date
in the current group, find its position in sort(unique(start.date))
and return the value at the next position:
sample.data[, end.date := {u <- sort(unique(start.date)); u[match(start.date, u) + 1]},
by = Group]
2) Using ave
, the same approach also works without any packages:
transform(sample.data, end.date = ave(start.date, Group, FUN =
function(x) { u <- unique(sort(x)); u[match(x, u) + 1] }))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With