Situation
I have a data frame df:
df <- structure(list(person = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L), .Label = c("pA", "pB", "pC"), class = "factor"), date = structure(c(16071,
16102, 16130, 16161, 16071, 16102, 16130, 16071, 16102), class = "Date")), .Names = c("person",
"date"), row.names = c(NA, -9L), class = "data.frame")
> df
person date
1 pA 2014-01-01
2 pA 2014-02-01
3 pA 2014-03-01
4 pA 2014-04-01
5 pB 2014-01-01
6 pB 2014-02-01
7 pB 2014-03-01
8 pC 2014-01-01
9 pC 2014-02-01
Question
How can I select the last 2 (or 'n') entries, ordered by date, for each person, so that I get the resulting data frame df1 shown here?
> df1
person date
1 pA 2014-03-01
2 pA 2014-04-01
3 pB 2014-02-01
4 pB 2014-03-01
5 pC 2014-01-01
6 pC 2014-02-01
I've tried combinations of
library(dplyr)
df1 <- df %>%
group_by(person) %>%
select(tail(df, 2))
with no joy.
You can try slice, which selects rows by position within each group:
library(dplyr)
df %>%
group_by(person) %>%
arrange(date, person) %>%
slice((n()-1):n())
# person date
#1 pA 2014-03-01
#2 pA 2014-04-01
#3 pB 2014-02-01
#4 pB 2014-03-01
#5 pC 2014-01-01
#6 pC 2014-02-01
Or, in place of the last step, use do with tail:
do(tail(., 2))
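For the general "last n" case, a minimal sketch using slice_tail may be easier to read. This assumes dplyr 1.0.0 or later, where slice_tail is available; n_last is a hypothetical name for the 'n' in the question, and the data frame is rebuilt here in a shorter form so the snippet runs on its own.

```r
library(dplyr)

# Same data as in the question, rebuilt in a shorter form.
df <- data.frame(
  person = factor(c("pA", "pA", "pA", "pA", "pB", "pB", "pB", "pC", "pC")),
  date = as.Date(c("2014-01-01", "2014-02-01", "2014-03-01", "2014-04-01",
                   "2014-01-01", "2014-02-01", "2014-03-01",
                   "2014-01-01", "2014-02-01"))
)

n_last <- 2  # hypothetical name: how many trailing rows to keep per person

df1 <- df %>%
  arrange(person, date) %>%   # sort by date within each person
  group_by(person) %>%
  slice_tail(n = n_last) %>%  # keep the last n_last rows of each group
  ungroup()

df1
```

Groups with fewer than n_last rows simply return all of their rows, so no special handling should be needed for short groups.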
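Using data.table:
library(data.table)
setDT(df)[order(person, date), tail(.SD, 2L), by = person]
# person date
# 1: pA 2014-03-01
# 2: pA 2014-04-01
# 3: pB 2014-02-01
# 4: pB 2014-03-01
# 5: pC 2014-01-01
# 6: pC 2014-02-01
We order by person and date, then group by person and select the last two rows from the subset of data (.SD) for each group. Ordering by date as well as person ensures the result is correct even when the input rows are not already sorted by date.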
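If you would rather avoid extra packages, the same order, group, and take-the-tail idea can be sketched in base R. This is only an illustration, not part of either answer above: last_n_per_group and its arguments are hypothetical names, and the data frame is rebuilt so the snippet is self-contained.

```r
# Same data as in the question, rebuilt in a shorter form.
df <- data.frame(
  person = factor(c("pA", "pA", "pA", "pA", "pB", "pB", "pB", "pC", "pC")),
  date = as.Date(c("2014-01-01", "2014-02-01", "2014-03-01", "2014-04-01",
                   "2014-01-01", "2014-02-01", "2014-03-01",
                   "2014-01-01", "2014-02-01"))
)

# Hypothetical helper: last n rows per group, ordered by order_col.
last_n_per_group <- function(data, group_col, order_col, n = 2) {
  # Split into one data frame per group, dropping unused factor levels.
  pieces <- lapply(split(data, data[[group_col]], drop = TRUE), function(g) {
    g <- g[order(g[[order_col]]), , drop = FALSE]  # sort within the group
    tail(g, n)                                     # keep the last n rows
  })
  out <- do.call(rbind, pieces)  # stack the per-group results
  rownames(out) <- NULL          # reset row names to 1..nrow
  out
}

df1 <- last_n_per_group(df, "person", "date", n = 2)
df1
```

Passing the column names as strings keeps the helper free of non-standard evaluation, at the cost of being less concise than the dplyr or data.table versions.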