 

How to select the last N observations from each group in a dplyr data frame?

Tags:

r

dplyr

Given a dataframe:

df <- structure(list(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4), b = c(34, 
343, 54, 11, 55, 62, 59, -9, 0, -0.5)), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))

I want to take the last N observations (rows) from each group:

df %>% 
dplyr::group_by(a) %>% 
dplyr::last(2)

This does not work.

I want it to be:

a    b
1  343
1   54
2   55
2   62
3   59
3   -9
4    0
4 -0.5

What is wrong here?

The error I get is:

Error in order(order_by)[[n]] : subscript out of bounds

asked Jan 01 '19 09:01 by SteveS



People also ask

How do I select the last observation in a group in R?

You can do that with dplyr: first order the rows with arrange() so that the row you consider "last" sits at the end of each group, then use group_by() with filter() to get the first and the last row of each group.
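A minimal sketch of both ideas on the question's df, assuming dplyr is installed. Rows are taken in their current order, so call arrange() first if "last" should mean something else (for example, the latest date).

```r
library(dplyr)

# The question's data, rebuilt with tibble() (re-exported by dplyr)
df <- tibble(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
             b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5))

# Last row of each group: inside slice(), n() is the current group's size
last_rows <- df %>%
  group_by(a) %>%
  slice(n()) %>%
  ungroup()

# First and last row of each group (a one-row group would be kept once)
first_last <- df %>%
  group_by(a) %>%
  filter(row_number() == 1 | row_number() == n()) %>%
  ungroup()
```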

How do I select the last n rows in R?

The last n rows of a data frame can be accessed with R's built-in tail() function. If N is the total number of rows in the data frame, any n <= N last rows can be extracted.
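A small sketch of tail() on a plain data.frame copy of the question's data. Note that tail() on its own ignores any grouping, which is why the answers below combine it with a grouping step.

```r
df <- data.frame(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
                 b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5))

# Last n = 3 rows of the whole data frame (tail() defaults to n = 6)
last3 <- tail(df, 3)

# A negative n drops that many rows from the front: everything except the first 8 rows
all_but_first8 <- tail(df, -8)
```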

What is group_by in dplyr?

The group_by() function from the dplyr package groups the rows of a data frame by the values of one or more columns, similar to the GROUP BY clause in SQL. Functions applied afterwards, such as summarise(), then operate on each group separately.
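For illustration, a minimal grouped summary of the question's data, assuming dplyr is installed; the column names n and mean_b are arbitrary choices for this sketch.

```r
library(dplyr)

df <- tibble(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
             b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5))

# group_by() only records the grouping; summarise() then collapses each group to one row
group_stats <- df %>%
  group_by(a) %>%
  summarise(n = n(), mean_b = mean(b))
```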

What dplyr function do you use to pick observations by their values?

There are five key dplyr functions that you will use for the vast majority of data manipulations: filter() picks observations by their values; arrange() reorders the rows; select() picks variables by their names; mutate() creates new variables with functions of existing variables; summarise() collapses many values down to a single summary.
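A toy pipeline that uses all five verbs once on the question's data, assuming dplyr is installed; the derived column b10 and the positive-b filter are arbitrary choices made only to exercise each verb.

```r
library(dplyr)

df <- tibble(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
             b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5))

totals <- df %>%
  filter(b > 0) %>%                # filter(): keep rows with positive b
  mutate(b10 = b * 10) %>%         # mutate(): new column from an existing one
  select(a, b10) %>%               # select(): keep columns by name
  group_by(a) %>%
  summarise(total = sum(b10)) %>%  # summarise(): one summary row per group
  arrange(total)                   # arrange(): sort rows, smallest total first
```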


1 Answer

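As this is a dplyr-specific question, here are several dplyr options first. The original code fails because dplyr::last() operates on a single vector and returns just one element; its second argument is order_by, not a number of rows, so last(2) cannot return the last N rows of each group.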
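To contrast with the failing call, here is a sketch of how last() is meant to be used (assuming dplyr is installed): on a vector, or on a column inside summarise(), where it returns one value per group rather than N rows.

```r
library(dplyr)

df <- tibble(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
             b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5))

# last() works on a single vector and returns one element
last(c(34, 343, 54))  # 54

# Applied to a column inside summarise(), it gives the last value per group
last_b <- df %>%
  group_by(a) %>%
  summarise(b = last(b))
```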

1) After group_by(), use slice() on the last two values of row_number()

library(tidyverse)
df %>% 
   group_by(a) %>% 
   slice(tail(row_number(), 2))
# A tibble: 8 x 2
# Groups:   a [4]
#      a      b
#  <dbl>  <dbl>
#1     1  343  
#2     1   54  
#3     2   55  
#4     2   62  
#5     3   59  
#6     3   -9  
#7     4    0  
#8     4   -0.5
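In dplyr 1.0.0 and later, slice_tail() expresses the same thing directly. A minimal sketch, assuming a recent dplyr is installed:

```r
library(dplyr)  # slice_tail() requires dplyr >= 1.0.0

df <- tibble(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
             b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5))

# Keep the last 2 rows of every group (groups with fewer rows are kept whole)
last2 <- df %>%
  group_by(a) %>%
  slice_tail(n = 2) %>%
  ungroup()
```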

2) Or use filter() from dplyr

df %>% 
   group_by(a) %>% 
   filter(row_number() >= (n() - 1))
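The filter() condition above is hard-wired to 2 rows. A sketch of the same idea with N as a variable (the name N is just an illustrative choice):

```r
library(dplyr)

df <- tibble(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
             b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5))

N <- 2  # number of rows to keep from the end of each group

# row_number() > n() - N keeps the last N rows; smaller groups are kept whole
lastN <- df %>%
  group_by(a) %>%
  filter(row_number() > n() - N) %>%
  ungroup()
```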

3) Or use do() with tail() (do() still works but is superseded as of dplyr 1.0.0)

df %>%
    group_by(a) %>% 
    do(tail(., 2))
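Since do() is superseded, group_modify() is one non-superseded way to apply tail() per group in current dplyr. A minimal sketch, assuming a recent dplyr:

```r
library(dplyr)

df <- tibble(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
             b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5))

# .x is each group's data (without the grouping column); group_modify() adds a back
last2 <- df %>%
  group_by(a) %>%
  group_modify(~ tail(.x, 2)) %>%
  ungroup()
```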

4) In addition to the tidyverse methods, we can also use a compact data.table solution

library(data.table)
# setDT() converts df to a data.table by reference, so df itself is modified
setDT(df)[df[, .I[tail(seq_len(.N), 2)], a]$V1]
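A more direct data.table variant, sketched under the assumption that the data.table package is installed. It builds a new data.table instead of converting df in place, and applies tail() to .SD within each group:

```r
library(data.table)

dt <- data.table(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
                 b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5))

# .SD holds each group's rows (without the 'by' column); tail() keeps the last 2
last2 <- dt[, tail(.SD, 2), by = a]
```

The .I approach above returns row indices and subsets once, whereas this version builds the result group by group; both give the same rows here.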

5) Or use by() from base R (it returns a "by" object holding one data frame per group)

by(df, df$a, FUN = tail, 2)
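If a single data frame is needed rather than a "by" object, the per-group pieces can be stacked with rbind(). A base R sketch on a data.frame copy of the question's data:

```r
df <- data.frame(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
                 b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5))

# by() calls tail(group_rows, 2) once per value of a
per_group <- by(df, df$a, tail, 2)

# Stack the per-group data frames into one and reset the row names
last2 <- do.call(rbind, per_group)
rownames(last2) <- NULL
```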

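6) Or use aggregate() from base R

# aggregate() returns the kept row numbers as a matrix column, so flatten and sort them before subsetting
df[sort(c(aggregate(c ~ a, transform(df, c = seq_len(nrow(df))), FUN = tail, 2)$c)), ]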
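Another base R option, offered here as an alternative technique rather than part of the original answer, is ave(): number each row from the end of its group and keep positions 1 and 2. A minimal sketch:

```r
df <- data.frame(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
                 b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5))

# For each row, its position counted from the end of its group (1 = last row)
from_end <- ave(seq_len(nrow(df)), df$a, FUN = function(i) rev(seq_along(i)))

# Keep the last 2 rows of each group, in the original row order
last2 <- df[from_end <= 2, ]
```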

7) Or use split() from base R

do.call(rbind, lapply(split(df, df$a), tail, 2))
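The split() approach can be wrapped in a reusable function. The name last_n_per_group() below is hypothetical, not a base R function, and note that split() orders groups by their sorted values rather than by first appearance.

```r
# last_n_per_group() is a hypothetical helper, not part of base R
last_n_per_group <- function(data, group, n) {
  pieces <- lapply(split(data, data[[group]]), tail, n)  # last n rows of every group
  out <- do.call(rbind, pieces)                           # stack the pieces back together
  rownames(out) <- NULL  # drop the "group.row" names that rbind() builds from list names
  out
}

df <- data.frame(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
                 b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5))

last2 <- last_n_per_group(df, "a", 2)
```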
answered Sep 23 '22 14:09 by akrun
