Given a dataframe:
df <- structure(list(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4), b = c(34,
343, 54, 11, 55, 62, 59, -9, 0, -0.5)), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))
I want to take last N observations / rows from each group:
df %>%
dplyr::group_by(a) %>%
dplyr::last(2)
Gives me wrong results.
I want it to be:
a b
1 343
1 54
2 55
2 62
3 59
3 -9
4 0
4 -0.5
Please advise what is wrong here?
The error I get is:
Error in order(order_by)[[n]] : subscript out of bounds
You can do that by using the function arrange from dplyr. 2. Use the dplyr filter function to get the first and the last row of each group.
The last n rows of the data frame can be accessed by using the in-built tail() method in R. Supposedly, N is the total number of rows in the data frame, then n <=N last rows can be extracted from the structure.
The group_by() function in R is from dplyr package that is used to group rows by column values in the DataFrame, It is similar to GROUP BY clause in SQL. R dplyr groupby is used to collect identical data into groups on DataFrame and perform aggregate functions on the grouped data.
There are five dplyr functions that you will use to do the vast majority of data manipulations: filter() : pick observations by their values. select() : pick variables by their names. mutate() : create new variables with functions of existing variables.
As it is a specific question based on dplyr
1) after the group_by
, use slice
on the row_number()
library(tidyverse)
df %>%
group_by(a) %>%
slice(tail(row_number(), 2))
# A tibble: 8 x 2
# Groups: a [4]
# a b
# <dbl> <dbl>
#1 1 343
#2 1 54
#3 2 55
#4 2 62
#5 3 59
#6 3 -9
#7 4 0
#8 4 -0.5
2) Or use filter
from dplyr
df %>%
group_by(a) %>%
filter(row_number() >= (n() - 1))
3) or with do
and tail
df %>%
group_by(a) %>%
do(tail(., 2))
4) In addition to the tidyverse
, methods, we can also use compact data.table
library(data.table)
setDT(df)[df[, .I[tail(seq_len(.N), 2)], a]$V1]
5) Or by
from base R
by(df, df$a, FUN = tail, 2)
6) or with aggregate
from base R
df[aggregate(c ~ a, transform(df, c = seq_len(nrow(df))), FUN = tail, 2)$c,]
7) or with split
from base R
do.call(rbind, lapply(split(df, df$a), tail, 2))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With