How to select the last N observations from each group in a dplyr data frame?

Q: What is group_by in dplyr?

group_by() is a function from the dplyr package that groups the rows of a data frame by the values of one or more columns, much like the GROUP BY clause in SQL. Later operations, such as summarise(), are then applied to each group separately.
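As a minimal sketch of grouping followed by a per-group summary, the example below reuses the data from the question; the names per_group, n_rows and mean_b are only illustrative.

```r
library(dplyr)

# Same values as the data frame in the question
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
  b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5)
)

# One output row per distinct value of `a`
per_group <- df %>%
  group_by(a) %>%
  summarise(n_rows = n(), mean_b = mean(b))

per_group
```

Grouping on its own does not change the rows; it only changes how later verbs such as summarise(), filter() or slice() are evaluated.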

Q: What dplyr function do you use to pick observations by their values?

Five dplyr functions cover the vast majority of data manipulation: filter() picks observations by their values, arrange() reorders rows, select() picks variables by their names, mutate() creates new variables as functions of existing ones, and summarise() collapses many values into a single summary.
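A short, illustrative pipeline using the core dplyr verbs filter(), arrange(), mutate(), select() and summarise() on the question's data might look like this; the column name b_scaled and the threshold b > 0 are assumptions made up for the example.

```r
library(dplyr)

df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
  b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5)
)

result <- df %>%
  filter(b > 0) %>%                  # keep rows by value
  arrange(desc(b)) %>%               # reorder rows, largest b first
  mutate(b_scaled = b / max(b)) %>%  # add a derived column
  select(a, b_scaled)                # keep columns by name

# Collapse the whole column into one summary value
overall <- df %>% summarise(mean_b = mean(b))
```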

Tags:

r

dplyr

Given a dataframe:

df <- structure(list(a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4), b = c(34, 
343, 54, 11, 55, 62, 59, -9, 0, -0.5)), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))

I want to take last N observations / rows from each group:

df %>% 
dplyr::group_by(a) %>% 
dplyr::last(2)

This does not give me the result I want, which is:

a  b
1  343
1  54
2  55
2  62
3  59
3  -9
4  0
4  -0.5

Please advise: what is wrong here?

The error I get is:

Error in order(order_by)[[n]] : subscript out of bounds


asked Jan 01 '19 09:01

SteveS

1 Answer

Since the question is specifically about dplyr:

1) After the group_by, use slice on the row_number()

library(tidyverse)
df %>% 
   group_by(a) %>% 
   slice(tail(row_number(), 2))
# A tibble: 8 x 2
# Groups:   a [4]
#      a      b
#  <dbl>  <dbl>
#1     1  343  
#2     1   54  
#3     2   55  
#4     2   62  
#5     3   59  
#6     3   -9  
#7     4    0  
#8     4   -0.5
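In dplyr 1.0.0 and later, the same result can also be written with slice_tail(), which takes the number of rows per group directly. A minimal sketch, assuming a recent dplyr is installed (last_two is just an illustrative name):

```r
library(dplyr)

df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
  b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5)
)

# Last 2 rows of each group, in their original within-group order
last_two <- df %>%
  group_by(a) %>%
  slice_tail(n = 2) %>%
  ungroup()

last_two
```

Groups with fewer than n rows are returned in full.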

2) Or use filter from dplyr

df %>% 
   group_by(a) %>% 
   filter(row_number() >= (n() - 1))
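The condition row_number() >= n() - 1 keeps the last two rows because n() is the group size. For a general N the same idea becomes row_number() > n() - N. The sketch below assumes N is an ordinary variable defined outside the pipeline; the name is illustrative.

```r
library(dplyr)

df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
  b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5)
)

N <- 2  # number of trailing rows to keep per group (illustrative)

last_n <- df %>%
  group_by(a) %>%
  filter(row_number() > n() - N) %>%  # rows n()-N+1, ..., n() of each group
  ungroup()

last_n
```

If a group has fewer than N rows, n() - N is zero or negative and every row of that group is kept.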

3) or with do and tail

df %>%
    group_by(a) %>% 
    do(tail(., 2))
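do() has been superseded in recent dplyr releases. One possible replacement, assuming dplyr 0.8.1 or later, is group_modify(), which applies a function to each group's rows and puts the grouping column back afterwards:

```r
library(dplyr)

df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
  b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5)
)

# .x is the data for one group (without the grouping column `a`)
last_two <- df %>%
  group_by(a) %>%
  group_modify(~ tail(.x, 2)) %>%
  ungroup()

last_two
```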

4) Besides the tidyverse methods, we can also use the compact data.table syntax

library(data.table)
setDT(df)[df[, .I[tail(seq_len(.N), 2)], a]$V1]
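A shorter, though usually slower on large data, way to express the same thing in data.table is to call tail() on .SD within each group. Note that setDT(df) converts df to a data.table by reference, so this sketch builds a separate object, dt, instead (the name is illustrative):

```r
library(data.table)

dt <- data.table(
  a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
  b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5)
)

# .SD is the subset of rows for the current group
last_two <- dt[, tail(.SD, 2), by = a]

last_two
```

The .I form in the answer avoids materialising .SD for every group, which is why it tends to scale better.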

5) Or by from base R

by(df, df$a, FUN = tail, 2)

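6) or with aggregate from base R (sort() flattens the matrix of row indices that aggregate returns and keeps the rows in their original order)

df[sort(aggregate(c ~ a, transform(df, c = seq_len(nrow(df))), FUN = tail, 2)$c), ]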

7) or with split from base R

do.call(rbind, lapply(split(df, df$a), tail, 2))
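As for why the original attempt fails: a likely explanation is that dplyr::last() is a vector helper whose second argument is order_by, not a row count. In df %>% group_by(a) %>% last(2), the whole grouped data frame is passed as x and 2 is taken as order_by, which would account for the order(order_by) error. The sketch below only shows what last() is meant for; last_b is an illustrative name.

```r
library(dplyr)

df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
  b = c(34, 343, 54, 11, 55, 62, 59, -9, 0, -0.5)
)

# last() returns the final element of a vector
x <- c(34, 343, 54)
last_x <- last(x)

# Inside summarise() it yields one value per group, not N rows
last_per_group <- df %>%
  group_by(a) %>%
  summarise(last_b = last(b))

last_per_group
```

So last() can return only the single final value of each group; for the last N rows, one of the row-based approaches above is needed.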

answered Sep 23 '22 14:09

akrun
