Remove the first N rows from each factor level in an R data.frame

Tags:

r

greatest-n-per-group

dplyr

Using dat below, how can I make a new data frame subset that includes all rows except the first five for each IndID? Said differently, I want a new data frame with the first 5 rows of each IndID excluded.

set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each = 10),
                  Number = sample(1:100, 40))

I have seen a number of SO posts that select rows like this, but I am not sure how to remove them as described above.


asked Feb 14 '17 23:02

B. Davis

2 Answers

We can use dplyr's slice() functionality:

library(dplyr)

dat %>%
    group_by(IndID) %>%
    slice(6:n())   # rows 6 through n() of each IndID group


answered Oct 15 '22 00:10

GGamba
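One caveat with slice(6:n()): for a group with fewer than six rows, 6:n() counts downward (for example, 6:3 is c(6, 5, 4, 3)), so slice() can end up keeping a row of that group instead of dropping all of it. The sketch below shows two variants that avoid this, assuming dplyr 1.0.0 or later for slice_tail(). The one-group data frame small is hypothetical and only there to exercise the edge case.

```r
library(dplyr)

set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each = 10),
                  Number = sample(1:100, 40))

# Variant 1: negative indices drop rows 1-5 of each group; indices beyond a
# group's size are silently ignored, so short groups simply become empty
res_neg <- dat %>%
  group_by(IndID) %>%
  slice(-(1:5)) %>%
  ungroup()

# Variant 2: a negative n is subtracted from each group's size, so
# slice_tail(n = -5) keeps the last n() - 5 rows (zero rows if n() <= 5)
res_tail <- dat %>%
  group_by(IndID) %>%
  slice_tail(n = -5) %>%
  ungroup()

# Hypothetical edge case: a group with only 3 rows
small <- data.frame(IndID = "EEE", Number = 1:3)
small_neg  <- small %>% group_by(IndID) %>% slice(-(1:5)) %>% ungroup()
small_tail <- small %>% group_by(IndID) %>% slice_tail(n = -5) %>% ungroup()

nrow(res_neg)     # 20: four groups of 10 rows, minus 5 rows each
nrow(small_neg)   # 0
```

Both variants return an ungrouped result here; drop the ungroup() call if you want to keep working per IndID afterwards.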

In base R, tapply() is handy when used on a sequence of row numbers with tail().

# tail(x, -5) drops the first 5 row numbers within each IndID
idx <- unlist(tapply(1:nrow(dat), dat$IndID, tail, -5))
dat[idx, ]

Note that this will be more efficient if you pass use.names = FALSE to unlist(), since the names are not needed for indexing.
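To make the tapply() step concrete: tail() with a negative n returns everything except the first |n| elements, so applying it to each group's row numbers leaves exactly the rows to keep. The self-contained sketch below rebuilds dat and applies the use.names = FALSE version. As a separate base R alternative (not from the answer), it also numbers rows within each IndID using ave() and keeps those past the fifth, which preserves the original row order even when groups are not contiguous.

```r
set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each = 10),
                  Number = sample(1:100, 40))

# tail() with a negative n drops that many elements from the front
tail(1:10, -5)   # 6 7 8 9 10

# The tapply() approach, with use.names = FALSE so unlist() skips
# building names for the index vector
idx <- unlist(tapply(seq_len(nrow(dat)), dat$IndID, tail, -5),
              use.names = FALSE)
res_tapply <- dat[idx, ]

# Alternative: a within-group counter (1, 2, 3, ... per IndID) via ave(),
# then keep only rows whose counter is greater than 5
within_id <- ave(seq_len(nrow(dat)), dat$IndID, FUN = seq_along)
res_ave <- dat[within_id > 5, ]
```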

With data.table, you can do the following with tail().

library(data.table)

# setDT() converts dat to a data.table by reference; .I holds row numbers
setDT(dat)[dat[, tail(.I, -5), by=IndID]$V1]

answered Oct 14 '22 23:10

Rich Scriven
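In the data.table version, .I is the vector of row numbers of dat, so grouping by IndID and taking tail(.I, -5) collects the row numbers to keep, which land in an auto-named column V1. The sketch below spells that out in two steps. As an alternative not taken from the answer, it also filters with data.table's rowid(), which numbers rows 1, 2, 3, ... within each IndID. It assumes the data.table package is installed.

```r
library(data.table)

set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each = 10),
                  Number = sample(1:100, 40))

# Convert dat to a data.table in place (by reference, no copy)
setDT(dat)

# Step 1: per IndID, keep the row numbers after the first five;
# the unnamed j result is returned in a column called V1
keep <- dat[, tail(.I, -5), by = IndID]$V1

# Step 2: subset dat by those row numbers
res_I <- dat[keep]

# Alternative: rowid(IndID) is a running count within each IndID,
# so keeping counts above 5 drops the first five rows of every group
res_rowid <- dat[rowid(IndID) > 5]
```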
