
Remove the first N rows from each factor level in an R data.frame

Given the dat data frame below, how can I create a subset that includes all rows except the first five for each IndID? Said differently, I want a new data frame with the first 5 rows for each IndID excluded.

set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each  = 10),
                  Number = sample(1:100,40))

I have seen a number of SO posts that select data, but I am not sure how to remove rows as described above.

B. Davis asked Feb 14 '17 23:02



2 Answers

We can use dplyr's slice() functionality:

library(dplyr)

dat %>%
    group_by(IndID) %>%
    slice(6:n())   # keep rows 6 through the last row of each group
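
One caveat, offered as a hedged aside rather than as part of the answer above: 6:n() counts downward when a group has fewer than six rows (6:3 is 6, 5, 4, 3), so slice() may not return an empty result for such groups. A minimal sketch of an alternative, assuming dplyr is installed and using a made-up data frame short_dat with one short group, is to filter on row_number() instead:

```r
library(dplyr)

# Hypothetical data: group "A" has only 3 rows, group "B" has 8
short_dat <- data.frame(IndID = rep(c("A", "B"), times = c(3, 8)),
                        Number = 1:11)

# Keep only rows whose within-group position is greater than 5
res <- short_dat %>%
    group_by(IndID) %>%
    filter(row_number() > 5) %>%
    ungroup()

res
```

Here group A drops out entirely because it has no sixth row, and group B keeps its last three rows.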

GGamba answered Oct 15 '22 00:10


In base R, tapply() is handy when used on a sequence of row numbers with tail().

idx <- unlist(tapply(1:nrow(dat), dat$IndID, tail, -5))
dat[idx, ]

Note that passing use.names = FALSE to unlist() makes this more efficient, because unlist() then skips building names for the index vector.
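
Applied to the question's data, that suggestion might look like the sketch below. It is only an illustration of the same tapply() and tail() approach with use.names = FALSE added, plus a quick check of how many rows remain per IndID.

```r
set.seed(123)
dat <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each = 10),
                  Number = sample(1:100, 40))

# tail(x, -5) returns everything except the first 5 elements of x,
# so each group contributes its row numbers from the 6th onward
idx <- unlist(tapply(1:nrow(dat), dat$IndID, tail, -5), use.names = FALSE)

res <- dat[idx, ]
table(res$IndID)  # rows remaining per IndID
```

Because tapply() visits the groups in sorted order of IndID, the result comes back grouped by level. For data that is not already sorted by IndID, the row order of the result will differ from the original.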

With data.table, you can do the following with tail().

library(data.table)

setDT(dat)[dat[, tail(.I, -5), by=IndID]$V1]
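
To see what the one-liner is doing, it can help to split it into steps. The sketch below assumes data.table is installed and builds a separate object dt (a name chosen here for illustration) instead of converting dat by reference with setDT(). The tail(.SD, -5) form at the end is an alternative idiom, not part of the answer above.

```r
library(data.table)

set.seed(123)
dt <- data.table(IndID = rep(c("AAA", "BBB", "CCC", "DDD"), each = 10),
                 Number = sample(1:100, 40))

# .I holds the row numbers of dt for the current group;
# tail(.I, -5) drops the first 5 of them
rows <- dt[, tail(.I, -5), by = IndID]  # unnamed result column defaults to V1

# Subset dt with the collected row numbers
res <- dt[rows$V1]

# Alternative: apply tail() directly to each group's subset of the data
alt <- dt[, tail(.SD, -5), by = IndID]
```

Both res and alt should contain the last five rows of each IndID. The .I version collects row numbers first and subsets once, whereas the .SD version builds each group's subset before trimming it.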

Rich Scriven answered Oct 14 '22 23:10