Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to delete groups containing less than 3 rows of data in R? [duplicate]

Tags:

r

rows

I'm using the dplyr package in R and have grouped my data by 3 variables (Year, Site, Brood).

I want to get rid of groups made up of less than 3 rows. For example in the following sample I would like to remove the rows for brood '2'. I have a lot of data to do this with so while I could painstakingly do it by hand it would be so helpful to automate it using R.

Year Site Brood Parents
1996 A    1     1  
1996 A    1     1  
1996 A    1     0  
1996 A    1     0  
1996 A    2     1      
1996 A    2     0  
1996 A    3     1  
1996 A    3     1  
1996 A    3     1  
1996 A    3     0  
1996 A    3     1  

I hope this makes sense and thank you very much in advance for your help! I'm new to R and stackoverflow so apologies if the way I've worded this question isn't very good! Let me know if I need to provide any other information.

like image 870
Keeley Seymour Avatar asked Feb 08 '16 14:02

Keeley Seymour


People also ask

How do you remove a group of rows in R?

Use -c() with the row id you wanted to delete, Using this we can delete multiple rows at a time from the R data frame. Here row index numbers are specified inside vector c(). Example: In this example, we will delete multiple rows at a time.

How do I remove rows from multiple conditions in R?

To remove rows of data from a dataframe based on multiple conditional statements. We use square brackets [ ] with the dataframe and put multiple conditional statements along with AND or OR operator inside it. This slices the dataframe and removes all the rows that do not satisfy the given conditions.

How do you delete two rows in R?

To remove rows with an in R we can use the na. omit() and <code>drop_na()</code> (tidyr) functions.


1 Answers

One way to do it is to use the magic n() function within filter:

library(dplyr)

my_data <- data.frame(Year=1996, Site="A", Brood=c(1,1,2,2,2))

my_data %>% 
  group_by(Year, Site, Brood) %>% 
  filter(n() >= 3)

The n() function gives the number of rows in the current group (or the number of rows total if there is no grouping).

like image 60
drhagen Avatar answered Oct 03 '22 07:10

drhagen