How to delete groups containing less than 3 rows of data in R? [duplicate]

Tags: r, rows

I'm using the dplyr package in R and have grouped my data by 3 variables (Year, Site, Brood).

I want to get rid of groups made up of fewer than 3 rows. For example, in the following sample I would like to remove the rows for brood '2'. I have a lot of data, so while I could painstakingly do this by hand, it would be much more helpful to automate it in R.

Year Site Brood Parents
1996 A    1     1  
1996 A    1     1  
1996 A    1     0  
1996 A    1     0  
1996 A    2     1      
1996 A    2     0  
1996 A    3     1  
1996 A    3     1  
1996 A    3     1  
1996 A    3     0  
1996 A    3     1

I hope this makes sense, and thank you very much in advance for your help! I'm new to R and Stack Overflow, so apologies if this question isn't worded very well. Let me know if I need to provide any other information.

asked Feb 08 '16 14:02

Keeley Seymour

1 Answer

One way to do it is to use the magic n() function within filter:

library(dplyr)

my_data <- data.frame(Year=1996, Site="A", Brood=c(1,1,2,2,2))

my_data %>% 
  group_by(Year, Site, Brood) %>% 
  filter(n() >= 3)

The n() function gives the number of rows in the current group (or the number of rows total if there is no grouping).
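For reference, here is a small sketch of the same idea applied to the sample data from the question. The names broods, kept, group_size and kept_base are made up for illustration, and the ungroup() call and the base R variant with ave() are additions that are not part of the answer above.

```r
library(dplyr)

# Sample data copied from the question (Year, Site, Brood, Parents)
broods <- data.frame(
  Year    = 1996,
  Site    = "A",
  Brood   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  Parents = c(1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1)
)

# Keep only the groups with at least 3 rows, then drop the grouping
# so later steps operate on the whole table again
kept <- broods %>%
  group_by(Year, Site, Brood) %>%
  filter(n() >= 3) %>%
  ungroup()

# Assumption: a base R alternative, not from the original answer.
# ave() returns, for every row, the size of the group that row belongs to.
group_size <- ave(seq_len(nrow(broods)),
                  broods$Year, broods$Site, broods$Brood,
                  FUN = length)
kept_base <- broods[group_size >= 3, ]
```

With this data, brood 2 has only two rows and is dropped, so both kept and kept_base contain the nine rows of broods 1 and 3. If the threshold should differ, only the 3 in the filter condition needs to change.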

answered Oct 03 '22 07:10

drhagen
