I'm using the dplyr package in R and have grouped my data by 3 variables (Year, Site, Brood).
I want to get rid of groups made up of fewer than 3 rows. For example, in the following sample I would like to remove the rows for brood '2'. I have a lot of data, so while I could painstakingly do it by hand, it would be very helpful to automate it in R.
Year Site Brood Parents
1996 A 1 1
1996 A 1 1
1996 A 1 0
1996 A 1 0
1996 A 2 1
1996 A 2 0
1996 A 3 1
1996 A 3 1
1996 A 3 1
1996 A 3 0
1996 A 3 1
I hope this makes sense and thank you very much in advance for your help! I'm new to R and stackoverflow so apologies if the way I've worded this question isn't very good! Let me know if I need to provide any other information.
To delete rows by position, index the data frame with -c() and the row numbers you want to drop. Because the row numbers are given as a vector inside c(), several rows can be deleted at once.
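As a minimal sketch of deleting rows by position, the snippet below uses a made-up data frame (`df`, `id`, and `value` are hypothetical names, not from the question) and drops two rows with a negated index vector.

```r
# Hypothetical toy data frame, for illustration only
df <- data.frame(id = 1:5, value = c(10, 20, 30, 40, 50))

# Drop rows 2 and 4 by position; the trailing comma keeps all columns
trimmed <- df[-c(2, 4), ]

trimmed$id
```

Note that this removes rows by position only, so it does not help when the rows to drop depend on group size.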
To remove rows from a data frame based on multiple conditions, put the conditions inside square brackets [ ], combined with the AND (&) or OR (|) operator. This subsets the data frame and drops every row that does not satisfy the conditions.
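A small sketch of bracket subsetting with combined conditions follows. The data frame is invented for illustration; it only reuses the column names from the question.

```r
# Hypothetical data, reusing the question's column names
df <- data.frame(
  Site    = c("A", "A", "B", "B"),
  Brood   = c(1, 2, 1, 2),
  Parents = c(1, 0, 1, 1)
)

# AND: keep rows where both conditions hold
both <- df[df$Site == "A" & df$Parents == 1, ]

# OR: keep rows where at least one condition holds
either <- df[df$Site == "B" | df$Parents == 0, ]
```

Here `both` keeps only the first row, while `either` keeps the last three.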
To remove rows with NA values in R, we can use the na.omit() function or drop_na() from tidyr.
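The two functions can be compared on a tiny example. This is a sketch that assumes the tidyr package is installed; the data frame and its columns `x` and `y` are hypothetical.

```r
library(tidyr)

# Hypothetical data with one NA in each column
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

# Base R: drop every row that contains any NA
complete_base <- na.omit(df)

# tidyr: same idea, with optional column selection
complete_tidy <- drop_na(df)     # any NA in any column
complete_x    <- drop_na(df, x)  # only rows where x is NA are dropped
```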
One way to do it is to use the magic n() function within filter:
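library(dplyr)
# Small example: brood 1 has 2 rows, brood 2 has 3 rows
my_data <- data.frame(Year=1996, Site="A", Brood=c(1,1,2,2,2))
my_data %>%
  group_by(Year, Site, Brood) %>%
  filter(n() >= 3)  # keeps only groups with at least 3 rows (brood 2 here)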
The n() function gives the number of rows in the current group (or the total number of rows if there is no grouping).
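To check the approach against the sample in the question, here is a sketch that rebuilds those eleven rows by hand (the name `broods` is an assumption) and applies the same grouped filter. The final ungroup() is optional; it is added here so that later steps are not silently grouped.

```r
library(dplyr)

# The sample rows from the question, typed in by hand
broods <- data.frame(
  Year    = 1996,
  Site    = "A",
  Brood   = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  Parents = c(1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1)
)

kept <- broods %>%
  group_by(Year, Site, Brood) %>%
  filter(n() >= 3) %>%   # drop groups with fewer than 3 rows
  ungroup()

unique(kept$Brood)
```

Brood 2 has only two rows, so it is removed, and the rows for broods 1 and 3 remain.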