Hello everyone, I need help removing duplicate rows from a data frame only when a column is higher than a threshold.
Here is the data frame:
Group Species Values
1 G1 Cattus_cattus 10
2 G1 Cattus_cattus 10
3 G1 Cattus_cattus 10
4 G2 Canis_lupus 2
5 G2 Canis_lupus 2
6 G3 Griseus_lupa 90
7 G4 Griseus_lupa 89
I would like to remove rows that are duplicated on c(Group, Species) when Values > 5.
Here I should then get:
Group Species Values
1 G1 Cattus_cattus 10
4 G2 Canis_lupus 2
5 G2 Canis_lupus 2
6 G3 Griseus_lupa 90
7 G4 Griseus_lupa 89
Here is the data (dput output):
structure(list(Group = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 4L
), .Label = c("G1", "G2", "G3", "G4"), class = "factor"), Species = structure(c(2L,
2L, 2L, 1L, 1L, 3L, 3L), .Label = c("Canis_lupus", "Cattus_cattus",
"Griseus_lupa"), class = "factor"), Values = c(10L, 10L, 10L,
2L, 2L, 90L, 89L)), class = "data.frame", row.names = c(NA, -7L
))
You can use duplicated and combine it with an or (|) that tests for x$Values <= 5, so rows at or below the threshold are always kept.
x[!duplicated(x) | x$Values <= 5,]
#x[!(duplicated(x) & x$Values > 5),] #Alternative
# Group Species Values
#1 G1 Cattus_cattus 10
#4 G2 Canis_lupus 2
#5 G2 Canis_lupus 2
#6 G3 Griseus_lupa 90
#7 G4 Griseus_lupa 89
Or only for Group and Species:
x[!(duplicated(x[c("Group","Species")]) & x$Values > 5),]
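To make the logic easier to follow, here is a minimal self-contained sketch. It rebuilds the example data with plain character columns instead of factors (an assumption made for brevity), and the helper names is_dup, keep_or and keep_and are illustrative only. It checks that the "or" form and the "and" form select the same rows.

```r
# Rebuild the example data (character columns instead of factors, for brevity)
x <- data.frame(
  Group   = c("G1", "G1", "G1", "G2", "G2", "G3", "G4"),
  Species = c("Cattus_cattus", "Cattus_cattus", "Cattus_cattus",
              "Canis_lupus", "Canis_lupus", "Griseus_lupa", "Griseus_lupa"),
  Values  = c(10L, 10L, 10L, 2L, 2L, 90L, 89L)
)

# TRUE for every row whose (Group, Species) pair already appeared in an earlier row
is_dup <- duplicated(x[c("Group", "Species")])

# Keep a row if it is a first occurrence OR its value is at or below the threshold
keep_or <- !is_dup | x$Values <= 5

# Equivalent form: drop a row only if it is a repeat AND above the threshold
keep_and <- !(is_dup & x$Values > 5)

# Both masks are identical (De Morgan's law)
stopifnot(identical(keep_or, keep_and))

result <- x[keep_or, ]
print(result)
```

Note that duplicated only flags the second and later occurrences, so the first row of each (Group, Species) pair is always kept, whatever its value.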
Using dplyr:
library(dplyr)
x %>%
  filter(!duplicated(x) | Values <= 5)
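The dplyr call above checks for duplicates across all columns. If only Group and Species should define a duplicate, as in the question, one possible sketch is to group by those two columns and keep the first row of each group plus any row at or below the threshold. This assumes dplyr is installed and rebuilds the data with character columns for brevity. The result name is illustrative.

```r
library(dplyr)

# Example data (character columns instead of factors, for brevity)
x <- data.frame(
  Group   = c("G1", "G1", "G1", "G2", "G2", "G3", "G4"),
  Species = c("Cattus_cattus", "Cattus_cattus", "Cattus_cattus",
              "Canis_lupus", "Canis_lupus", "Griseus_lupa", "Griseus_lupa"),
  Values  = c(10L, 10L, 10L, 2L, 2L, 90L, 89L)
)

result <- x %>%
  group_by(Group, Species) %>%
  # Within each (Group, Species) pair, keep the first row,
  # and keep any later row whose value is at or below the threshold
  filter(row_number() == 1 | Values <= 5) %>%
  ungroup()

print(result)
```

On this example both approaches keep the same five rows. They would differ only if two rows shared Group and Species but had different Values.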