
Subset to remove duplicate id and condition

Suppose this is my dataset:

Id   Weight   Category
1    10.2     Pre
1    12.1     Post
2    11.3     Post
3    12.9     Pre
4    10.3     Post
4    12.3     Pre
5    11.8     Pre

How do I get rid of rows whose Id is duplicated and whose Category is Pre? My final expected dataset would be:

Id   Weight   Category
1    12.1     Post
2    11.3     Post
3    12.9     Pre
4    10.3     Post
5    11.8     Pre
bison2178 asked Dec 30 '25 16:12



2 Answers

You can arrange the data by Id and Category, then use distinct to keep the first row for each Id (df is the data frame from the question).

library(dplyr)

df %>% arrange(Id, Category) %>% distinct(Id, .keep_all = TRUE)

#  Id Weight Category
#1  1   12.1     Post
#2  2   11.3     Post
#3  3   12.9      Pre
#4  4   10.3     Post
#5  5   11.8      Pre

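This works because 'Post' sorts before 'Pre' (that is, 'Pre' > 'Post'), so after arrange() the Post row comes first within each Id and distinct() keeps only that first row.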
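
If the category labels did not happen to sort in the desired order, the same sort-then-keep-first idea can be written with the ordering made explicit. The following is a minimal base R sketch, assuming the data frame is named dat (as in the other answer) and that 'Post' is the row to prefer whenever an Id appears more than once; the names sorted and res are illustrative only.

```r
# Example data, copied from the question
dat <- read.table(header = TRUE, text = "
Id Weight Category
1 10.2 Pre
1 12.1 Post
2 11.3 Post
3 12.9 Pre
4 10.3 Post
4 12.3 Pre
5 11.8 Pre
")

# Sort by Id, then put 'Post' rows first within each Id.
# (dat$Category != "Post") is FALSE for Post rows, and FALSE sorts before TRUE.
sorted <- dat[order(dat$Id, dat$Category != "Post"), ]

# Keep only the first row of each Id
res <- sorted[!duplicated(sorted$Id), ]
rownames(res) <- NULL
res
```

Because the sort key is a logical test rather than the label itself, this does not depend on how 'Pre' and 'Post' compare alphabetically.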

Ronak Shah answered Jan 02 '26 05:01



Using by, split dat by Id; for an Id with two rows keep only the 'Post' row, otherwise keep the group unchanged; then rbind the results.

do.call(rbind, by(dat, dat$Id, function(x) 
  if (nrow(x) == 2)  x[x$Category == 'Post', ] else x))
#   Id Weight Category
# 1  1   12.1     Post
# 2  2   11.3     Post
# 3  3   12.9      Pre
# 4  4   10.3     Post
# 5  5   11.8      Pre

Data:

dat <- read.table(header = TRUE, text = '
Id   Weight   Category
1    10.2     Pre
1    12.1     Post
2    11.3     Post
3    12.9     Pre
4    10.3     Post
4    12.3     Pre
5    11.8     Pre
')
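
The condition in the question (Id is duplicated and Category is Pre) can also be expressed directly as a logical filter, without splitting the data. This is a minimal base R sketch, assuming the same dat as above; is_dup and res2 are illustrative names. Note one assumption: if an Id had several rows that were all 'Pre', this filter would drop every one of them, so it fits data like the example where each duplicated Id has a 'Post' row.

```r
# Example data, copied from the question
dat <- read.table(header = TRUE, text = "
Id Weight Category
1 10.2 Pre
1 12.1 Post
2 11.3 Post
3 12.9 Pre
4 10.3 Post
4 12.3 Pre
5 11.8 Pre
")

# TRUE for every row whose Id occurs more than once
# (duplicated() alone misses the first occurrence, so scan from both ends)
is_dup <- duplicated(dat$Id) | duplicated(dat$Id, fromLast = TRUE)

# Drop rows that are both duplicated and 'Pre'
res2 <- dat[!(is_dup & dat$Category == "Pre"), ]
res2
```

The surviving rows keep their original row names; rownames(res2) <- NULL resets them if needed.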
jay.sf answered Jan 02 '26 04:01



