
Flag duplicates in R [duplicate]

Tags: r

I have the following dataset:

dataset <- data.frame(id = c("A","A","A","A","B","B","B","B"),
                      value = c(1,1,2,3,5,6,6,7))

For every value that is duplicated within an id, I want to flag each row where it occurs. The flag column should have the same length as the source data frame. This is the expected result:

id    value    flag
A     1        1
A     1        1
A     2        0
A     3        0
B     5        0
B     6        1
B     6        1
B     7        0

Is there a way to do this without a for loop? Any help will be greatly appreciated.

Manu asked Dec 30 '22 21:12


1 Answer

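We can combine duplicated() called with and without fromLast = TRUE to mark every repeated value as 1. On its own, duplicated() leaves the first occurrence of each repeated value unmarked; with fromLast = TRUE it leaves the last occurrence unmarked instead, so OR-ing the two results flags every occurrence.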

dataset$flag <- as.integer(duplicated(dataset$value) | 
                           duplicated(dataset$value, fromLast = TRUE))
dataset

#  id value flag
#1  A     1    1
#2  A     1    1
#3  A     2    0
#4  A     3    0
#5  B     5    0
#6  B     6    1
#7  B     6    1
#8  B     7    0
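
Note that the code above compares value alone, so it would also flag a value that appears once under each of two different ids. If the flag should only count repeats within the same id, as the question's wording suggests, one possible variation is to pass the id and value columns together to duplicated(), which also works on data frames. This is a minimal sketch, not part of the original answer; the second data frame, other, is a hypothetical example made up to show the difference.

```r
# Data from the question.
dataset <- data.frame(id = c("A","A","A","A","B","B","B","B"),
                      value = c(1,1,2,3,5,6,6,7))

# Compare whole (id, value) rows instead of value alone.
key <- dataset[c("id", "value")]
dataset$flag <- as.integer(duplicated(key) | duplicated(key, fromLast = TRUE))
dataset$flag
# 1 1 0 0 0 1 1 0

# Hypothetical data where the value 1 appears under both ids.
other <- data.frame(id = c("A","A","B","B"),
                    value = c(1,2,1,1))

# Value-only check: row 1 is flagged although (A, 1) occurs only once.
flag_value <- as.integer(duplicated(other$value) |
                         duplicated(other$value, fromLast = TRUE))
flag_value
# 1 0 1 1

# Row-wise check on (id, value): only the repeated (B, 1) rows are flagged.
flag_group <- as.integer(duplicated(other) |
                         duplicated(other, fromLast = TRUE))
flag_group
# 0 0 1 1
```

For the data in the question both approaches give the same flag, because no value is shared between ids.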

Ronak Shah answered Jan 18 '23 19:01