Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Increment by one to each duplicate value

I am trying to find a proper way, in R, to find duplicated values, and add the value 1 to each subsequent duplicated value grouped by id. For example:

data = data.table(id = c('1','1','1','1','1','2','2','2'),
                  value = c(95,100,101,101,101,20,35,38))

data$new_value <- ifelse(data[ , data$value] == lag(data$value,1),
                         lag(data$value, 1) + 1 ,data$value)
data$desired_value <- c(95,100,101,102,103,20,35,38)

Produces:

   id value new_value desired_value
1:  1    95        NA            95
2:  1   100       100           100
3:  1   101       101           101 # first 101 in id 1: add 0
4:  1   101       102           102 # second 101 in id 1: add 1
5:  1   101       102           103 # third 101 in id 1: add 2
6:  2    20        20            20
7:  2    35        35            35
8:  2    38        38            38

I tried doing this with ifelse, but it doesn't work recursively so it only applies to the following row, and not any subsequent rows. Also the lag function results in me losing the first value in value.

I've seen examples with character variables with make.names or make.unique, but haven't been able to find a solution for a duplicated numeric value.

Background: I am doing a survival analysis and I am finding that with my data there are stop times that are the same, so I need to make it unique by adding a 1 (stop times are in seconds).

like image 494
Daren Eiri Avatar asked Apr 04 '17 01:04

Daren Eiri


People also ask

Is duplicate data allowed in Set 1?

Sets cannot contain duplicates. Duplicates are discarded when initializing a set. If adding an element to a set, and that element is already contained in the set, then the set will not change.

How does R handle duplicate data?

Remove duplicate rows in a data frameThe function distinct() [dplyr package] can be used to keep only unique/distinct rows from a data frame. If there are duplicate rows, only the first row is preserved. It's an efficient version of the R base function unique() .

How do I remove duplicate values in R?

R base provides duplicated() and unique() functions to remove duplicates in an R DataFrame (data. frame), By using these two functions we can delete duplicate rows by considering all columns, single column, or selected columns.

How do I find duplicate rows in R?

We can find the rows with duplicated values in a particular column of an R data frame by using duplicated function inside the subset function. This will return only the duplicate rows based on the column we choose that means the first unique value will not be in the output.


2 Answers

Here's an attempt. You're essentially grouping by id and value and adding 0:(length(value)-1). So:

data[, onemore := value + (0:(.N-1)), by=.(id, value)]

#   id value new_value desired_value onemore
#1:  1    95        96            95      95
#2:  1   100       101           100     100
#3:  1   101       102           101     101
#4:  1   101       102           102     102
#5:  1   101       102           103     103
#6:  2    20        21            20      20
#7:  2    35        36            35      35
#8:  2    38        39            38      38
like image 118
thelatemail Avatar answered Sep 18 '22 23:09

thelatemail


With base R we can use ave where we take the first value of each group and basically add the row number of that row in that group.

data$value1 <- ave(data$value, data$id, data$value, FUN = function(x)
                                                      x[1] + seq_along(x) - 1)

#   id value new_value desired_value value1
#1:  1    95        96            95     95
#2:  1   100       101           100    100
#3:  1   101       102           101    101
#4:  1   101       102           102    102
#5:  1   101       102           103    103
#6:  2    20        21            20     20
#7:  2    35        36            35     35
#8:  2    38        39            38     38
like image 27
Ronak Shah Avatar answered Sep 21 '22 23:09

Ronak Shah