Remove consecutive duplicate values per row in R

Tags: r, duplicates

I am working on a dataset in R where I want to remove consecutive duplicate values PER ROW. For example row (19,15,19,19) should become row (19,15,19).

I tried duplicated(df), but this removes ALL duplicates, resulting in (19,15), instead of removing only the consecutive ones.

Reproducible example:

a <- c(19,18,19,9,9,19,19)
b <- c(15,0,19,9,19,19,13)
c <- c(19,0,13,19,19,19,0)
d <- c(19,0,0,19,19,0,0)

trajectories <- cbind(a,b,c,d)
asked Apr 24 '17 08:04 by olive


1 Answer

We can loop over the rows with apply and use run-length encoding (rle) to keep one value per run of consecutive duplicates. Because the rows end up with different lengths, the result is a list of vectors

lst <- apply(trajectories, 1, FUN = function(x) rle(x)$values)
lst
#[[1]]
# a  b  d 
#19 15 19 

#[[2]]
# a  d 
#18  0 

#[[3]]
# b  c  d 
#19 13  0 

#[[4]]
# b  d 
# 9 19 

#[[5]]
# a  d 
# 9 19 

#[[6]]
# c  d 
#19  0 

#[[7]]
# a  b  d 
#19 13  0 
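To see why rle fits here, a minimal sketch on a single row from the question may help (the names x and r are only for illustration). rle splits a vector into runs of equal adjacent values, and its values component holds one entry per run:

```r
# One row from the question, as a plain vector
x <- c(19, 15, 19, 19)

r <- rle(x)
r$lengths  # length of each run: 1 1 2
r$values   # one value per run:  19 15 19

# unique() drops every repeat, not only the consecutive ones
unique(x)  # 19 15
```

On a named vector, as in the apply call above, each kept value carries the name of the last element of its run, which is why the first row shows names a, b, d.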

We can pad each vector with NA at the end so that all of them have the same number of elements, and then bind them into a matrix

do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
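As a small illustration of the padding step, here is a sketch on a hypothetical two-element list (lst_demo, n, padded and m are made-up names, not part of the original answer). Assigning a longer length to a vector fills the new positions with NA:

```r
# Hypothetical ragged list standing in for lst
lst_demo <- list(c(19, 15, 19), c(18, 0))

n <- max(lengths(lst_demo))                # longest vector has 3 elements
padded <- lapply(lst_demo, `length<-`, n)  # shorter vectors get trailing NA
m <- do.call(rbind, padded)
m
#      [,1] [,2] [,3]
# [1,]   19   15   19
# [2,]   18    0   NA
```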

Update

As @Sotos mentioned, if we need to keep the original column names and positions intact, we can fill the missing columns with NA

do.call(rbind, lapply(lst, function(x) {
            x[setdiff(colnames(trajectories), names(x))] <- NA
            x[colnames(trajectories)]}))

Another option is to take the difference between adjacent elements in each row and keep the first element plus every element whose difference from its predecessor is not zero

apply(trajectories, 1, FUN = function(x) x[c(TRUE, diff(x)!=0)])
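A quick check on the question's data suggests the diff-based and rle-based approaches keep the same values. This is only a sketch: by_rle and by_diff are illustrative names, and it assumes the data contain no NA, since diff(x) != 0 would then produce NA in the logical index.

```r
# Data from the question
a <- c(19, 18, 19, 9, 9, 19, 19)
b <- c(15, 0, 19, 9, 19, 19, 13)
c <- c(19, 0, 13, 19, 19, 19, 0)
d <- c(19, 0, 0, 19, 19, 0, 0)
trajectories <- cbind(a, b, c, d)

# Drop names so that only the kept values are compared
by_rle  <- apply(trajectories, 1, function(x) unname(rle(x)$values))
by_diff <- apply(trajectories, 1, function(x) unname(x[c(TRUE, diff(x) != 0)]))

identical(by_rle, by_diff)  # TRUE on this example
```

The names do differ between the two: the diff version keeps the first element of each run, while rle keeps the name of the last one.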

Or another option which works on the example

i1 <- which(cbind(1, trajectories[,-1] -
        trajectories[,-ncol(trajectories)]) != 0, arr.ind = TRUE)
lapply(split(seq_len(nrow(i1)), i1[,1]), function(i) trajectories[i1[i,, drop = FALSE]])
answered Sep 28 '22 04:09 by akrun