I am working on a dataset in R where I want to remove consecutive duplicate values PER ROW. For example, the row (19,15,19,19) should become (19,15,19).
I tried using duplicated(df), but this removes ALL duplicates, resulting in (19,15) instead of removing only the consecutive ones.
Reproducible example:
a <- c(19,18,19,9,9,19,19)
b <- c(15,0,19,9,19,19,13)
c <- c(19,0,13,19,19,19,0)
d <- c(19,0,0,19,19,0,0)
trajectories <- cbind(a,b,c,d)
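To pin down what "consecutive duplicate" means here, this is a minimal loop-based sketch for a single vector. The helper name drop_consecutive_loop is hypothetical, and it assumes the vector contains no NA values. An element is kept only if it differs from the element right before it.

```r
# Hypothetical helper (not from the question). Assumes no NA values in x.
drop_consecutive_loop <- function(x) {
  if (length(x) == 0) return(x)
  keep <- logical(length(x))
  keep[1] <- TRUE                    # the first element always starts a run
  for (i in seq_along(x)[-1]) {
    keep[i] <- x[i] != x[i - 1]      # TRUE when a new run starts at position i
  }
  x[keep]
}

drop_consecutive_loop(c(19, 15, 19, 19))
#> [1] 19 15 19
```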
We can loop through the rows and keep one value per run of repeated values, using run-length encoding (rle), to create a list of vectors:
lst <- apply(trajectories, 1, FUN = function(x) rle(x)$values)
lst
#[[1]]
# a b d
#19 15 19
#[[2]]
# a d
#18 0
#[[3]]
# b c d
#19 13 0
#[[4]]
# b d
# 9 19
#[[5]]
# a d
# 9 19
#[[6]]
# c d
#19 0
#[[7]]
# a b d
#19 13 0
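Note that apply only returns a list here because the rows have different numbers of runs. If every row happened to have the same number of runs, apply would simplify the result to a matrix with one column per original row. A small sketch of that pitfall, assuming R >= 4.1.0 (the version in which apply gained its simplify argument):

```r
# Both rows have exactly two runs, so apply() is able to simplify
m <- rbind(c(1, 1, 2),
           c(3, 4, 4))

# Default: a 2 x 2 matrix, one COLUMN per original row
simplified <- apply(m, 1, function(x) rle(x)$values)

# simplify = FALSE always returns a list with one vector per row
as_list <- apply(m, 1, function(x) rle(x)$values, simplify = FALSE)

simplified
as_list
```

Passing simplify = FALSE in the call above makes the result a list regardless of the data.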
We can append NA at the end of each vector so that they all have the same number of elements, and then rbind them:
do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
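Be aware that this padded matrix takes its column names from the first vector in lst, so values can end up under the wrong column. A small check using two of the vectors printed above (rows 1 and 4 of the example):

```r
lst <- list(c(a = 19, b = 15, d = 19),   # row 1 of the example
            c(b = 9,  d = 19))           # row 4 of the example

padded <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
padded
# Row 2's 9 came from column b, but it now sits under the column named "a"
```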
As @Sotos mentioned, if we need to keep each value under its original column name, then:
do.call(rbind, lapply(lst, function(x) {
  # add the columns that rle() dropped back in as NA
  x[setdiff(colnames(trajectories), names(x))] <- NA
  # reorder to the original column order
  x[colnames(trajectories)]
}))
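One detail worth knowing when the names matter: rle() labels each run with the name of its last element, whereas the diff-based subset further down keeps the name of the first element of each run. A sketch for row 4 of the example:

```r
row4 <- c(a = 9, b = 9, c = 19, d = 19)   # same values as trajectories[4, ]

# rle() keeps the LAST position of each run -> names b, d
last_of_run <- rle(row4)$values

# diff() subsetting keeps the FIRST position of each run -> names a, c
first_of_run <- row4[c(TRUE, diff(row4) != 0)]

last_of_run
first_of_run
```

So with the column-aligned version, row 4 becomes NA 9 NA 19 when built from rle(), but would be 9 NA 19 NA if built from the diff-based vectors.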
Another option is to take the difference between adjacent elements in each row, create a logical vector that is TRUE where the difference is not zero (always keeping the first element), and use it to subset the elements:
apply(trajectories, 1, FUN = function(x) x[c(TRUE, diff(x)!=0)])
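diff() only works on numeric input; on character data (for example location labels) it fails because subtraction is not defined. A hypothetical helper that compares neighbours with != instead handles both, again assuming there are no NA values:

```r
# Hypothetical helper: keep x[1] plus every element that differs from its predecessor
drop_consecutive <- function(x) {
  if (length(x) < 2) return(x)
  x[c(TRUE, x[-1] != x[-length(x)])]
}

drop_consecutive(c(19, 15, 19, 19))
#> [1] 19 15 19
drop_consecutive(c("home", "home", "work", "home", "home"))
#> [1] "home" "work" "home"
```

It can be used in the same way, e.g. apply(trajectories, 1, drop_consecutive).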
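Or another option, which works on the example, is to compute the differences between adjacent columns for the whole matrix at once, find the non-zero positions with which(..., arr.ind = TRUE), and extract the values row by row: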
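# differences between adjacent columns; the leading 1 always keeps the first column
i1 <- which(cbind(1, trajectories[, -1] -
                    trajectories[, -ncol(trajectories)]) != 0, arr.ind = TRUE)
# group the (row, col) index pairs by row and pull out the kept values of each row
lapply(split(1:nrow(i1), i1[, 1]), function(i) trajectories[i1[i, , drop = FALSE]])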
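As a sanity check, this sketch runs all three approaches on the example matrix and confirms they keep the same values once names are dropped (assuming R >= 4.1.0 for simplify = FALSE in apply):

```r
a <- c(19,18,19,9,9,19,19)
b <- c(15,0,19,9,19,19,13)
c <- c(19,0,13,19,19,19,0)
d <- c(19,0,0,19,19,0,0)
trajectories <- cbind(a, b, c, d)

# 1. run-length encoding
via_rle <- apply(trajectories, 1, function(x) unname(rle(x)$values), simplify = FALSE)

# 2. differences between neighbours within each row
via_diff <- apply(trajectories, 1, function(x) unname(x[c(TRUE, diff(x) != 0)]),
                  simplify = FALSE)

# 3. matrix indexing with which(arr.ind = TRUE)
i1 <- which(cbind(1, trajectories[, -1] -
                    trajectories[, -ncol(trajectories)]) != 0, arr.ind = TRUE)
via_index <- lapply(split(1:nrow(i1), i1[, 1]),
                    function(i) trajectories[i1[i, , drop = FALSE]])

identical(unname(via_rle), unname(via_diff))   # TRUE
identical(unname(via_rle), unname(via_index))  # TRUE
```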