
Partial vector addition in R

Tags:

r

I have a vector that contains from 1 to 5 repeated values, followed by another such set that is usually, but not always, incremented by one. For example,

c(1,1,1,1,1, 2,2,2,2, 3,3, 4,4,4,4,4)

I would like to add an increment of 0.2 to a value each time it is repeated within its set, giving

c(1,1.2,1.4,1.6,1.8, 2,2.2,2.4,2.6, 3,3.2, 4,4.2,4.4,4.6,4.8)

I can do this very easily by using a for loop, but my initial vector is over 1 million entries long and that takes quite a long time. I have been trying to come up with a list-based way of doing it without luck. Any suggestions would be appreciated.

user2821938 asked Sep 27 '13 04:09


4 Answers

Here is an approach that uses rle and sequence to create the offsets 0, 0.2, 0.4, ... within each run of repeated values; these are then added to the original vector.

x <- c(1,1,1,1,1, 2,2,2,2, 3,3, 4,4,4,4,4)
x + (sequence(rle(x)$lengths)-1)*0.2
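To make the one-liner easier to follow, here is a minimal sketch that breaks it into its intermediate steps. The names runs, pos, offset and result are introduced only for illustration and are not part of the original answer.

```r
x <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4)

# Run-length encoding: one entry per run of consecutive equal values
runs <- rle(x)
runs$lengths                      # 5 4 2 5

# Position of each element within its run: 1 2 3 4 5 1 2 3 4 1 2 1 2 3 4 5
pos <- sequence(runs$lengths)

# Offsets restart at 0 for every run and grow in steps of 0.2
offset <- (pos - 1) * 0.2

result <- x + offset
result
```

Because the offsets are derived from run lengths rather than from the values themselves, a value that reappears later in the vector starts again from an offset of 0.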
mnel answered Oct 08 '22 05:10


Another possibility, using ave:

dat <- c(1,1,1,1,1, 2,2,2,2, 3,3, 4,4,4,4,4)
ave(
  dat,
  c(0,cumsum(diff(dat)!=0)),  # run id: increases whenever the value changes
  FUN=function(x) x + seq(0,(length(x)-1)*0.2,0.2)
)
#[1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 3.0 3.2 4.0 4.2 4.4 4.6 4.8
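The grouping key in the second argument is what lets this version cope with a value that shows up in more than one run. A small sketch on a made-up vector (the names run_id and res are illustrative, and seq_along is used here in place of seq as an equivalent way to build the offsets):

```r
# Hypothetical input in which the value 1 appears in two separate runs
dat <- c(1, 1, 2, 2, 1, 1)

# diff(dat) != 0 marks the positions where the value changes;
# cumsum turns those marks into a run counter
run_id <- c(0, cumsum(diff(dat) != 0))
run_id                            # 0 0 1 1 2 2

# Apply the 0.2 steps within each run, not within each distinct value
res <- ave(dat, run_id, FUN = function(x) x + (seq_along(x) - 1) * 0.2)
res                               # 1.0 1.2 2.0 2.2 1.0 1.2
```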
thelatemail answered Oct 08 '22 06:10


Here is one possibility (given the condition that there will never be more than one set of each number and that each number has at most 5 repetitions):

myvec <- c(1,1,1,1,1, 2,2,2,2, 3,3, 4,4,4,4,4)
myvec + seq(0, .8, .2)[ave(myvec, myvec, FUN = seq_along)]
# [1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 3.0 3.2 4.0 4.2 4.4 4.6 4.8
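The two conditions stated above matter because the lookup groups by value and indexes a fixed five-element table. A short sketch of what happens when either condition is violated (the vectors v and w are made-up inputs for illustration):

```r
# Condition 1 violated: the value 1 appears in two separate sets
v <- c(1, 1, 2, 1)
idx <- ave(v, v, FUN = seq_along)
idx                               # 1 2 1 3: the final 1 is counted as the third 1
by_value <- v + seq(0, 0.8, 0.2)[idx]
by_value                          # 1.0 1.2 2.0 1.4

# Condition 2 violated: six repetitions index past the five-element table
w <- rep(7, 6)
too_long <- w + seq(0, 0.8, 0.2)[ave(w, w, FUN = seq_along)]
too_long                          # the sixth element is NA
```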

For better alternatives when the same number can appear in more than one set in your vector, see @mnel's and @thelatemail's answers.

A5C1D2H2I1M1N2O1R2T1 answered Oct 08 '22 06:10


This will probably be very quick on very large vectors as well.

Edit: c_prev is populated using head, not tail. Thanks to @Ricardosaporta for pointing it out.

library(data.table)

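test <- data.table(
  c1 = c(1,1,1,1,1, 2,2,2,2, 3,3, 4,4,4,4,4)
)

# previous value of c1 (NA for the first row)
test[, c_prev := c(NA, head(c1, -1))]

# 0.2 wherever a value repeats the one before it, 0 otherwise
test[, increment := 0.0]
test[c1 == c_prev, increment := 0.2]

# running total of the increments within each value of c1
test[, cumincrement := cumsum(increment), by = c1]

test[, revised_c := c1]
test[!is.na(cumincrement), revised_c := revised_c + cumincrement]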

test
#    c1 c_prev increment cumincrement revised_c
# 1:  1     NA       0.0          0.0       1.0
# 2:  1      1       0.2          0.2       1.2
# 3:  1      1       0.2          0.4       1.4
# 4:  1      1       0.2          0.6       1.6
# 5:  1      1       0.2          0.8       1.8
# 6:  2      1       0.0          0.0       2.0
# 7:  2      2       0.2          0.2       2.2
# 8:  2      2       0.2          0.4       2.4
# 9:  2      2       0.2          0.6       2.6
#10:  3      2       0.0          0.0       3.0
#11:  3      3       0.2          0.2       3.2
#12:  4      3       0.0          0.0       4.0
#13:  4      4       0.2          0.2       4.2
#14:  4      4       0.2          0.4       4.4
#15:  4      4       0.2          0.6       4.6
#16:  4      4       0.2          0.8       4.8
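Note that the cumulative sum above is grouped by the value of c1, so it relies on each value forming a single run. As a possible variation, and not part of the original answer, the following sketch groups by run instead, assuming a data.table version that provides rleid(). The input vector is made up for illustration.

```r
library(data.table)

# Hypothetical input in which the value 1 appears in two separate runs
test <- data.table(c1 = c(1, 1, 2, 2, 1, 1))

# rleid(c1) gives each run of consecutive equal values its own id;
# seq_len(.N) - 1 counts 0, 1, 2, ... within each run
test[, revised_c := c1 + (seq_len(.N) - 1) * 0.2, by = rleid(c1)]

test$revised_c                    # 1.0 1.2 2.0 2.2 1.0 1.2
```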
TheComeOnMan answered Oct 08 '22 06:10