Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Applying function over certain values in vector (R)

Tags:

r

vector

I would like to know if there is some elegant solution to this problem:

Let's say I have a vector of values

a <- c(1,2,3,3.1,3.2,5,6,7,7.1,7.2,9)

and I want to apply some function (e.g. mean) only to values fulfilling certain condition, which in this case is to have the difference between values smaller than 0.5 .

So the values that should be averaged are (3,3.1,3.2) and (7,7.1,7.2) and the function should return vector

b <- c(1,2,3.1,5,6,7.1,9)

Edit: One approach I've tried (not sure if right) is to binarize the vector a (1 meaning the difference between values is <0.5; 0 meaning the diff is >0.5), so I got vector

bin <– c(0,0,1,1,0,0,0,1,1,0)

but I don't know how to apply mean to the separate groups of ones. So the main problem for me is to distinguish the groups of needed values and apply the mean to them separately. Any ideas?

I am new here so if anything is unclear, please let me know. Thank you in advance.

like image 683
qeeZz Avatar asked Apr 05 '14 10:04

qeeZz


1 Answers

This doesn't qualify as elegant, but I think that it works in the case you provide. I use rle (base R) to identify runs where diffs are less than 0.5.

a <- c(1, 2, 3, 3.1, 3.2, 5, 6, 7, 7.1, 7.2, 9) 
crit <- diff(a) < 0.5
crit <- c(head(crit, 1), crit) | c(crit, tail(crit, 1))
run <- rle(crit)
aa <- split(a, rep(seq(length(run$lengths)), times=run$lengths))
myFun <- function(crit, val) {
    if (crit) {
        mean(val)
    }
    else {
        val
    }
}
unlist(mapply(FUN=myFun, crit=run$values, val=aa, USE.NAMES=FALSE))

Yields:

> unlist(mapply(FUN=myFun, crit=run$values, val=aa, USE.NAMES=FALSE))
[1] 1.0 2.0 3.1 5.0 6.0 7.1 9.0

Maybe someone can build a cleaner solution from this.


Update: OP points out that this fails on a sequence like {3, 3.1, 3.2, 7, 7.1, 7.2} because the code above lumps this into one run and averages across the whole sequence. Here's a more robust solution.

a <- c(1, 2, 3, 3.1, 3.2, 7, 7.1, 7.2, 10)

run <- unclass(rle(diff(a) < 0.5))
len <- run$lengths
val <- run$values
pos <- seq_along(len)
last <- pos == max(pos)
len <- len + val - c(0, head(val, -1)) + (last * !val)
prevLen <- c(0, head(cumsum(len), -1))
myFun <- function(l, v, pl, x) {
    if (l == 0) {
        NULL
    } else {
        seg <- seq(l) + pl
        if (v == TRUE) {
            mean(x[seg])
        } else {
            x[seg]
        }
    }
}
unlist(mapply(FUN=myFun, l=len, v=val, pl=prevLen, MoreArgs=list(x=a)))

Now whenever it comes across a small difference run (i.e., val == TRUE) it adds more one to the length of that small difference run (i.e., len + val), but that additional element comes from the next run, but it can't steal from the last run if it's not a small difference run (i.e., last * !val).

like image 134
Richard Herron Avatar answered Nov 11 '22 13:11

Richard Herron