I have a binary vector that represents a time series. I'd like to filter out fast switches: for example, 00000001100000000 should become all zeros and, likewise, 11111111111011111 should become all ones.
What kind of filter/function would be appropriate for that task?
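One standard option, different from both approaches below, is a running median. On 0/1 data the median of an odd-length window is a majority vote, so a short run flanked by longer runs of the other value gets voted away. Here is a minimal sketch with base R's stats::runmed; the helper name medianFilter and the window k = 5 (which removes runs of up to 2) are assumptions for illustration.

```r
# Hypothetical helper: majority-vote smoothing of a 0/1 vector via a running median.
# k must be odd; a run of length <= (k - 1) / 2 flanked on both sides by runs of
# the other value that are at least (k - 1) / 2 long is replaced by that value.
medianFilter <- function(x, k = 5) {
  # runmed() returns the running medians with an attribute "k";
  # as.vector() drops it so the result is a plain numeric vector.
  as.vector(stats::runmed(x, k))
}

x1 <- c(rep(0, 7), 1, 1, rep(0, 8))   # 00000001100000000
x2 <- c(rep(1, 11), 0, rep(1, 5))     # 11111111111011111
medianFilter(x1)   # all zeros
medianFilter(x2)   # all ones
```

The window length sets the threshold only indirectly: a run disappears when it cannot form a majority in any window that covers it.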
Maybe this is a naive approach, but rle/inverse.rle seem to be good candidates. For example, if you define a fast switch as a run of fewer than 3 equal values:
b1 <- c(rep(0, 7), rep(1, 2), rep(0, 7))
b2 <- c(rep(1, 10), 0, rep(1, 4))
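binaryFilter <- function(x, threshold=3) {
  r <- rle(x)                                  # runs of equal values
  isBelowThreshold <- r$lengths < threshold    # which runs are "fast switches"
  # flip the value of every short run (0 -> 1, 1 -> 0)
  r$values[isBelowThreshold] <- abs(1-r$values[isBelowThreshold])
  return(inverse.rle(r))                       # expand the runs back to a vector
}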
binaryFilter(b1)
# [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
binaryFilter(b2)
# [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
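Flipping works when a single short run sits between two long ones, but with several adjacent short runs the flips can reintroduce a switch: for b3 <- c(rep(0, 4), 1, 0, 1, rep(0, 4)), binaryFilter() turns the runs 1, 0, 1 into 0, 1, 0 and leaves a lone 1 at position 6. A sketch of a variant under a different rule, which is my assumption and not part of the answer above: each short run takes the value of the most recent long run, and leading short runs take the value of the first long run. The name binaryFilterFill is hypothetical.

```r
# Hypothetical variant of binaryFilter(): short runs inherit the value of the
# most recent long run instead of being flipped.
binaryFilterFill <- function(x, threshold = 3) {
  r <- rle(x)
  isLong <- r$lengths >= threshold
  if (!any(isLong)) return(x)          # no long run to anchor on: leave x as is
  # index of the most recent long run at or before each run (0 = none yet)
  idx <- cummax(seq_along(r$values) * isLong)
  idx[idx == 0] <- which(isLong)[1]    # leading short runs use the first long run
  r$values <- r$values[idx]
  inverse.rle(r)
}

b3 <- c(rep(0, 4), 1, 0, 1, rep(0, 4))
binaryFilterFill(b3)                   # all 11 values become 0
b4 <- c(rep(1, 4), 0, 1, 0, rep(0, 4))
binaryFilterFill(b4)                   # 1 1 1 1 1 1 1 0 0 0 0
```

This rule behaves like a simple hysteresis: the signal only changes state once the new value has persisted for at least threshold samples.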
How about taking the neighbouring values into account using a weighted average? Here each value is averaged with its 2 neighbours on each side, so only positions that have 2 neighbours on both sides get a result. Of course the window and the weights can be adjusted.
> v <- sample(c(0,1),30,replace=TRUE)
> v
[1] 0 1 1 1 0 0 0 0 1 1 0 1 0 0 1 0 1 1 0 0 0 0 1 1 1 0 1 0 0 0
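# embed(v,5) is a short version of the following (embed() puts the columns in
# reverse order, which doesn't matter here because the weights are symmetric):
# cbind(v[5:30],v[4:29],v[3:28],v[2:27],v[1:26])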
> m <- embed(v,5)
> c(round(m %*% c(.1,.2,.4,.2,.1)))
[1] 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 1 1 0 0 0
before: 0 1 1 1 0 0 0 0 1 1 0 1 0 0 1 0 1 1 0 0 0 0 1 1 1 0 1 0 0 0
after: . . 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 1 1 1 0 0 0 . .
As you can see, the isolated values are gone. (The dots mark the two positions at each end that have no complete window of 5 values.)
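As suggested by sgibb, the whole thing can be boiled down to a single call to stats::filter (write stats::filter if dplyr is attached, since dplyr masks it), which returns NA at the two positions at each end:
round(filter(v, c(.1,.2,.4,.2,.1)))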
(I'm leaving the written-out version above because it makes clearer what is being computed.)
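The filter() one-liner leaves NA where the window is incomplete. A sketch of a wrapper, assuming you want to keep the original values at those edge positions; the name smoothBinary and that edge rule are assumptions. Note that with these weights a run of two survives (0.4 + 0.2 = 0.6 rounds to 1), so this removes isolated values but would leave the 11 in 00000001100000000 unchanged; a wider window or different weights would be needed for that.

```r
# Hypothetical wrapper around the weighted moving average from the answer.
smoothBinary <- function(x, w = c(.1, .2, .4, .2, .1)) {
  # centred weighted moving average; NA where the window is incomplete
  y <- as.numeric(stats::filter(x, w / sum(w), sides = 2))
  out <- round(y)
  # keep the original values at the edges instead of NA
  out[is.na(out)] <- x[is.na(out)]
  out
}

smoothBinary(c(rep(0, 5), 1, rep(0, 5)))      # isolated 1 removed
smoothBinary(c(rep(0, 7), 1, 1, rep(0, 7)))   # run of two kept
```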