I've a number of columns in an xts object, and I want to find the percentage in the first column above a certain number, the percentage in either first or second column above a certain number, the percentage in any of the first three columns above a certain number, etc.
I'm currently doing it manually, as follows:
library(xts)
set.seed(69)
x = xts( cbind( v.1 = runif(20)*100, v.2 = runif(20)*100, v.3 = runif(20)*100, v.4 = runif(20)*100), Sys.Date()-20:1 )
c(
mean( x$v.1 > 50),
mean( x$v.1 > 50 | x$v.2 > 50) ,
mean( x$v.1 > 50 | x$v.2 > 50 | x$v.3 > 50) ,
mean( x$v.1 > 50 | x$v.2 > 50 | x$v.3 > 50 | x$v.4 > 50)
)
Which gives this example output:
[1] 0.50 0.70 0.80 0.95
But now I want to generalize to any number of columns, not just v.1
to v.4
. So I'm looking for a single function something like this:
this_is_mean( x, c('v.1','v.2','v.3','v.4'), 50)
or maybe it would look like:
mean ( foo( x, c('v.1','v.2','v.3','v.4'), 50) )
(I'll be using paste('v',1:N,sep='.')
for the column names, of course)
Here's an another alternative that seems to be a bit faster than @AnadaMahto's solution (on this example). You might also find it a bit more straightforward.
R> rowMeans(apply(x > 50, 1, cumsum) >= 1)
v.1 v.2 v.3 v.4
0.50 0.70 0.80 0.95
Though do note that rowMeans
only makes one pass over the data. Unlike mean
, which makes 2 passes (one is for floating point arithmetic error-correction).
It seems like you should be able to use sapply
and rowSums
(if I understand your problem correctly):
sapply(1:ncol(x), function(y) mean(rowSums(x[, seq(y)] > 50) >= 1))
## [1] 0.50 0.70 0.80 0.95
Use vapply(1:ncol(x), function(y) mean(rowSums(x[, seq(y)] > 50) >= 1), numeric(1L))
to eek out a little bit more speed, if required.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With