Applying a function row-wise for a dataset

Question

I hope I can explain clearly what I would like to do.

I have a matrix

    Z <- matrix(sample(1:40), ncol = 4)

    colnames(Z) <- c("value", "A", "B", "C")

I would like to apply the following formula to each row in the dataset.


    Process = (value - row-wise mean(A, B, C)) / row-wise standard deviation(A, B, C)

I thought of something like calculating everything separately like

Subsetting the data first (columns A, B and C are columns 2 to 4)

    onlyABC <- Z[, 2:4]

Then compute the mean of each row

    means <- rowMeans(onlyABC)

And similarly compute the standard deviation of each row separately using

    deviate <- apply(onlyABC, 1, sd)

But then I do not know how to subtract 'means' from the value column of matrix 'Z' and divide the result by 'deviate'.

Is there a simpler approach to do this?

As an example applying the formula to the first row will give:

    row1:  (32 - (19 + 35 + 4)/3) / SD(19, 35, 4)

Applying the formula to the remaining rows in the same way should give a vector of length 10 in the end.

Metrics · Accepted Answer

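# Row-wise sd and mean of A, B, C (every column except the first)
ksd <- apply(Z[, -1], 1, sd)
kmean <- rowMeans(Z[, -1])
# Overwrite the value column with the standardised result
Z[, 1] <- (Z[, 1] - kmean) / ksd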
> Z
            value  A  B  C
 [1,]  0.88181533 26  4 31
 [2,] -0.04364358 17 22  7
 [3,]  2.21200505 25 13 18
 [4,]  0.50951017  8 34 40
 [5,]  0.03866223 12  6 23
 [6,] -0.64018440 29 16 30
 [7,] -0.40927275 39 35  9
 [8,] -0.65103077 24  5  1
 [9,]  0.89658092 37 27  3
[10,]  0.26360896 11 10 28
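
As a sanity check, the formula can be applied by hand to the example row from the question (value 32, with A, B, C equal to 19, 35, 4) and compared against the matrix code above. This is only a sketch; the names `row1`, `others`, `by_hand`, `Z1` and `from_matrix` are illustrative and do not come from the original answer.

```r
# Example row from the question: value = 32, A = 19, B = 35, C = 4
row1 <- c(value = 32, A = 19, B = 35, C = 4)
others <- row1[c("A", "B", "C")]

# (value - mean(A, B, C)) / sd(A, B, C), computed directly
by_hand <- unname((row1["value"] - mean(others)) / sd(others))
by_hand  # approximately 0.817

# The same row pushed through the matrix code from the answer.
# drop = FALSE keeps the one-row subset a matrix so apply() and rowMeans() work.
Z1 <- matrix(row1, nrow = 1, dimnames = list(NULL, names(row1)))
ksd <- apply(Z1[, -1, drop = FALSE], 1, sd)
kmean <- rowMeans(Z1[, -1, drop = FALSE])
from_matrix <- (Z1[, 1] - kmean) / ksd

# Compare the two results, ignoring any names carried over from Z1
all.equal(by_hand, unname(from_matrix))
```

Note that the `drop = FALSE` is only needed here because the sketch uses a single-row matrix; with the 10-row `Z` from the question, `Z[, -1]` stays a matrix on its own.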

ricardo · Answer

This isn't quite an apply problem, as you want to exclude the first column of each row from the calculation.

The iterative way of doing this is to first create the output vector, and then substitute into it as follows:

tranZ <- numeric(nrow(Z))  # pre-allocate the output vector
for (i in seq_len(nrow(Z))) {
    tranZ[i] <- (Z[i, 1] - mean(Z[i, -1])) / sd(Z[i, -1])
}

If you have a large dataset, I suggest using vectorisation instead. Try the following:

(Z[,1] - rowMeans(Z[,-1])) / apply(Z[, -1], 1, sd)

Or with vapply:

tranZ_v <- vapply(1:nrow(Z), function(X) (Z[X, 1] - mean(Z[X, -1])) / sd(Z[X, -1]),
                FUN.VALUE = numeric(1))

The key to using the *apply family in this case is controlling what gets iterated over. To do this I've iterated across the row indices 1:nrow(Z) rather than across the object itself, and indexed into the object inside the function.
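
To confirm that the loop, the vectorised expression and the vapply call above all compute the same thing, they can be run side by side on one matrix. The sketch below assumes an arbitrary seed (`set.seed(1)`) purely for reproducibility, and the names `loop_res`, `vec_res` and `vap_res` are made up for this illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
Z <- matrix(sample(1:40), ncol = 4)
colnames(Z) <- c("value", "A", "B", "C")

# 1. Explicit loop over row indices
loop_res <- numeric(nrow(Z))
for (i in seq_len(nrow(Z))) {
  loop_res[i] <- (Z[i, 1] - mean(Z[i, -1])) / sd(Z[i, -1])
}

# 2. Vectorised subtraction and division, apply() only for the row-wise sd
vec_res <- (Z[, 1] - rowMeans(Z[, -1])) / apply(Z[, -1], 1, sd)

# 3. vapply() over row indices, with the return type declared up front
vap_res <- vapply(seq_len(nrow(Z)),
                  function(i) (Z[i, 1] - mean(Z[i, -1])) / sd(Z[i, -1]),
                  FUN.VALUE = numeric(1))

# Compare the results, ignoring any names picked up from the column labels
all.equal(loop_res, unname(vec_res))
all.equal(loop_res, unname(vap_res))
```

Declaring `FUN.VALUE = numeric(1)` makes vapply fail loudly if the function ever returns something other than a single number, which is the main reason to prefer it over sapply here.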

Benchmarking

require(rbenchmark)

process <- function(x) {
    (x[["value"]] - mean(c(x[["A"]], x[["B"]], x[["C"]]))) / sd(c(x[["A"]], x[["B"]], x[["C"]]))
}          

p2 <- function(x) {
    (x[1] - mean(x[-1])) / sd(x[-1])
}

apply_fun <- function() apply(Z, 1, process)
apply_fun2 <- function() apply(Z, 1, p2)

apply_sd <- function() (Z[,1] - rowMeans(Z[,-1])) / apply(Z[, -1], 1, sd)

vapply_anon <- function() vapply(1:nrow(Z), FUN = function(X) (Z[X, 1] - mean(Z[X, -1])) / sd(Z[X, -1]),
                FUN.VALUE = numeric(1))


bb <- benchmark(apply_fun(), apply_fun2(), apply_sd(), vapply_anon(), 
          columns = c('test', 'elapsed', 'relative'), 
          replications = 100, 
          order = 'elapsed')

The vectorised approach, which uses apply only for the sd, is fastest:

> bb
           test elapsed relative
3    apply_sd()   0.021    1.000
4 vapply_anon()   0.030    1.429
1   apply_fun()   0.033    1.571
2  apply_fun2()   0.034    1.619
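
Since the fastest variant still calls apply for the standard deviation, one could in principle go a step further and compute the row-wise sd from rowMeans and rowSums alone. The sketch below was not part of the benchmark above, so no timing is claimed for it; `row_sd` and `full_vec` are hypothetical helper names, and the seed is arbitrary.

```r
# Hypothetical helper: sample standard deviation of each row,
# using the n - 1 denominator, the same as sd()
row_sd <- function(X) {
  centred <- X - rowMeans(X)  # recycling subtracts each row's own mean
  sqrt(rowSums(centred^2) / (ncol(X) - 1))
}

# Hypothetical wrapper: the formula from the question without any apply() call
full_vec <- function(Z) (Z[, 1] - rowMeans(Z[, -1])) / row_sd(Z[, -1])

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
Z <- matrix(sample(1:40), ncol = 4)
colnames(Z) <- c("value", "A", "B", "C")

# Check against the apply()-based version from the answer
all.equal(full_vec(Z), (Z[, 1] - rowMeans(Z[, -1])) / apply(Z[, -1], 1, sd))
```

Whether this is actually faster would need to be measured, for example by adding it as another entry in the benchmark call.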

Tags: r

Paul
