
Data transformation avoiding nested loops in R

I have a contingency table stored as a data matrix with 6 columns and 37 rows. I need to apply a chi-squared transformation to get the row profiles and column profiles for a correspondence analysis.

Unfortunately I've been told I will need to use nested loops to transform the data and carry out the CA (rather than doing it the more sensible ways in R). I was given the structure to use for my nested loop:

transformed.data=data0

for (row.index in 1:nrow(data)) {
  for (col.index in 1:ncol(data)) {
    transformed.data[row.index,col.index]=
       "TRANSFORMATION"[row.index,col.index]
  }
}

From what I understand, the nested loop will apply my "TRANSFORMATION" first to the rows and then to the columns.

The transformation I want done on the data to get the row profiles is:

( X( ij ) / sum( X( i ) ) ) / sqrt( sum( X( j ) ) )

While the transformation I want done on the data to get the column profiles is:

( X( ij ) / sum( X( j ) ) ) / sqrt( sum( X( i ) ) )

What would I enter as my "TRANSFORMATION" in the last line of the nested loop to get my desired transformation for the profiles? Otherwise, if I've misunderstood the point of a nested loop here, please describe what it would allow me to do.

This is the code for a subset of my data:

matrix(c(15366,2079,411,366,23223,2667,699,819,31632,2724,717,1473,49938,3111,1062,11964),
       nrow=4,ncol=4,byrow=TRUE)

So using this subset alone I would expect the row profile for the first row to be:

0.002432689 0.0003291397 6.506803e-05 5.794379e-05

And the column profile for the first column to be:

0.0009473414, 0.0132572344, 0.0572742202, 0.0132863528 
Confused asked Sep 08 '12 02:09


1 Answer

You can do these types of calculations without needing even a single loop. Rewrite your equation for the row transformation, and you get :

Xtrans[i,j] = X[i,j] / ( sum( X[i, ] ) * sqrt( sum( X[ ,j] ) ) )

To get a matrix representing the denominator term sum( X[i, ] ) * sqrt( sum( X[ ,j] ) ), you use the function outer() or its operator form %o% like this:

rowSums(X) %o% sqrt(colSums(X))

Or, for the column transformation :

sqrt(rowSums(X)) %o% colSums(X)

The only thing left to do is divide your original matrix by this one, e.g. for the column transformation :

TEST <- matrix(
  c(15366,2079,411,366,23223,2667,699,819,
    31632,2724,717,1473,49938,3111,1062,11964),
  nrow=4,ncol=4,byrow=TRUE)

> TEST / (sqrt(rowSums(TEST)) %o% colSums(TEST))
             [,1]        [,2]        [,3]         [,4]
[1,] 0.0009473414 0.001455559 0.001053892 0.0001854284
[2,] 0.0011674098 0.001522501 0.001461474 0.0003383284
[3,] 0.0013770523 0.001346668 0.001298230 0.0005269580
[4,] 0.0016167998 0.001143812 0.001430074 0.0031831055

You can calculate the row transformation in the same way, dividing by rowSums(X) %o% sqrt(colSums(X)) instead.
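For completeness, here is a minimal sketch of that row transformation on the same TEST matrix. The names row.denominator and row.profiles are only illustrative; they do not come from the question or from any package.

```r
TEST <- matrix(
  c(15366,2079,411,366,23223,2667,699,819,
    31632,2724,717,1473,49938,3111,1062,11964),
  nrow=4,ncol=4,byrow=TRUE)

# Denominator matrix: element [i,j] is sum(TEST[i, ]) * sqrt(sum(TEST[ ,j]))
row.denominator <- rowSums(TEST) %o% sqrt(colSums(TEST))

# Elementwise division gives the row transformation
row.profiles <- TEST / row.denominator

# Should agree with the first value the question expects (0.002432689)
row.profiles[1,1]
```

Only the [1,1] element is compared with the question here, since the other expected values listed there do not follow from the stated formula.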

Doing the hand calculations, I can confirm that my solution is correct, provided I understood your index notation correctly (meaning that i stands for rows and j for columns). The numbers your formula produces are not the ones you say you expect. To show you :

> ( TEST[1,2] / sum(TEST[,2]) ) / sqrt(sum(TEST[1,]))
[1] 0.001455559
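If you really are required to use the nested-loop skeleton from the question, the same per-element formula can be dropped into it. The sketch below is one way to do that, not the only one; row.totals, col.totals, row.trans and col.trans are illustrative names. Note that the loop does not apply anything "first to the rows and then to the columns": it simply visits each cell [i,j] once.

```r
TEST <- matrix(
  c(15366,2079,411,366,23223,2667,699,819,
    31632,2724,717,1473,49938,3111,1062,11964),
  nrow=4,ncol=4,byrow=TRUE)

# Marginal totals are computed once, outside the loops
row.totals <- rowSums(TEST)
col.totals <- colSums(TEST)

# Pre-allocate result matrices with the same shape as the data
row.trans <- matrix(0, nrow(TEST), ncol(TEST))
col.trans <- matrix(0, nrow(TEST), ncol(TEST))

for (i in seq_len(nrow(TEST))) {
  for (j in seq_len(ncol(TEST))) {
    # Row transformation: (X[i,j] / row total) / sqrt(column total)
    row.trans[i,j] <- (TEST[i,j] / row.totals[i]) / sqrt(col.totals[j])
    # Column transformation: (X[i,j] / column total) / sqrt(row total)
    col.trans[i,j] <- (TEST[i,j] / col.totals[j]) / sqrt(row.totals[i])
  }
}

# Compare the loop results with the vectorised versions
all.equal(row.trans, TEST / (rowSums(TEST) %o% sqrt(colSums(TEST))))
all.equal(col.trans, TEST / (sqrt(rowSums(TEST)) %o% colSums(TEST)))
```

The loop and the outer() approach compute the same numbers; the loop is just slower and more verbose.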

The chi-square normalization you talk about is actually available in the function decostand of the vegan package. Mind you that by default the method also adjusts the result by multiplying by the square root of the matrix total. This makes sense in a correspondence analysis.
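To make that adjustment concrete, here is a base-R sketch of the scaling, applied to the column transformation from earlier. It only illustrates the arithmetic and is not taken from the vegan source; col.trans and col.trans.adjusted are illustrative names.

```r
TEST <- matrix(
  c(15366,2079,411,366,23223,2667,699,819,
    31632,2724,717,1473,49938,3111,1062,11964),
  nrow=4,ncol=4,byrow=TRUE)

# Unadjusted column transformation, as computed earlier
col.trans <- TEST / (sqrt(rowSums(TEST)) %o% colSums(TEST))

# Adjusted version: every element is scaled by the square root of the grand total
col.trans.adjusted <- sqrt(sum(TEST)) * col.trans
```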

If you don't want to use this correction, then you can also get e.g. the column transformation as follows (note that the output below is the transpose of the matrix computed above) :

> require(vegan)
> decostand(TEST,method="chi.square",MARGIN=2)/sqrt(sum(TEST))
             [,1]         [,2]        [,3]        [,4]
[1,] 0.0009473414 0.0011674098 0.001377052 0.001616800
[2,] 0.0014555588 0.0015225011 0.001346668 0.001143812
[3,] 0.0010538924 0.0014614736 0.001298230 0.001430074
[4,] 0.0001854284 0.0003383284 0.000526958 0.003183106
attr(,"decostand")
[1] "chi.square"
Joris Meys answered Oct 14 '22 07:10