
Dividing columns by colSums in R

Tags:

r

I am trying to scale the values in a matrix so that each column adds up to one. I have tried:

m = matrix(c(1:9),nrow=3, ncol=3, byrow=T)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

colSums(m)
12 15 18

m = m/colSums(m)
           [,1]      [,2] [,3]
[1,] 0.08333333 0.1666667 0.25
[2,] 0.26666667 0.3333333 0.40
[3,] 0.38888889 0.4444444 0.50

colSums(m)
[1] 0.7388889 0.9444444 1.1500000

so obviously this doesn't work. I then tried this:

# starting again from the original m
m = m/matrix(rep(colSums(m),3), nrow=3, ncol=3, byrow=T)
           [,1]      [,2]      [,3]
[1,] 0.08333333 0.1333333 0.1666667
[2,] 0.33333333 0.3333333 0.3333333
[3,] 0.58333333 0.5333333 0.5000000

colSums(m)
[1] 1 1 1

So this works, but it feels like I'm missing something here. This can't be how it is routinely done. I'm certain I am being stupid here. Any help you can give would be appreciated. Cheers, Davy

Davy Kavanagh asked Feb 25 '12 20:02



1 Answer

See ?sweep, e.g.:

> sweep(m,2,colSums(m),`/`)
           [,1]      [,2]      [,3]
[1,] 0.08333333 0.1333333 0.1666667
[2,] 0.33333333 0.3333333 0.3333333
[3,] 0.58333333 0.5333333 0.5000000

Or you can transpose the matrix, so that colSums(m) gets recycled correctly. Don't forget to transpose back afterwards, like this:

> t(t(m)/colSums(m))
           [,1]      [,2]      [,3]
[1,] 0.08333333 0.1333333 0.1666667
[2,] 0.33333333 0.3333333 0.3333333
[3,] 0.58333333 0.5333333 0.5000000
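To see why the transpose matters, here is a small sketch, assuming the same 3x3 m as in the question (the names cs, wrong, by_row_index and right are only for illustration). R recycles a vector down the columns of a matrix, so a plain m / colSums(m) divides element [i, j] by the sum of column i rather than column j:

```r
# Same 3x3 matrix as in the question.
m <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
cs <- colSums(m)

# cs is recycled down each column, so row i is divided by cs[i].
wrong <- m / cs

# The same thing written out: a matrix whose row i is filled with cs[i].
by_row_index <- m / matrix(cs, nrow = 3, ncol = 3)
all.equal(wrong, by_row_index)

# After t(), the columns of m are rows, so the recycling lines up;
# the outer t() restores the original orientation.
right <- t(t(m) / cs)
colSums(right)
```

This is also why the rep(..., byrow=T) construction in the question works: it builds the divisor matrix so that column j holds the sum of column j.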

Or you can use the function prop.table(), which does basically the same:

> prop.table(m,2)
           [,1]      [,2]      [,3]
[1,] 0.08333333 0.1333333 0.1666667
[2,] 0.33333333 0.3333333 0.3333333
[3,] 0.58333333 0.5333333 0.5000000

The time differences are rather small. The sweep() function and the t() trick are the most flexible solutions; prop.table() only covers this particular case (dividing by the margin sums).
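As a rough illustration of the flexibility point, the sketch below assumes the same 3x3 m as in the question (the names by_sweep, by_t, by_prop, centered and row_scaled are made up here). It first checks that the three approaches agree, then reuses sweep() with a different function and a different margin:

```r
m <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)

# The three approaches from this answer give the same matrix.
by_sweep <- sweep(m, 2, colSums(m), `/`)
by_t     <- t(t(m) / colSums(m))
by_prop  <- prop.table(m, 2)
stopifnot(isTRUE(all.equal(by_sweep, by_t)),
          isTRUE(all.equal(by_sweep, by_prop)))

# sweep() takes any binary function: here, subtract each column's mean.
centered <- sweep(m, 2, colMeans(m), `-`)

# Margin 1 works on rows: divide each row by its row sum.
row_scaled <- sweep(m, 1, rowSums(m), `/`)

# For rows no transpose is needed, because recycling runs down the columns.
stopifnot(isTRUE(all.equal(row_scaled, m / rowSums(m))))
```

Which one to pick is mostly a matter of taste; sweep() tends to read most clearly once the margin or the operation changes.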

Joris Meys answered Sep 22 '22 17:09