
Dividing columns by their sum in R

I have the following example set of data:

Example<-data.frame(A=10*1:9,B=10*10:18)

rownames(Example)<-paste("Sample",1:9)
> Example
          A   B
Sample 1 10 100
Sample 2 20 110
Sample 3 30 120
Sample 4 40 130
Sample 5 50 140
Sample 6 60 150
Sample 7 70 160
Sample 8 80 170
Sample 9 90 180

I am trying to divide each element in both columns by its column's total. I have tried a variety of methods, but I feel like I am missing a fundamental piece of code that would make this easier. I have gotten this far:

ExampleSum1 <- sum(Example[,1])
ExampleSum2 <- sum(Example[,2])

But I don't know how to divide 10, 20, 30, etc. by ExampleSum1, and likewise the values in column B by ExampleSum2.

Adam asked Sep 09 '25 22:09


2 Answers

data.table solution:

sum.cols = c("A", "B")
library(data.table)
setDT(Example, keep.rownames = TRUE)
Example[ , (sum.cols) := lapply(.SD, function(x) x/sum(x)), .SDcols = sum.cols]

Or perhaps more direct in your case:

Example[ , c("A", "B") := .(A/sum(A), B/sum(B))]

Both give:

Example
#          rn          A          B
# 1: Sample 1 0.02222222 0.07936508
# 2: Sample 2 0.04444444 0.08730159
# 3: Sample 3 0.06666667 0.09523810
# 4: Sample 4 0.08888889 0.10317460
# 5: Sample 5 0.11111111 0.11111111
# 6: Sample 6 0.13333333 0.11904762
# 7: Sample 7 0.15555556 0.12698413
# 8: Sample 8 0.17777778 0.13492063
# 9: Sample 9 0.20000000 0.14285714

The main appeal of this approach over one using colSums or sweep is that both of those require converting your data to a matrix and then back, which may be costly. For a small table the other approaches are fine, so pick whichever you find most readable.
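For comparison, here is a minimal base R sketch of the alternatives mentioned above, assuming the same Example data frame as in the question. The names via_sweep and via_lapply are just illustrative. The lapply version works column by column, so it is one way to stay in base R without going through a matrix:

```r
# Rebuild the question's data so the sketch is self-contained.
Example <- data.frame(A = 10 * 1:9, B = 10 * 10:18)
rownames(Example) <- paste("Sample", 1:9)

# sweep(): divide each column (MARGIN = 2) by the matching column sum.
# colSums() coerces the data frame to a matrix internally.
via_sweep <- sweep(Example, 2, colSums(Example), `/`)

# lapply(): process one column at a time, no matrix conversion.
# Assigning into via_lapply[] keeps the data frame shape and row names.
via_lapply <- Example
via_lapply[] <- lapply(Example, function(x) x / sum(x))

# Both results should agree, and every column should sum to 1.
all.equal(via_sweep$A, via_lapply$A)
sapply(via_lapply, sum)
```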

I also notice that no other answers mention the mapply approach, which would work in almost any paradigm; here's the data.table version:

Example[ , (sum.cols) := mapply(`/`, .SD, lapply(.SD, sum), SIMPLIFY = FALSE), 
        .SDcols = sum.cols]
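The same mapply idea should carry over to a plain data frame. This is a rough sketch, assuming the question's data; totals and props are names made up for illustration:

```r
# Rebuild the question's data so the sketch is self-contained.
Example <- data.frame(A = 10 * 1:9, B = 10 * 10:18)

# One total per column, kept as a list so it pairs up with the columns.
totals <- lapply(Example, sum)

# mapply() walks the columns and their totals in parallel;
# SIMPLIFY = FALSE returns a list, which fits back into the data frame.
props <- Example
props[] <- mapply(`/`, Example, totals, SIMPLIFY = FALSE)

head(props, 3)
```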
MichaelChirico answered Sep 12 '25 13:09


You can get the column sums with colSums, and use paste to derive new column names from the existing ones. colSums returns a vector of the column sums, but to do column-wise division you need to use a little trickery. The best way looks to be the one mentioned by @user20650.

## Make new columns: proportions of column sums
dat[,paste(names(dat),"prop", sep="_")] <- t( t(dat) / colSums(dat) )

dat
#          A   B     A_prop     B_prop
# Sample1 10 100 0.02222222 0.07936508
# Sample2 20 110 0.04444444 0.08730159
# Sample3 30 120 0.06666667 0.09523810
# Sample4 40 130 0.08888889 0.10317460
# Sample5 50 140 0.11111111 0.11111111
# Sample6 60 150 0.13333333 0.11904762
# Sample7 70 160 0.15555556 0.12698413
# Sample8 80 170 0.17777778 0.13492063
# Sample9 90 180 0.20000000 0.14285714
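In case the double transpose looks mysterious, here is a small sketch of why it is needed, assuming the question's data rebuilt as a matrix (m, cs, naive and props are illustrative names). R recycles the shorter vector down each column, so dividing the matrix directly by the column sums pairs the sums with the wrong cells:

```r
# Rebuild the question's data so the sketch is self-contained.
dat <- data.frame(A = 10 * 1:9, B = 10 * 10:18)
m <- as.matrix(dat)
cs <- colSums(m)  # A = 450, B = 1260

# Naive division recycles cs down each column (450, 1260, 450, ...),
# so the rows alternate between the two divisors: not what we want.
naive <- m / cs

# After transposing, each row holds one original column, so the
# recycled cs lines up with the right values; then transpose back.
props <- t(t(m) / cs)

colSums(naive)  # generally not 1
colSums(props)  # each column sums to 1
```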

Data

dat <- read.table(text="A      B
Sample1    10     100
Sample2    20     110
Sample3    30     120
Sample4    40     130
Sample5    50     140
Sample6    60     150
Sample7    70     160
Sample8    80     170
Sample9    90     180", header=TRUE)
Rorschach answered Sep 12 '25 13:09