
Divide data.table rows by overall mean

Tags: r, data.table

Consider the following:

library(data.table)
mtcars.dt <- data.table(mtcars)
DT1 <- mtcars.dt[, lapply(.SD, mean), by = cyl]  # per-cyl means
DT2 <- mtcars.dt[, lapply(.SD, mean)]            # overall means

Now, we have the following values:

> DT1
   cyl      mpg     disp        hp     drat       wt     qsec        vs        am     gear     carb
1:   6 19.74286 183.3143 122.28571 3.585714 3.117143 17.97714 0.5714286 0.4285714 3.857143 3.428571
2:   4 26.66364 105.1364  82.63636 4.070909 2.285727 19.13727 0.9090909 0.7272727 4.090909 1.545455
3:   8 15.10000 353.1000 209.21429 3.229286 3.999214 16.77214 0.0000000 0.1428571 3.285714 3.500000

and

> DT2
        mpg    cyl     disp       hp     drat      wt     qsec     vs      am   gear   carb
1: 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125

Now I want each value of mpg, disp, ... in every row of DT1 divided by the corresponding overall mean of the original table (available in DT2).

How would I do this? What is the correct idiom here?

Edit: Here is the desired output, sorry that I was not clearer before.

    cyl      mpg      disp        hp      drat        wt      qsec       vs        am      gear      carb
1:    6 0.9826900 0.7945249 0.8336478 0.9969837 0.9688843 1.0071934 1.306122 1.0549451 1.0460048 1.2190476
2:    4 1.3271681 0.4556844 0.5633497 1.1318889 0.7104599 1.0721912 2.077922 1.7902098 1.1093991 0.5494949
3:    8 0.7515943 1.5304141 1.4262584 0.8978812 1.2430536 0.9396817 0.000000 0.3516484 0.8910412 1.2444444
asked Mar 25 '26 16:03 by Manuel


1 Answer

Here's a possibly more data.table-ish solution that uses the efficient set function (I'm using the newest data.table version on CRAN, v1.9.6).

Create DT1

library(data.table) # V 1.9.6+
mtcars.dt <- data.table(mtcars)
DT1 <- mtcars.dt[, lapply(.SD, mean), by = cyl]

Now create DT2, leaving out the cyl column by negating it in the .SDcols argument; unlist turns the one-row result into a named numeric vector

DT2 <- unlist(mtcars.dt[, lapply(.SD, mean), .SDcols = -"cyl"])
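As a quick sanity check (a sketch, assuming the same mtcars.dt, DT1 and DT2 as defined in this answer), one can confirm that DT2 is a named numeric vector whose names line up, in order, with the columns of DT1 after the grouping column. Dividing by position relies on exactly this alignment.

```r
library(data.table)

mtcars.dt <- data.table(mtcars)
DT1 <- mtcars.dt[, lapply(.SD, mean), by = cyl]
DT2 <- unlist(mtcars.dt[, lapply(.SD, mean), .SDcols = -"cyl"])

# DT2 is a plain named numeric vector: one overall mean per non-cyl column
is.numeric(DT2)

# The names of DT2 match the columns of DT1 after the grouping column, in order
identical(names(DT2), names(DT1)[-1L])
```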

Now loop over the columns of DT1 from the second one onward and update DT1 in place, dividing each column by the matching element of DT2 (offset by one because DT2 has no cyl entry)

for (j in 2L:length(DT1)) set(DT1, j = j, value = DT1[[j]] / DT2[j - 1L])
DT1
#    cyl       mpg      disp        hp      drat        wt      qsec       vs        am      gear      carb
# 1:   6 0.9826900 0.7945249 0.8336478 0.9969837 0.9688843 1.0071934 1.306122 1.0549451 1.0460048 1.2190476
# 2:   4 1.3271681 0.4556844 0.5633497 1.1318889 0.7104599 1.0721912 2.077922 1.7902098 1.1093991 0.5494949
# 3:   8 0.7515943 1.5304141 1.4262584 0.8978812 1.2430536 0.9396817 0.000000 0.3516484 0.8910412 1.2444444
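The positional loop depends on DT1 and DT2 listing their columns in the same order. A hedged variant, not part of the original answer, is to loop over column names instead, so each column is divided by the overall mean of the same name. The loop variable col is only an illustrative name.

```r
library(data.table)

mtcars.dt <- data.table(mtcars)
DT1 <- mtcars.dt[, lapply(.SD, mean), by = cyl]
DT2 <- unlist(mtcars.dt[, lapply(.SD, mean), .SDcols = -"cyl"])

# Loop over column names rather than positions, so the division still
# lines up even if the column order of DT1 and DT2 were to differ
for (col in names(DT2)) {
  set(DT1, j = col, value = DT1[[col]] / DT2[[col]])
}
DT1
```

This should give the same table as the positional loop, at the cost of one name lookup per column.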
answered Mar 27 '26 08:03 by David Arenburg


