Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Normalizing the values in a data table using the values stored in another data table

Tags:

r

data.table

I am trying to normalize the values in a data table (dt) using the baseline values stored in another data table (dt.base). Next you have a sample contents of these tables and the code to generate that example:

> dt
   Bench Config Part Power
1:     A     10    P   171
2:     A     10    Q   125
3:     A    100    P   139
4:     A    100    Q   109
5:     B     10    P   196
6:     B     10    Q   101
7:     B    100    P   157
8:     B    100    Q   176
> dt.base
   Bench Config Part Power
1:     A   Base    P   187
2:     A   Base    Q   104
3:     B   Base    P   166
4:     B   Base    Q   188

Example generation code:

set.seed(13)

dt <- data.table(
  Bench = c(rep('A', 4), rep('B', 4)),
  Config = rep(c(10, 10, 100, 100), 2),
  Part = rep(c('P', 'Q'), 4),
  Power = round(runif(8, 100, 200)))

dt.base <- data.table(
  Bench = c(rep('A', 2), rep('B', 2)),
  Config = c('Base', 'Base', 'Base', 'Base'),
  Part = rep(c('P', 'Q'), 2),
  Power = round(runif(4, 100, 200)))

The idea would be to divide all the values in dt by their corresponding values in dt.base. Therefore, the table would become:

   Bench Config Part Power
1:     A     10    P   171 / 187
2:     A     10    Q   125 / 104
3:     A    100    P   139 / 187
4:     A    100    Q   109 / 104
5:     B     10    P   196 / 166
6:     B     10    Q   101 / 188
7:     B    100    P   157 / 166
8:     B    100    Q   176 / 188

I thought the solution for this was quite straightforward, but I am running into some issues. This is my current attempt:

normalize.power <- function(pwr, base.pwr) {
  pwr / base.pwr
}

dt.norm <- dt[,
    Power <- normalize.power(
      .SD, dt.base[Bench == Bench & Config == 'Base' & Part == Part,
                   'Power', with = F]
    ), by = list(Bench, Config, Part)]

The problem is that normalize.pwr is not receiving a single value in its second parameter (base.pwr), but rather a vector containing all the power values in dt.base. However, when I directly execute from the command line

dt.base[Bench == 'A' & Config == 'Base' & Part == 'P', 'Power', with = F]

then I obtain a single power value, as expected.

I would appreciate any help that solves my problem or leads me to the solution.

like image 457
betabandido Avatar asked Jan 24 '14 21:01

betabandido


People also ask

What is data normalization in access?

If your Microsoft Access database has a table that contains repeating information in one or more fields, use the Table Analyzer to split the data into related tables so that you can store data more safely and efficiently. This process is called normalization.

Which normalization removes multiple values in a column?

First Normal Form (1NF) To normalize a relation that contains a repeating group, remove the repeating group and form two new relations.


1 Answers

You can try something like this

setkey(dt, Bench, Part)
setkey(dt.base, Bench, Part)

dt[dt.base, Power := Power / i.Power]
dt
##    Bench Config Part   Power
## 1:     A     10    P 0.91444
## 2:     A    100    P 0.74332
## 3:     A     10    Q 1.20192
## 4:     A    100    Q 1.04808
## 5:     B     10    P 1.18072
## 6:     B    100    P 0.94578
## 7:     B     10    Q 0.53723
## 8:     B    100    Q 0.93617

Thanks @Arun for the useful i.Power syntax

like image 135
dickoa Avatar answered Oct 05 '22 16:10

dickoa