Normalize each row of data.table

Question

This seems like it should be easy, but I can't find an answer :(. I'm trying to normalize each row of a data.table with a function like this:

normalize <- function(x) {
  s = sum(x)
  if (s>0) {
    return(x/s)
  } else {
    return(0)
  }
}

How do I call this function on every row of a data.table and get a normalized data.table back? I could write a for loop, but that's surely not the right way, and apply(data, 1, normalize) will, as I understand it, convert my data.table to a matrix, which will be a big performance hit.

David Arenburg · Accepted Answer

Consider this example data set (next time, please provide one yourself):

library(data.table)
set.seed(123)
DT <- data.table(x = rnorm(10), y = rnorm(10), z = rnorm(10))

I would try to avoid by-row operations and vectorize using rowSums, something like the following:

DT[, names(DT) := {temp = rowSums(.SD) ; (.SD / temp) * (temp > 0)}]
DT
#              x          y          z
#  1:  0.0000000  0.0000000  0.0000000
#  2:  0.0000000  0.0000000  0.0000000
#  3:  1.6697906  0.4293327 -1.0991233
#  4:  0.0000000  0.0000000  0.0000000
#  5:  0.0000000  0.0000000  0.0000000
#  6:  0.9447911  0.9843707 -0.9291618
#  7:  0.2565558  0.2771142  0.4663301
#  8:  0.0000000  0.0000000  0.0000000
#  9:  0.0000000  0.0000000  0.0000000
# 10: -1.3289000 -1.4097961  3.7386962

I created temp to avoid running rowSums(.SD) twice. The * (temp > 0) part plays the role of your if/else statement: it produces a logical vector of TRUE/FALSE, which is coerced to 1/0 when multiplied against (.SD / temp), so rows with a non-positive sum become 0.
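To see the masking trick in isolation, here is a minimal sketch on a made-up three-row table (the names toy and nan_rows and the values are hypothetical, chosen only for illustration). One caveat it shows: a row whose sum is exactly zero comes out as NaN rather than 0, because dividing by zero gives Inf or -Inf and multiplying that by 0 gives NaN. The last step is one possible cleanup, assuming such rows should become 0.

```r
library(data.table)

# Hypothetical toy table: row sums are 4, -2 and 0
toy <- data.table(a = c(1, -3, 1), b = c(3, 1, -1))

toy[, names(toy) := {
  temp <- rowSums(.SD)          # row sums, computed once
  (.SD / temp) * (temp > 0)     # logical mask is coerced to 1/0
}]

# Row 3 sums to zero: x / 0 is Inf or -Inf, and Inf * 0 is NaN, not 0
nan_rows <- which(is.nan(toy$a))

# Optional cleanup: overwrite those NaN entries with 0 by reference
for (col in names(toy)) {
  set(toy, i = which(is.nan(toy[[col]])), j = col, value = 0)
}
```

With continuous random data such as rnorm, an exact zero sum is very unlikely, so the cleanup mostly matters for integer-valued or sparse inputs.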

Frank · Answer

Here's one way to avoid coercing to a matrix:

cols = names(DT)
DT[, s := Reduce("+",.SD)]
DT[s > 0, (cols) := lapply(.SD,"/",s), .SDcols = cols]
DT[s <=  0, (cols) := 0]
DT[, s := NULL]

This is what I would do if there was a good reason to use a data.table over a matrix (in a later step).
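As a usage sketch, the steps above could be wrapped in a small helper and tried on a toy table. The function name normalize_rows and the table toy are hypothetical, not part of data.table. The sketch assumes every column is of type double and that no column is already named s, since s is used as a temporary column; it also modifies its argument by reference.

```r
library(data.table)

# Hypothetical helper wrapping the steps above; modifies DT by reference.
# Assumes all columns are double and none is already named "s".
normalize_rows <- function(DT) {
  cols <- names(DT)
  DT[, s := Reduce(`+`, .SD), .SDcols = cols]               # row sums, no matrix coercion
  DT[s > 0, (cols) := lapply(.SD, `/`, s), .SDcols = cols]  # divide rows with positive sum
  DT[s <= 0, (cols) := 0]                                   # zero out the remaining rows
  DT[, s := NULL]                                           # drop the temporary column
  DT[]
}

# Row sums are 4, -2 and 0, so only the first row is rescaled
toy <- data.table(a = c(1, -3, 1), b = c(3, 1, -1))
normalize_rows(toy)
```

Because the zero-sum row is handled by the s <= 0 branch, no division by zero occurs for it. Rows whose sum is NA match neither branch and are left unchanged.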

Tags: performance, r, data.table, normalization

Stan
