This seems like it should be easy, but I can't find an answer :(. I'm trying to normalize each row of a data.table like this:
normalize <- function(x) {
  s <- sum(x)
  if (s > 0) {
    return(x / s)
  } else {
    return(0)
  }
}
How do I call this function on every row of a data.table and get a normalized data.table back? I can do a for loop, but that's surely not the right way, and apply(data, 1, normalize) will, as I understand it, convert my data.table to a matrix, which will be a big performance hit.
Considering this example data set (next time, please provide an example data set yourself)
library(data.table)
set.seed(123)
DT <- data.table(x = rnorm(10), y = rnorm(10), z = rnorm(10))
I would try to avoid by-row operations and vectorize using rowSums, something like the following:
DT[, names(DT) := {temp = rowSums(.SD) ; (.SD / temp) * (temp > 0)}]
DT
#              x          y          z
#  1:  0.0000000  0.0000000  0.0000000
#  2:  0.0000000  0.0000000  0.0000000
#  3:  1.6697906  0.4293327 -1.0991233
#  4:  0.0000000  0.0000000  0.0000000
#  5:  0.0000000  0.0000000  0.0000000
#  6:  0.9447911  0.9843707 -0.9291618
#  7:  0.2565558  0.2771142  0.4663301
#  8:  0.0000000  0.0000000  0.0000000
#  9:  0.0000000  0.0000000  0.0000000
# 10: -1.3289000 -1.4097961  3.7386962
The reason I've created temp is to avoid running rowSums(.SD) twice. The *(temp > 0) part is basically your if and else statement. It returns a logical vector of TRUE/FALSE, which is then converted to 1/0 and multiplied against (.SD/temp).
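To make the masking step concrete, here is a minimal base R sketch with made-up values (vals and sums are hypothetical names, not part of the answer above). It also shows one edge case worth knowing about: when a sum is exactly zero, the division yields Inf or NaN, and multiplying that by 0 gives NaN rather than 0.

```r
# Hypothetical toy values, chosen only to illustrate the masking trick
vals <- c(2, 4, 6)
sums <- c(4, -1, 0)

# A logical vector is coerced to 1/0 when it is used in arithmetic
mask <- as.numeric(sums > 0)

# Same pattern as (.SD / temp) * (temp > 0)
masked <- (vals / sums) * (sums > 0)

# Element 1 is kept (2 / 4), element 2 is zeroed because its sum is negative,
# element 3 is NaN because 6 / 0 is Inf and Inf * 0 is NaN
print(mask)
print(masked)
```

With continuous random data such as rnorm a row sum of exactly zero is very unlikely, but for integer or count data it may be safer to overwrite those rows with 0 explicitly.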
Here's one way to avoid coercing to a matrix:
cols = names(DT)
DT[, s := Reduce("+",.SD)]
DT[s > 0, (cols) := lapply(.SD,"/",s), .SDcols = cols]
DT[s <= 0, (cols) := 0]
DT[, s := NULL]
This is what I would do if there was a good reason to use a data.table over a matrix (in a later step).
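For comparison, here is a rough sketch of what the matrix route might look like if all columns are numeric and a matrix is acceptable downstream. This is not the answerer's code; m, s, and out are hypothetical names and the values are made up.

```r
# Hypothetical small numeric matrix standing in for the data
m <- matrix(c(1, 3,
              -2, 1,
              2, 2), nrow = 3, byrow = TRUE)

s <- rowSums(m)

# s has one entry per row and is recycled down each column,
# so every row is divided by its own sum
out <- m / s

# Rows whose sum is not positive are set to 0, mirroring the else branch
out[s <= 0, ] <- 0

print(out)
```

Because the non-positive rows are overwritten after the division, a zero row sum ends up as 0 here rather than NaN.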