I have a data frame with 3 groups and 3 days:
set.seed(10)
dat <- data.frame(group=rep(c("g1","g2","g3"),each=3), day=rep(c(0,2,4),3), value=runif(9))
# group day value
# 1 g1 0 0.507478
# 2 g1 2 0.306769
# 3 g1 4 0.426908
# 4 g2 0 0.693102
# 5 g2 2 0.085136
# 6 g2 4 0.225437
# 7 g3 0 0.274531
# 8 g3 2 0.272305
# 9 g3 4 0.615829
Within each group, I want to divide each value by the day 0 value and take the log2 of that ratio. The way I'm doing it now is by extracting the values for each day in an intermediate step:
day_0 <- dat[dat$day==0, "value"]
day_2 <- dat[dat$day==2, "value"]
day_4 <- dat[dat$day==4, "value"]
res <- cbind(0, log2(day_2/day_0), log2(day_4/day_0))
rownames(res) <- c("g1","g2","g3")
colnames(res) <- c("day_0","log_ratio_day_2_day_0","log_ratio_day_4_day_0")
# day_0 log_ratio_day_2_day_0 log_ratio_day_4_day_0
# g1 0 -0.7261955 -0.249422
# g2 0 -3.0252272 -1.620346
# g3 0 -0.0117427 1.165564
What's the proper way of calculating res without an intermediate step?
A data.table solution, for coding elegance and memory efficiency:
library(data.table)
DT <- data.table(dat)
# assign within DT by reference
DT[, new_value := log2(value / value[day == 0]), by = group]
Or you could use joins, keys, and by-without-by:
DTb <- data.table(dat)
setkey(DTb, group)
# val0 contains just those records for day 0
val0 <- DTb[day==0]
# the i.value refers to value from the i argument
# which is in this case `val0` and thus the value for
# day = 0
DTb[val0, value := log2(value / i.value)]
Neither of these solutions requires you to sort by day to ensure that the day 0 value is the first (or any particular) element of its group.
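The order-independence claim can be checked with a small sketch. It assumes the data.table package is installed and rebuilds the question's data; the names shuffled, DT_sorted and DT_shuffled are made up for this illustration and are not part of the answer.

```r
library(data.table)

# Same data as in the question
set.seed(10)
dat <- data.frame(group = rep(c("g1", "g2", "g3"), each = 3),
                  day   = rep(c(0, 2, 4), 3),
                  value = runif(9))

# Shuffle the rows so that day 0 is not guaranteed to come first in a group
shuffled <- dat[sample(nrow(dat)), ]

DT_sorted   <- as.data.table(dat)
DT_shuffled <- as.data.table(shuffled)

# The baseline is selected by condition (day == 0), not by position
DT_sorted[,   new_value := log2(value / value[day == 0]), by = group]
DT_shuffled[, new_value := log2(value / value[day == 0]), by = group]

# Bring both tables into the same row order before comparing
setorder(DT_sorted, group, day)
setorder(DT_shuffled, group, day)
all.equal(DT_sorted$new_value, DT_shuffled$new_value)
```

A positional version such as value[1] would only agree with this after sorting each group by day.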
Documentation for the i. syntax, from the data.table NEWS file:
CHANGES IN DATA.TABLE VERSION 1.7.10

NEW FEATURES

  o New function setcolorder() reorders the columns by name or by number, by reference with no copy. This is (almost) infinitely faster than DT[,neworder,with=FALSE].

  o The prefix i. can now be used in j to refer to join inherited columns of i that are otherwise masked by columns in x with the same name.
Your friend is ddply from the plyr package (note that value[1] assumes day 0 is the first row of each group):
require(plyr)
> ddply(dat, .(group), mutate, new_value = log2(value / value[1]))
group day value new_value
1 g1 0 0.50747820 0.00000000
2 g1 2 0.30676851 -0.72619548
3 g1 4 0.42690767 -0.24942179
4 g2 0 0.69310208 0.00000000
5 g2 2 0.08513597 -3.02522716
6 g2 4 0.22543662 -1.62034599
7 g3 0 0.27453052 0.00000000
8 g3 2 0.27230507 -0.01174274
9 g3 4 0.61582931 1.16556397
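Base solution (the output below appears to come from a different random draw than the seeded data in the question, so the numbers differ from the expected res):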
> res <- do.call(rbind,by(dat,dat$group,function(x) log2(x$value/x$value[x$day==0])))
> res
[,1] [,2] [,3]
g1 0 -1.6496538 -2.3673937
g2 0 0.3549090 0.4537402
g3 0 -0.9423506 1.4603706
> colnames(res) <- c("day_0","log_ratio_day_2_day_0","log_ratio_day_4_day_0")
> res
day_0 log_ratio_day_2_day_0 log_ratio_day_4_day_0
g1 0 -1.6496538 -2.3673937
g2 0 0.3549090 0.4537402
g3 0 -0.9423506 1.4603706
This uses ave from base R:
transform(dat, value0 = ave(value, group, FUN = function(x) log2(x / x[1])))
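The ave call above takes x[1] as the baseline, so it relies on day 0 being the first row of each group. As a hedged alternative, here is a minimal base R sketch that looks up the baseline by day instead and then reshapes the result into the groups-by-days layout asked for in the question. The names day0, baseline, log_ratio and wide are illustrative only.

```r
# Same data as in the question
set.seed(10)
dat <- data.frame(group = rep(c("g1", "g2", "g3"), each = 3),
                  day   = rep(c(0, 2, 4), 3),
                  value = runif(9))

# Day 0 value of each group, named by group so it can be looked up directly
day0 <- dat[dat$day == 0, ]
baseline <- setNames(day0$value, day0$group)

# Divide every row by its own group's baseline, whatever the row order
dat$log_ratio <- log2(dat$value / unname(baseline[as.character(dat$group)]))

# Reshape to a matrix with one row per group and one column per day
# (each group/day cell holds exactly one value, so identity is enough)
wide <- tapply(dat$log_ratio, dat[c("group", "day")], identity)
wide
```

The columns of wide are named after the days ("0", "2", "4"), and the day 0 column is all zeros because each baseline is divided by itself.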