I have a table whose mean column I need to fill with the mean of value for each combination of a and b. My current code is inefficient and will take a long time on large data sets. Example:
Sample Data:
x = read.table(text="a b value mean
1 1 10 0
1 1 12 0
2 2 14 0
2 1 16 0", header=TRUE)
Code:
y <- aggregate(x$value, list(a = x$a,b = x$b), mean)
print(y)
# a b x
# 1 1 1 11
# 2 2 1 16
# 3 2 2 14
# Copy each group's mean from y back to the matching rows of x
for (i in seq_len(nrow(x))) {
  for (j in seq_len(nrow(y))) {
    if (x$a[i] == y$a[j] && x$b[i] == y$b[j]) {
      x$mean[i] <- y$x[j]
    }
  }
}
print(x) # This is the final output
# a b value mean
# 1 1 1 10 11
# 2 1 1 12 11
# 3 2 2 14 14
# 4 2 1 16 16
I want to be able to get from the input to the output with efficient code. I am new to R so many thanks for helping out!
data.table is the way to go:
library(data.table)
x.dt <- data.table(x[1:3]) # convert first three cols
x.dt[, mean := mean(value), by = list(a, b)] # add the group mean as a new column
print(x.dt) # := returns invisibly, so print explicitly
# a b value mean
# 1: 1 1 10 11
# 2: 1 1 12 11
# 3: 2 2 14 14
# 4: 2 1 16 16
data.table is very fast: := adds the column by reference, so the table is not copied.
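If installing a package is not an option, the same per-group mean can be sketched in base R with ave(), which returns a vector the same length as its input with each element replaced by its group's summary. This is only an alternative sketch, not part of the data.table answer above; it assumes the sample data from the question (without the placeholder mean column) and relies on ave() using mean as its default FUN.

```r
# Sample data from the question, without the placeholder mean column
x <- read.table(text = "a b value
1 1 10
1 1 12
2 2 14
2 1 16", header = TRUE)

# ave() groups value by every combination of a and b and applies
# FUN (mean by default) within each group, keeping the original row order
x$mean <- ave(x$value, x$a, x$b)

print(x)
#   a b value mean
# 1 1 1    10   11
# 2 1 1    12   11
# 3 2 2    14   14
# 4 2 1    16   16
```

On large tables the data.table version is likely to be faster, but this avoids both the explicit loops and the intermediate aggregate() result.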