I have two datasets, one a detailed dataset of weight
's and another that's supposed to be a summary dataset. I am trying to create the summary dataset by joining the detail dataset and aggregating, but it isn't working as expected.
Here's a sample code.
mytesta <- data.table(cola = c("a","b"), groupa = c(1,2)) # summary
mytestb <- data.table(groupa = c(1,1,1,1,2,2,2), weighta = c(10,20,30,25,15,30,10)) #detail
And this is my desired output.
cola groupa weighta
1: a 1 85
2: b 2 55
What I tried to do is,
mytesta[mytestb, on = "groupa", weight_summary := sum(i.weighta), by = "groupa"]
The problem is that when by
is used, the columns of the inner data.table disappear (for instance, mytesta[mytestb, on = "groupa", .SD, by = "groupa"]
). Is there a way around this?
I would do
mytesta[, v := mytestb[.SD, on=.(groupa), sum(weighta), by=.EACHI]$V1 ]
In a X[Y]
join, we're looking up each row of Y
in X
.
So if the end goal is to create a new column in Y
computed per row, we'll need a join Y[, v := X[Y, ...]]
even though Y[X, v := ...]
might seem more intuitive at first.
Here is a solution where I first merge your two data.tables, then summarize.
tab = merge(mytesta, mytestb, by="groupa")
tab
# groupa cola weighta
# 1: 1 a 10
# 2: 1 a 20
# 3: 1 a 30
# 4: 1 a 25
# 5: 2 b 15
# 6: 2 b 30
# 7: 2 b 10
res = tab[, list(weighta=sum(weighta)), by=list(cola, groupa)]
res
# cola groupa weighta
# 1: a 1 85
# 2: b 2 55
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With