I cannot find a duplicate as of now.
My problem is the following:
I have two data.tables
. One with two columns (featurea, count), another with three columns (featureb, featurec, count). I want to multiply (?), in order so that I have a new data.table
with all the possibilities. The trick is that these features do not match, so therefore merge
solutions might not do the trick.
MRE as follows:
# two columns
DT1 <- data.table(featurea =c("type1","type2"), count = c(2,3))
# featurea count
#1: type1 2
#2: type2 3
#three columns
DT2 <- data.table(origin =c("house","park","park"), color =c("red","blue","red"),count =c(2,1,2))
# origin color count
#1: house red 2
#2: park blue 1
#3: park red 2
My expected result, in this case, is a data.table
as follows:
> DT3
origin color featurea total
1: house red type1 4
2: house red type2 6
3: park blue type1 2
4: park blue type2 3
5: park red type1 4
6: park red type2 6
Please test on larger data, I'm not sure how optimized this is:
DT2[, .(featurea = DT1[["featurea"]],
count = count * DT1[["count"]]), by = .(origin, color)]
# origin color featurea count
#1: house red type1 4
#2: house red type2 6
#3: park blue type1 2
#4: park blue type2 3
#5: park red type1 4
#6: park red type2 6
It might be more efficient to switch it around if DT1
has less groups:
DT1[, c(DT2[, .(origin, color)],
.(count = count * DT2[["count"]])), by = featurea]
# featurea origin color count
#1: type1 house red 4
#2: type1 park blue 2
#3: type1 park red 4
#4: type2 house red 6
#5: type2 park blue 3
#6: type2 park red 6
This would be one way. First, I expanded rows in DT2
with expandRows()
in the splitstackshape
package. Each row is repeated twice since I specified count = 2, count.is.col = FALSE
. Then, I took care of the multiplication and created a new column called total
. At the same time, I created a new column for featurea
. Finally, I dropped count
.
library(data.table)
library(splitstackshape)
expandRows(DT2, count = nrow(DT1), count.is.col = FALSE)[,
`:=` (total = count * DT1[, count], featurea = DT1[, featurea])][, count := NULL]
EDIT
If you do not want to add another package, you can try David's idea in his comment.
DT2[rep(1:.N, nrow(DT1))][,
`:=`(total = count * DT1$count, featurea = DT1$featurea, count = NULL)][]
# origin color total featurea
#1: house red 4 type1
#2: house red 6 type2
#3: park blue 2 type1
#4: park blue 3 type2
#5: park red 4 type1
#6: park red 6 type2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With