Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

multiply two data.tables, keep all possibilities

Tags:

r

data.table

I cannot find a duplicate as of now.

My problem is the following:

I have two data.tables. One with two columns (featurea, count), another with three columns (featureb, featurec, count). I want to multiply (?), in order so that I have a new data.table with all the possibilities. The trick is that these features do not match, so therefore merge solutions might not do the trick.

MRE as follows:

# two columns
DT1 <- data.table(featurea =c("type1","type2"), count = c(2,3))

#       featurea count
#1:    type1     2
#2:    type2     3

#three columns
DT2 <- data.table(origin =c("house","park","park"), color =c("red","blue","red"),count =c(2,1,2))

#   origin color count
#1:  house   red     2
#2:   park  blue     1
#3:   park   red     2

My expected result, in this case, is a data.table as follows:

> DT3
   origin color featurea total
1:  house   red    type1     4
2:  house   red    type2     6
3:   park  blue    type1     2
4:   park  blue    type2     3
5:   park   red    type1     4
6:   park   red    type2     6
like image 783
erasmortg Avatar asked Dec 21 '16 12:12

erasmortg


2 Answers

Please test on larger data, I'm not sure how optimized this is:

DT2[, .(featurea = DT1[["featurea"]], 
        count = count * DT1[["count"]]), by = .(origin, color)]
#   origin color featurea count
#1:  house   red    type1     4
#2:  house   red    type2     6
#3:   park  blue    type1     2
#4:   park  blue    type2     3
#5:   park   red    type1     4
#6:   park   red    type2     6

It might be more efficient to switch it around if DT1 has less groups:

DT1[, c(DT2[, .(origin, color)], 
        .(count = count * DT2[["count"]])), by = featurea]
#   featurea origin color count
#1:    type1  house   red     4
#2:    type1   park  blue     2
#3:    type1   park   red     4
#4:    type2  house   red     6
#5:    type2   park  blue     3
#6:    type2   park   red     6
like image 77
Roland Avatar answered Nov 07 '22 14:11

Roland


This would be one way. First, I expanded rows in DT2 with expandRows() in the splitstackshape package. Each row is repeated twice since I specified count = 2, count.is.col = FALSE. Then, I took care of the multiplication and created a new column called total. At the same time, I created a new column for featurea. Finally, I dropped count.

library(data.table)
library(splitstackshape)

expandRows(DT2, count = nrow(DT1), count.is.col = FALSE)[,
    `:=` (total = count * DT1[, count], featurea = DT1[, featurea])][, count := NULL]

EDIT

If you do not want to add another package, you can try David's idea in his comment.

DT2[rep(1:.N, nrow(DT1))][,
   `:=`(total = count * DT1$count, featurea = DT1$featurea, count = NULL)][]



#   origin color total featurea
#1:  house   red     4    type1
#2:  house   red     6    type2
#3:   park  blue     2    type1
#4:   park  blue     3    type2
#5:   park   red     4    type1
#6:   park   red     6    type2
like image 6
jazzurro Avatar answered Nov 07 '22 14:11

jazzurro