
R data.table - update each row by aggregating other rows satisfying a condition

Tags:

r

data.table

I have the following table.

library(data.table)
dt = data.table(id = 1:5, intMask = c(11,14,8,1,13), imprint = c("1011", "1110", "1000", "0001", "1101"), N = c(3,3,1,1,3), mass = c(.05,.1,.15,.3,.4))

   id intMask imprint N mass
1:  1      11    1011 3 0.05
2:  2      14    1110 3 0.10
3:  3       8    1000 1 0.15
4:  4       1    0001 1 0.30
5:  5      13    1101 3 0.40

Assume that the imprint column is the binary representation of a set, i.e. each row is a subset of a set of cardinality 4 (the imprints have four bits, one per element). intMask is the integer corresponding to that binary representation, and N is the cardinality of the subset, i.e. the number of 1s in imprint. For each row, I would like to compute a new mass by summing the mass of all rows that are supersets of that row (including the row itself). I propose using the bitwAnd() function on the intMask column to find the supersets efficiently.
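To make the superset test concrete, here is a minimal base R sketch using the masks from the table above. The helper name is_superset is hypothetical and only for illustration; the masks are written as integers, which is what bitwAnd() works on.

```r
# Masks from the question's table, stored as integers for bitwAnd()
intMask <- c(11L, 14L, 8L, 1L, 13L)

# Hypothetical helper: TRUE where `candidate` has every bit that is set in `target`,
# i.e. where the candidate set is a superset of the target set
is_superset <- function(candidate, target) bitwAnd(candidate, target) == target

# Which rows contain the set 1000 (intMask 8)?
is_superset(intMask, 8L)
# [1]  TRUE  TRUE  TRUE FALSE  TRUE
```

Rows 1, 2, 3 and 5 pass the test, so their masses (0.05 + 0.10 + 0.15 + 0.40) add up to the 0.70 expected for id 3.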

for(i in 1:nrow(dt)) {
  i.intMask <- dt[i,intMask]
  i.N <- dt[i,N]
  dt[i, newMass := sum(dt[N >= i.N,][bitwAnd(intMask, i.intMask) == i.intMask, mass])]
}

I.e. to get

dt[]
   id intMask imprint N mass newMass
1:  1      11    1011 3 0.05    0.05
2:  2      14    1110 3 0.10    0.10
3:  3       8    1000 1 0.15    0.70
4:  4       1    0001 1 0.30    0.75
5:  5      13    1101 3 0.40    0.40

Assume the table has thousands of rows. Is there an efficient way to do this, preferably using data.table's update by reference?

Vaclav Kratochvíl asked Oct 25 '25 23:10

1 Answer

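One option is a non-equi self-join on N that, for each id, sums the mass of the rows whose intMask is a superset, and then joins the result back onto dt: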

dt[
    dt[
        dt,
        c(
            id = .(i.id),
            newMass = .(mass * (bitwAnd(intMask, i.intMask) == i.intMask))
        ),
        on = .(N >= N)
    ][, lapply(.SD, sum), id],
    on = .(id)
]

which gives

      id intMask imprint     N  mass newMass
   <int>   <num>  <char> <num> <num>   <num>
1:     1      11    1011     3  0.05    0.05
2:     2      14    1110     3  0.10    0.10
3:     3       8    1000     1  0.15    0.70
4:     4       1    0001     1  0.30    0.75
5:     5      13    1101     3  0.40    0.40
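
If updating dt by reference is preferred, one possible variant is sketched below. It is an assumption-laden alternative rather than the join shown above: it builds the full superset matrix with outer(), so it needs memory on the order of nrow(dt)^2, which should be acceptable for a few thousand rows but not for much larger tables. It also drops the N >= N filter, since a superset necessarily has at least as many bits set. The name sup is only for illustration.

```r
library(data.table)

dt <- data.table(
  id      = 1:5,
  intMask = c(11L, 14L, 8L, 1L, 13L),
  mass    = c(.05, .1, .15, .3, .4)
)

# sup[j, i] is TRUE when row j's set contains every element of row i's set
sup <- outer(dt$intMask, dt$intMask, function(a, b) bitwAnd(a, b) == b)

# mass is recycled down the columns, so column i keeps only the masses of
# row i's supersets; colSums() then aggregates them per row.
# := adds newMass to dt by reference, without copying the table.
dt[, newMass := colSums(sup * mass)]
```

On the sample data this should reproduce the newMass column shown above (0.05, 0.10, 0.70, 0.75, 0.40).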
ThomasIsCoding answered Oct 28 '25 13:10