"recursive" self join in data.table

I have a component list made of 3 columns: product, component and quantity of component used:

a <- structure(list(prodName = c("prod1", "prod1", "prod2", "prod3", 
"prod3", "int1", "int1", "int2", "int2"), component = c("a", 
"int1", "b", "b", "int2", "a", "b", "int1", "d"), qty = c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L)), row.names = c(NA, -9L), class = c("data.table", 
"data.frame"))

  prodName component qty
1    prod1         a   1
2    prod1      int1   2
3    prod2         b   3
4    prod3         b   4
5    prod3      int2   5
6     int1         a   6
7     int1         b   7
8     int2      int1   8
9     int2         d   9

Products whose names start with prod are final products, those whose names start with int are intermediate products, and those named with a single letter are raw materials.

I need the full component list of final products with only raw materials as components. That is, I want to convert any int into raw materials.

Intermediate products can be composed of raw materials and other intermediate products, hence my reference to "recursive".
I can't know in advance the depth of nesting / recursion of an intermediate product (2 levels in this example, more than 6 in the actual data).

For this example, my expected result is the following (the computation behind each quantity is written out explicitly):

prodName  |component  |qty
prod1     |a          |1+2*6 = 13
prod1     |b          |0+2*7 = 14
prod2     |b          |3
prod3     |b          |4+5*8*7 = 284
prod3     |a          |0+5*8*6 = 240
prod3     |d          |0+5*9 = 45

What I have done:

I solved this by creating a very cumbersome sequence of joins with merge. While this approach worked for the toy data, it's unlikely I can apply it to the real one.

#load data.table
library(data.table)

# split the tables between products and different levels of intermediate
a1 <- a[prodName %like% "prod",]
b1 <- a[prodName %like% "int1",]
c1 <- a[prodName %like% "int2",]

# convert int2 to raw materials
d1 <- merge(c1,
            b1,
            by.x = "component",
            by.y = "prodName",
            all.x = TRUE)[
  # components of int2 that are already raw materials have no match in b1
  is.na(component.y), component.y := component][
  is.na(qty.y), qty.y := 1L][
  , .(prodName, qty = qty.x * qty.y), by = .(component = component.y)]

# Since int1 is already exploded into raw materials, rbind both tables:
d1 <- rbind(d1, b1)

# convert all final products into raw materials, except that the raw mats that go directly into the product won't appear:
e1 <- merge(a1, 
            d1, 
            by.x = "component", 
            by.y = "prodName", 
            all.x = TRUE)

# rbind the last calculated raw mats (those coming from intermediate products) with those coming _directly_ into the final product:
result <- rbind(
  e1[!is.na(qty.y),
     .(prodName, qty = qty.x * qty.y),
     by = .(component = component.y)],
  e1[is.na(qty.y),
     .(prodName, component, qty = qty.x)]
)[, .(qty = sum(qty)), keyby = .(prodName, component)]

I'm aware I can split the data into tables and perform joins until every intermediate product is expressed in terms of raw materials only, but as mentioned above, that would be a last resort given the size of the data and the depth of recursion of the intermediate products.

Is there an easier / better way to do this sort of recursive join?

asked Jun 30 '19 02:06

PavoDive

1 Answer

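Essentially, your data represents a weighted edge list of a directed graph, with an edge from each component to the product that uses it and the quantity as the edge weight. The code below uses the igraph library to compute, for each pair of raw component and final product, the sum over all simple paths from the component to the product of the product of the edge weights along the path: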

library(igraph)

## transform edgelist into graph
graph <- graph_from_edgelist(as.matrix(a[, c(2, 1)])) %>%
  set_edge_attr("weight", value = unlist(a[, 3]))

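## combinations raw components -> final products, derived from the data:
## final products never appear as a component, raw components never appear as a prodName
out <- expand.grid(prodname = setdiff(a$prodName, a$component), component = setdiff(a$component, a$prodName), stringsAsFactors = FALSE)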

## calculate quantities
out$qty <- mapply(function(component, prodname) {

  ## all simple paths from component -> prodname
  all_paths <- all_simple_paths(graph, from = component, to = prodname)

  ## if simple paths exist, sum over product of weights for each path
  ifelse(length(all_paths) > 0,
         sum(sapply(all_paths, function(path) prod(E(graph, path = path)$weight))), 0)

}, out$component, out$prodname)

out
#>   prodname component qty
#> 1    prod1         a  13
#> 2    prod2         a   0
#> 3    prod3         a 240
#> 4    prod1         b  14
#> 5    prod2         b   3
#> 6    prod3         b 284
#> 7    prod1         d   0
#> 8    prod2         d   0
#> 9    prod3         d  45
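
The same expansion can also be sketched without igraph, by repeating a data.table self join until no intermediate product is left among the components. This is only a minimal sketch under assumptions that are not part of the original post: the recipes contain no cycles, max_depth is a hypothetical safety cap, and qty is stored as a double so that long chains of multiplications do not overflow the integer range.

```r
library(data.table)

# Toy data from the question (qty as double: an assumption, to avoid integer overflow)
a <- data.table(
  prodName  = c("prod1", "prod1", "prod2", "prod3", "prod3", "int1", "int1", "int2", "int2"),
  component = c("a", "int1", "b", "b", "int2", "a", "b", "int1", "d"),
  qty       = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
)

# Intermediates are both produced (prodName) and consumed (component)
intermediates <- intersect(a$prodName, a$component)

# Start from the recipes of the final products only
res <- a[!prodName %in% intermediates]

max_depth <- 50L  # hypothetical cap; stops the loop if the recipes are cyclic
for (i in seq_len(max_depth)) {
  if (!any(res$component %in% intermediates)) break

  done <- res[!component %in% intermediates]  # already raw materials
  todo <- res[component %in% intermediates]   # still to be exploded

  # Explode one level: replace each intermediate by its own recipe,
  # multiplying the quantities along the way
  exploded <- a[todo, on = .(prodName = component), allow.cartesian = TRUE,
                .(prodName = i.prodName, component = x.component, qty = x.qty * i.qty)]

  res <- rbind(done, exploded)
}
if (any(res$component %in% intermediates)) stop("max_depth reached; cyclic recipes?")

# Add up the contributions that reach a raw material through different routes
result <- res[, .(qty = sum(qty)), keyby = .(prodName, component)]
result
```

Each pass resolves one level of nesting, so the number of iterations equals the deepest nesting in the data rather than the number of intermediate products. Pairs with a quantity of zero are simply absent from result, unlike in the igraph output above.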

answered Sep 28 '22 17:09

Joris C.

Tags: join, r, recursion, data.table, self-join
