I have a component list made of 3 columns: product, component and quantity of component used:
a <- structure(list(prodName = c("prod1", "prod1", "prod2", "prod3",
"prod3", "int1", "int1", "int2", "int2"), component = c("a",
"int1", "b", "b", "int2", "a", "b", "int1", "d"), qty = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L)), row.names = c(NA, -9L), class = c("data.table",
"data.frame"))
prodName component qty
1 prod1 a 1
2 prod1 int1 2
3 prod2 b 3
4 prod3 b 4
5 prod3 int2 5
6 int1 a 6
7 int1 b 7
8 int2 int1 8
9 int2 d 9
Products with names starting with prod
are final products, those with names like int
are intermediate products, and those with letters are raw materials.
I need the full component list of final products with only raw materials as components. That is, I want to convert any int
into raw materials.
For this example, my expected result is (I explicitly stated the computation of the resulting number):
prodName |component |qty
prod1 |a |1+2*6 = 13
prod1 |b |0+2*7 = 14
prod2 |b |3
prod3 |b |4+5*8*7 = 284
prod3 |a |0+5*8*6 = 240
prod3 |d |0+5*9 = 45
I solved this by creating a very cumbersome sequence of joins with merge
. While this approach worked for the toy data, it's unlikely I can apply it to the real one.
#load data.table
library(data.table)
# split the tables between products and different levels of intermediate
a1 <- a[prodName %like% "prod",]
b1 <- a[prodName %like% "int1",]
c1 <- a[prodName %like% "int2",]
# convert int2 to raw materials
d1 <- merge(c1,
b1,
by.x = "component",
by.y = "prodName",
all.x = TRUE)[
is.na(component.y),
component.y := component][
is.na(qty.y),
qty.y := 1][,
.(prodName, qty = qty.x*qty.y),
by = .(component = component.y)]
# Since int1 is already exploded into raw materials, rbind both tables:
d1 <- rbind(d1, b1)
# convert all final products into raw materials, except that the raw mats that go directly into the product won't appear:
e1 <- merge(a1,
d1,
by.x = "component",
by.y = "prodName",
all.x = TRUE)
# rbind the last calculated raw mats (those coming from intermediate products) with those coming _directly_ into the final product:
result <- rbind(e1[!is.na(qty.y),
.(prodName, qty = qty.x * qty.y),
by = .(component = component.y)],
e1[is.na(qty.y),
.(prodName, component, qty = qty.x)])[,
.(qty = sum(qty)),
keyby = .(prodName, component)]
I'm aware I can split the data into tables and perform joins until every intermediate product is expressed as composed by only raw materials, but as mentioned above, that will be a last resort due to the size of data and levels of recursion of intermediate products.
Is there an easier / better way to do this sort of recursive join?
Recursive joins are sometimes also called “fixed-point joins”. They are used to obtain the parent-child data. In SQL Recursive joins are implemented with recursive common table expressions. Recursive common table expression (CTEs) is a way to reference a query over and over again.
For example, if a database of family relationships is to be searched, and the record for each person has "mother" and "father" fields, a recursive join would be one way to retrieve all of a person's known ancestors: first the person's direct parents' records would be retrieved, then the parents' information would be ...
A self join is a join in which a table is joined with itself (which is also called Unary relationships), especially when the table has a FOREIGN KEY which references its own PRIMARY KEY. To join a table itself means that each row of the table is combined with itself and with every other row of the table.
SELF JOIN syntax To perform a SELF JOIN in SQL, the LEFT or INNER JOIN is usually used. SELECT column_names FROM Table1 t1 [INNER | LEFT] JOIN Table1 t2 ON join_predicate; Note: t1 and t2 are different table aliases for the same table. You can also create the SELF JOIN with the help of the WHERE clause.
Essentially, your data represents a weighted edgelist in a directed graph. The below code directly calculates the sum of (product) distances over each simple path from raw component -> final product using the igraph
library:
library(igraph)
## transform edgelist into graph
graph <- graph_from_edgelist(as.matrix(a[, c(2, 1)])) %>%
set_edge_attr("weight", value = unlist(a[, 3]))
## combinations raw components -> final products
out <- expand.grid(prodname = c("prod1", "prod2", "prod3"), component = c("a", "b", "d"), stringsAsFactors = FALSE)
## calculate quantities
out$qty <- mapply(function(component, prodname) {
## all simple paths from component -> prodname
all_paths <- all_simple_paths(graph, from = component, to = prodname)
## if simple paths exist, sum over product of weights for each path
ifelse(length(all_paths) > 0,
sum(sapply(all_paths, function(path) prod(E(graph, path = path)$weight))), 0)
}, out$component, out$prodname)
out
#> prodname component qty
#> 1 prod1 a 13
#> 2 prod2 a 0
#> 3 prod3 a 240
#> 4 prod1 b 14
#> 5 prod2 b 3
#> 6 prod3 b 284
#> 7 prod1 d 0
#> 8 prod2 d 0
#> 9 prod3 d 45
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With