Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Recursive query with sub-graph aggregation (arbitrary depth)

I asked a question earlier about aggregating quantities along a graph. The two answers provided worked well, but now I am trying to extend the Cypher query it to a graph of variable depth.

To summarize we start of with a bunch of leaf stores which all are associated with a particular supplier, which is a property on the Store node. Inventory is then moved along to other stores and the proportion from each supplier corresponds to their contribution to the original store.

So for node B02, S2 contributed 750/1250 = 60% and S3 contributed 40%. We then move 600 units our of B02 of which 60% belongs to S2 and 40% to S3 and so on.

enter image description here

What we want to know what percentage of the final 700 units into D01 belong to each supplier. Where suppliers with the same name are the same supplier. So for the above graph we expect:

S1, 38.09
S2, 27.61
S3, 34.28

I've prepared a graph using this Cypher script:

CREATE (A01:Store {Name: 'A01', Supplier: 'S1'})
CREATE (A02:Store {Name: 'A02', Supplier: 'S1'})
CREATE (A03:Store {Name: 'A03', Supplier: 'S2'})
CREATE (A04:Store {Name: 'A04', Supplier: 'S3'})
CREATE (A05:Store {Name: 'A05', Supplier: 'S1'})
CREATE (A06:Store {Name: 'A06', Supplier: 'S1'})
CREATE (A07:Store {Name: 'A07', Supplier: 'S2'})
CREATE (A08:Store {Name: 'A08', Supplier: 'S3'})

CREATE (B01:Store {Name: 'B01'})
CREATE (B02:Store {Name: 'B02'})
CREATE (B03:Store {Name: 'B03'})
CREATE (B04:Store {Name: 'B04'})

CREATE (C01:Store {Name: 'C01'})
CREATE (C02:Store {Name: 'C02'})

CREATE (D01:Store {Name: 'D01'})

CREATE (A01)-[:MOVE_TO {Quantity: 750}]->(B01)
CREATE (A02)-[:MOVE_TO {Quantity: 500}]->(B01)
CREATE (A03)-[:MOVE_TO {Quantity: 750}]->(B02)
CREATE (A04)-[:MOVE_TO {Quantity: 500}]->(B02)
CREATE (A05)-[:MOVE_TO {Quantity: 100}]->(B03)
CREATE (A06)-[:MOVE_TO {Quantity: 200}]->(B03)
CREATE (A07)-[:MOVE_TO {Quantity: 50}]->(B04)
CREATE (A08)-[:MOVE_TO {Quantity: 450}]->(B04)

CREATE (B01)-[:MOVE_TO {Quantity: 400}]->(C01)
CREATE (B02)-[:MOVE_TO {Quantity: 600}]->(C01)
CREATE (B03)-[:MOVE_TO {Quantity: 100}]->(C02)
CREATE (B04)-[:MOVE_TO {Quantity: 200}]->(C02)

CREATE (C01)-[:MOVE_TO {Quantity: 500}]->(D01)
CREATE (C02)-[:MOVE_TO {Quantity: 200}]->(D01)

The current query is this:

MATCH (s:Store { Name:'D01' })
MATCH (s)<-[t:MOVE_TO]-()<-[r:MOVE_TO]-(supp)
WITH t.Quantity as total, collect(r) as movements
WITH total, movements, reduce(totalSupplier = 0, r IN movements | totalSupplier + r.Quantity) as supCount
UNWIND movements as movement
RETURN startNode(movement).Supplier as Supplier, round(100.0*movement.Quantity/supCount) as pct

I am trying to use recursive relationships, something along the lines of this:

MATCH (s)<-[t:MOVE_TO]-()<-[r:MOVE_TO*]-(supp)

however that gives multiple paths to the end node and I need to aggregate the inventory at each node I think.

like image 357
NeddySpaghetti Avatar asked Jan 21 '15 05:01

NeddySpaghetti


2 Answers

As i said before i enjoyed this question. I know you already accepted an answer, but I decided to post my final response as it also returns the percentile without client effort ( which means you can also do a SET on the nodes to update the value in the db when you need to ) and of course if for any other reason as one i can come back to :) here is the link to the console example

it returns a row with the store name, sum moved to it from all suppliers and the percentile of each supplier

MATCH p =s<-[:MOVE_TO*]-sup
WHERE HAS (sup.Supplier) AND NOT HAS (s.Supplier)
WITH s,sup,reduce(totalSupplier = 0, r IN relationships(p)| totalSupplier + r.Quantity) AS TotalAmountMoved
WITH sum(TotalAmountMoved) AS sumMoved, collect(DISTINCT ([sup.Supplier, TotalAmountMoved])) AS MyDataPart1,s
WITH reduce(b=[], c IN MyDataPart1| b +[{ Supplier: c[0], Quantity: c[1], Percentile: ((c[1]*1.00))/(sumMoved*1.00)*100.00 }]) AS MyData, s, sumMoved
RETURN s.Name, sumMoved, MyData
like image 150
cechode Avatar answered Sep 23 '22 16:09

cechode


This query generates the correct results for any arbitrary graph that conforms to the model described in the question. (When Store x moves merchandise to Store y, it is assumed that the Supplier percentages of the moved merchandise is the same as for Store x.)

However, this solution does not consist of just a single Cypher query (since that may not be possible). Instead, it involves multiple queries, one of which must be iterated until the calculations cascade through the entire graph of Store nodes. That iterated query will clearly tell you when to stop iterating. The other Cypher queries are needed to: prepare the graph for iteration, report the Supplier percentages for the "end" node(s), and clean up the graph (so that it is restored to the way it was before step 1, below).

These queries could probably be further optimized.

Here are the required steps:

  1. Prepare the graph for the iterative query (initializes the temporary pcts array for all starting Store nodes). This includes the creation of a singleton Suppliers node that has an array with all the supplier names. This is used to establish the order of the elements of the temporary pcts arrays, and to map those elements back to the correct supplier name.

    MATCH (store:Store)
    WHERE HAS (store.Supplier)
    WITH COLLECT(store) AS stores, COLLECT(DISTINCT store.Supplier) AS csup
    CREATE (sups:Suppliers { names: csup })
    WITH stores, sups
    UNWIND stores AS store
    SET store.pcts =
      EXTRACT(i IN RANGE(0,LENGTH(sups.names)-1,1) |
        CASE WHEN store.Supplier = sups.names[i] THEN 1.0 ELSE 0.0 END)
    RETURN store.Name, store.Supplier, store.pcts;
    

    Here is the result with the question's data:

    +---------------------------------------------+
    | store.Name | store.Supplier | store.pcts    |
    +---------------------------------------------+
    | "A01"      | "S1"           | [1.0,0.0,0.0] |
    | "A02"      | "S1"           | [1.0,0.0,0.0] |
    | "A03"      | "S2"           | [0.0,1.0,0.0] |
    | "A04"      | "S3"           | [0.0,0.0,1.0] |
    | "A05"      | "S1"           | [1.0,0.0,0.0] |
    | "A06"      | "S1"           | [1.0,0.0,0.0] |
    | "A07"      | "S2"           | [0.0,1.0,0.0] |
    | "A08"      | "S3"           | [0.0,0.0,1.0] |
    +---------------------------------------------+
    8 rows
    83 ms
    Nodes created: 1
    Properties set: 9
    
  2. Iterative query (run repeatedly until 0 rows are returned)

    MATCH p=(s1:Store)-[m:MOVE_TO]->(s2:Store)
    WHERE HAS(s1.pcts) AND NOT HAS(s2.pcts)
    SET s2.pcts = EXTRACT(i IN RANGE(1,LENGTH(s1.pcts),1) | 0)
    WITH s2, COLLECT(p) AS ps
    WITH s2, ps, REDUCE(s=0, p IN ps | s + HEAD(RELATIONSHIPS(p)).Quantity) AS total
    FOREACH(p IN ps |
      SET HEAD(RELATIONSHIPS(p)).pcts = EXTRACT(parentPct IN HEAD(NODES(p)).pcts | parentPct * HEAD(RELATIONSHIPS(p)).Quantity / total)
    )
    FOREACH(p IN ps |
      SET s2.pcts = EXTRACT(i IN RANGE(0,LENGTH(s2.pcts)-1,1) | s2.pcts[i] + HEAD(RELATIONSHIPS(p)).pcts[i])
    )
    RETURN s2.Name, s2.pcts, total, EXTRACT(p IN ps | HEAD(RELATIONSHIPS(p)).pcts) AS rel_pcts;
    

    Iteration 1 result:

    +-----------------------------------------------------------------------------------------------+
    | s2.Name | s2.pcts       | total | rel_pcts                                                    |
    +-----------------------------------------------------------------------------------------------+
    | "B04"   | [0.0,0.1,0.9] | 500   | [[0.0,0.1,0.0],[0.0,0.0,0.9]]                               |
    | "B01"   | [1.0,0.0,0.0] | 1250  | [[0.6,0.0,0.0],[0.4,0.0,0.0]]                               |
    | "B03"   | [1.0,0.0,0.0] | 300   | [[0.3333333333333333,0.0,0.0],[0.6666666666666666,0.0,0.0]] |
    | "B02"   | [0.0,0.6,0.4] | 1250  | [[0.0,0.6,0.0],[0.0,0.0,0.4]]                               |
    +-----------------------------------------------------------------------------------------------+
    4 rows
    288 ms
    Properties set: 24
    

    Iteration 2 result:

    +-------------------------------------------------------------------------------------------------------------------------------+
    | s2.Name | s2.pcts                                      | total | rel_pcts                                                     |
    +-------------------------------------------------------------------------------------------------------------------------------+
    | "C02"   | [0.3333333333333333,0.06666666666666667,0.6] | 300   | [[0.3333333333333333,0.0,0.0],[0.0,0.06666666666666667,0.6]] |
    | "C01"   | [0.4,0.36,0.24]                              | 1000  | [[0.4,0.0,0.0],[0.0,0.36,0.24]]                              |
    +-------------------------------------------------------------------------------------------------------------------------------+
    2 rows
    193 ms
    Properties set: 12
    

    Iteration 3 result:

    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | s2.Name | s2.pcts                                                       | total | rel_pcts                                                                                                                    |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | "D01"   | [0.38095238095238093,0.27619047619047615,0.34285714285714286] | 700   | [[0.2857142857142857,0.2571428571428571,0.17142857142857143],[0.09523809523809522,0.01904761904761905,0.17142857142857143]] |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row
    40 ms
    Properties set: 6
    

    Iteration 4 result:

    +--------------------------------------+
    | s2.Name | s2.pcts | total | rel_pcts |
    +--------------------------------------+
    +--------------------------------------+
    0 rows
    69 ms
    
  3. List the non-zero Supplier percentages for the ending Store node(s).

    MATCH (store:Store), (sups:Suppliers)
    WHERE NOT (store:Store)-[:MOVE_TO]->(:Store) AND HAS(store.pcts)
    RETURN store.Name, [i IN RANGE(0,LENGTH(sups.names)-1,1) WHERE store.pcts[i] > 0 | {supplier: sups.names[i], pct: store.pcts[i] * 100}] AS pcts;
    

    Result:

    +----------------------------------------------------------------------------------------------------------------------------------+
    | store.Name | pcts                                                                                                                |
    +----------------------------------------------------------------------------------------------------------------------------------+
    | "D01"      | [{supplier=S1, pct=38.095238095238095},{supplier=S2, pct=27.619047619047617},{supplier=S3, pct=34.285714285714285}] |
    +----------------------------------------------------------------------------------------------------------------------------------+
    1 row
    293 ms
    
  4. Clean up (remove all the temporary pcts props and the Suppliers node).

    MATCH (s:Store), (sups:Suppliers)
    OPTIONAL MATCH (s)-[m:MOVE_TO]-()
    REMOVE m.pcts, s.pcts
    DELETE sups;
    

    Result:

    0 rows
    203 ms
    +-------------------+
    | No data returned. |
    +-------------------+
    Properties set: 29
    Nodes deleted: 1
    
like image 24
cybersam Avatar answered Sep 23 '22 16:09

cybersam