Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to calculate aggregates for sub-trees in Gremlin?

I have a tree with many levels, where leaf nodes might have property "count". I want to calculate total count for each sub-tree, and cache those values in the root node of each sub-tree. Is that possible in Gremlin?

like image 719
isobretatel Avatar asked Sep 18 '15 20:09

isobretatel


1 Answers

You could do it with a sideEffect - that's pretty straightforward. We setup a simple tree with:

gremlin> g = new TinkerGraph()                                                                 
==>tinkergraph[vertices:0 edges:0]
gremlin> v1 = g.addVertex()                                                                    
==>v[0]
gremlin> v2 = g.addVertex()                                                                    
==>v[1]
gremlin> v3 = g.addVertex([count:2])                                                           
==>v[2]
gremlin> v4 = g.addVertex([count:3])                                                           
==>v[3]
gremlin> v1.addEdge('child',v2)                                                                
==>e[4][0-child->1]
gremlin> v1.addEdge('child',v3)                                                                
==>e[5][0-child->2]
gremli                                                                                         
gremlin> v2.addEdge('child',v4)
==>e[6][1-child->3]

And then here's the calculation over each subtree within the full tree:

gremlin> g.V().filter{it.outE().hasNext()}.sideEffect{                                           
gremlin>   c=0;                                                                                  
gremlin>   it.as('a').out().sideEffect{leaf -> c+=(leaf.getProperty('count')?:0)}.loop('a'){true}.iterate()
gremlin>   it.setProperty('total',c)                                                                       
gremlin> }                                                                                                 
==>v[0]
==>v[1]
gremlin> g.v(0).total
==>5
gremlin> g.v(1).total                                                                                      
==>3

That query breaks down like this. First, this piece:

g.V().filter{it.outE().hasNext()}

gets any portion of the tree that is not a leaf node (i.e. should have at least one outgoing edge to not be a leaf). Second, we use sideEffect to process each root of a subtree:

it.as('a').out().sideEffect{leaf -> c+=(leaf.getProperty('count')?:0)}.loop('a'){true}.iterate()

storing the sum of the "count" property for each subtree in a variable called c. There's a bit of groovy goodness there with the elvis operator (?:) to check for vertices without a "count" property and return a zero in those cases. After you traverse the tree to calculate c you can just store the value of c in your root node of the subtree via:

it.setProperty('total',c) 
like image 130
stephen mallette Avatar answered Nov 12 '22 03:11

stephen mallette