Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Gremlin: What's an efficient way of finding an edge between two vertices?

So obviously, a straight forward way to find an edge between two vertices is to:

graph.traversal().V(outVertex).bothE(edgeLabel).filter(__.otherV().is(inVertex))

I feel that filter step will have to iterate through all edges making really slow for some applications with a lot of edges.

Another way could be:

traversal = graph.traversal()
                  .V(outVertex)
                  .bothE(edgeLabel)
                  .as("x")
                  .otherV()
                  .is(outVertex) // uses index?
                  .select("x");

I'm assuming the second approach could be much faster since it will be using ID index which will make it faster than the first the approach.

Which one is faster and more efficient (in terms of IO)?

I'm using Titan, so you could also make your answer Titan specific.

Edit

In terms of time, seems like the first approach is faster (edges were 20k for vertex b

gremlin> clock(100000){g.V(b).bothE().filter(otherV().is(a))}
==>0.0016451789999999999
gremlin> clock(100000){g.V(b).bothE().as("x").otherV().is(a).select("x")}
==>0.0018231140399999999

How about IO?

like image 532
Mohamed Taher Alrefaie Avatar asked Mar 11 '16 15:03

Mohamed Taher Alrefaie


2 Answers

I would expect the first query to be faster. However, few things:

  1. None of the queries is optimal, since both of them enable path calculations. If you need to find a connection in both directions, then use 2 queries (I will give an example below)
  2. When you use clock(), be sure to iterate() your traversals, otherwise you'll only measure how long it takes to do nothing.

These are the queries I would use to find an edge in both directions:

g.V(a).outE(edgeLabel).filter(inV().is(b))
g.V(b).outE(edgeLabel).filter(inV().is(a))

If you expect to get at most one edge:

edge = g.V(a).outE(edgeLabel).filter(inV().is(b)).tryNext().orElseGet {
       g.V(b).outE(edgeLabel).filter(inV().is(a)).tryNext()
}

This way you get rid of path calculations. How those queries perform will pretty much depend on the underlying graph database. Titan's query optimizer recognizes that query pattern and should return a result in almost no time.

Now if you want to measure the runtime, do this:

clock(100) {
  g.V(a).outE(edgeLabel).filter(inV().is(b)).iterate()
  g.V(b).outE(edgeLabel).filter(inV().is(a)).iterate()
}
like image 138
Daniel Kuppitz Avatar answered Oct 17 '22 19:10

Daniel Kuppitz


In case one does not know the vertex Id's, another solution might be

g.V().has('propertykey','value1').outE('thatlabel').as('e').inV().has('propertykey','value2').select('e')

This is also only unidirectional so one needs to reformulate the query for the opposite direction.

like image 21
Christoph Möbius Avatar answered Oct 17 '22 19:10

Christoph Möbius