
Titan lookups on indexed key are incredibly slow?

Tags: gremlin, titan

Using Titan 0.3.1 with Cassandra, I created a vertex key index via createKeyIndex as described in the Titan docs.

gremlin> g.createKeyIndex("my_key", Vertex.class)
==>null

I now have approximately 50k nodes and 186k edges in the graph, and I'm seeing a significant performance difference between lookups by my_key and lookups by vertex ID. This query takes about 5 seconds to run:

gremlin> g.V.has("my_key", "abc")
==>v[12345]

whereas looking up the same vertex by its ID takes less than 1 second:

gremlin> g.v(12345)
==>v[12345]

my_key does not have a unique constraint (and I don't want it to), so I'm wondering what causes such a discrepancy in performance. How can I speed up lookups on a non-unique, indexed vertex key?

bcm360 asked Jun 17 '13 12:06


1 Answer

The issue here is the use of .has, which is a filter function and will not use any indexes. From GremlinDocs:

It is worth noting that the syntax of has is similar to g.V("name", "marko"), which has the difference of being a key index lookup and as such will perform faster. In contrast, this line, g.V.has("name", "marko"), will iterate over all vertices checking the name property of each vertex for a match and will be significantly slower than the key index approach.

For the example above, this will use the index and perform the lookup very quickly (< 1 second):

gremlin> g.V("my_key", "abc")
==>v[12345]
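
To see why the two forms differ so much, here is a rough Python model of the difference between a filter scan and a key index lookup. It is only a sketch of the general idea, not how Titan or Cassandra actually store or index data; the `vertices` dict, the `my_key_index` structure, and both function names are made up for illustration.

```python
from collections import defaultdict

# Toy graph: vertex id -> properties. The ids and values are invented
# for illustration; only vertex 12345 has my_key == "abc".
vertices = {i: {"my_key": "key-%d" % i} for i in range(50000)}
vertices[12345]["my_key"] = "abc"


def scan_lookup(key, value):
    """Rough model of g.V.has(key, value): test every vertex in turn."""
    return [vid for vid, props in vertices.items() if props.get(key) == value]


# Rough model of a non-unique key index: each value maps to a list of
# vertex ids, so several vertices may share the same my_key.
my_key_index = defaultdict(list)
for vid, props in vertices.items():
    my_key_index[props["my_key"]].append(vid)


def index_lookup(value):
    """Rough model of g.V("my_key", value): one hash lookup, no scan."""
    return my_key_index.get(value, [])


print(scan_lookup("my_key", "abc"))
print(index_lookup("abc"))
```

Both calls return the same vertex, but the scan touches all 50,000 entries while the index lookup touches one. Because the index maps each value to a list, the key does not need to be unique for the lookup to be fast.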
bcm360 answered Nov 11 '22 02:11