Using Titan v0.3.1 with Cassandra, I created a vertex key index via createKeyIndex, as described in the Titan docs:
gremlin> g.createKeyIndex("my_key", Vertex.class)
==>null
I now have approximately 50k nodes and 186k edges in the graph, and I'm seeing a significant performance difference between lookups by my_key and lookups by vertex ID. This query takes about 5 seconds to run:
gremlin> g.V.has("my_key", "abc")
==>v[12345]
whereas looking the vertex up by its ID takes less than 1 second:
gremlin> g.v(12345)
==>v[12345]
my_key does not have a uniqueness constraint (and I don't want one), but I'm wondering what is causing such a large discrepancy in performance. How can I improve lookup performance for a non-unique, indexed vertex key?
The issue here is the use of .has, which is a filter function and will not use any indexes. From GremlinDocs:
It is worth noting that the syntax of has is similar to g.V("name", "marko"), which has the difference of being a key index lookup and as such will perform faster. In contrast, this line, g.V.has("name", "marko"), will iterate over all vertices checking the name property of each vertex for a match and will be significantly slower than the key index approach.
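To make the cost difference concrete, here is a toy in-memory model in Python. It is not Titan's implementation; the class ToyGraph and its methods scan_has and index_lookup are hypothetical names for illustration. scan_has mimics g.V.has(key, value) by visiting every vertex and checking the property, while index_lookup mimics g.V(key, value) by probing a map from (key, value) to the set of matching vertex IDs. Because the map stores a set per value, it handles a non-unique key like my_key without any uniqueness constraint.

```python
from collections import defaultdict


class ToyGraph:
    """Hypothetical toy graph: NOT how Titan stores data, only a cost model."""

    def __init__(self):
        self.vertices = {}                 # vertex id -> property dict
        self.key_index = defaultdict(set)  # (key, value) -> set of vertex ids
        self.indexed_keys = set()

    def create_key_index(self, key):
        # Index any existing vertices; add_vertex maintains the index afterwards.
        self.indexed_keys.add(key)
        for vid, props in self.vertices.items():
            if key in props:
                self.key_index[(key, props[key])].add(vid)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props
        for key in self.indexed_keys:
            if key in props:
                self.key_index[(key, props[key])].add(vid)

    def scan_has(self, key, value):
        # Analogue of g.V.has(key, value): touches every vertex.
        visited = 0
        hits = []
        for vid, props in self.vertices.items():
            visited += 1
            if props.get(key) == value:
                hits.append(vid)
        return sorted(hits), visited

    def index_lookup(self, key, value):
        # Analogue of g.V(key, value): one map probe, then only the matches.
        if key not in self.indexed_keys:
            raise KeyError(f"no key index on {key!r}")
        # .get does not insert into the defaultdict, so misses stay cheap.
        return sorted(self.key_index.get((key, value), set()))


g = ToyGraph()
g.create_key_index("my_key")
for i in range(50_000):
    # my_key is non-unique: e.g. "k5" is shared by 50 vertices.
    g.add_vertex(i, my_key="abc" if i == 12345 else f"k{i % 1000}")

hits, visited = g.scan_has("my_key", "abc")
print(hits, visited)                     # [12345] 50000
print(g.index_lookup("my_key", "abc"))   # [12345]
```

In this toy model the scan visits all 50,000 vertices to find a single match, while the index probe's cost depends only on the number of matching vertices, which is why an indexed lookup stays fast as the graph grows even when the key is not unique.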
For the example above, this will use the index and perform the lookup very quickly (< 1 second):
gremlin> g.V("my_key", "abc")
==>v[12345]