Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Titan Db ignoring index

I have a graph with a couple of indices. They're two composite indices with label restraints. (both are exactly the same just on different properties/labels). One definitely seems to work but the other doesn't. I've done the following profile() to doubled check:

One is called KeyOnNode : property uid and label node :

gremlin> g.V().hasLabel("node").has("uid", "xxxxxxxx").profile().cap(...)
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TitanGraphStep([~label.eq(node), uid.eq(dammit_...                     1           1           2.565    96.84
  optimization                                                                                 1.383
  backend-query                                                        1                       0.231
SideEffectCapStep([~metrics])                                          1           1           0.083     3.16
                                            >TOTAL                     -           -           2.648        -

The above is perfectly acceptable and works well. I'm assuming the magic line is backend-query.

The other is called NameOnSuperNode : property name and label supernode:

gremlin> g.V().hasLabel("supernode").has("name", "xxxxxxxx").profile().cap(...)
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TitanGraphStep([~label.eq(supernode), name.eq(n...                     1           1        5763.163   100.00
  optimization                                                                                 2.261
  scan                                                                                         0.000
SideEffectCapStep([~metrics])                                          1           1           0.073     0.00
                                            >TOTAL                     -           -        5763.236        -

Here the query takes an outrageous amount of time and we have a scan line. I originally wondered if the index wasn't commit through the management system but alas the following seems to work just fine :

gremlin> m = graphT.openManagement(); 
==>com.thinkaurelius.titan.graphdb.database.management.ManagementSystem@73c1c105
gremlin> index = m.getGraphIndex("NameOnSuperNode")
==>NameOnSuperNode
gremlin> index.getFieldKeys()
==>name
gremlin> import static com.thinkaurelius.titan.graphdb.types.TypeDefinitionCategory.*
==>null
gremlin> sv = m.getSchemaVertex(index)
==>NameOnSuperNode
gremlin> rel = sv.getRelated(INDEX_SCHEMA_CONSTRAINT, Direction.OUT)
==>com.thinkaurelius.titan.graphdb.types.SchemaSource$Entry@26b2b8e2
gremlin> sse = rel.iterator().next()
==>com.thinkaurelius.titan.graphdb.types.SchemaSource$Entry@2d39a135
gremlin> sse.getSchemaType()
==>supernode

I can't just reset the db at this point. Any help pinpointing what the issues could be would be amazing, I'm hitting a wall here. Is this a sign that I need to reindex?

INFO: Titan DB 1.1 (TP 3.1.1)

Cheers

UPDATE : I've found that the index in question is not in a REGISTERED state:

gremlin> :> m = graphT.openManagement(); index = m.getGraphIndex("NameOnSuperNode"); pkey = index.getFieldKeys()[0]; index.getIndexStatus(pkey)
==>INSTALLED

How do I get it to register? I've tried m.updateIndex(index, SchemaAction.REGISTER_INDEX).get(); m.commit(); graphT.tx().commit(); but it doesn't seem to do anything

UPDATE 2 : I've tried regitering the index in order to reindex with the following :

gremlin> m = graphT.openManagement(); 
index = m.getGraphIndex("NameOnSuperNode") ; 
import static com.thinkaurelius.titan.graphdb.types.TypeDefinitionCategory.*; 
import com.thinkaurelius.titan.graphdb.database.management.ManagementSystem; 
m.updateIndex(index, SchemaAction.REGISTER_INDEX).get();
ManagementSystem.awaitGraphIndexStatus(graphT, "NameOnSuperNode").status(SchemaStatus.REGISTERED).timeout(20, java.time.temporal.ChronoUnit.MINUTES).call();
m.commit();
graphT.tx().commit()

But this isn't working. I still have my index in the INSTALLED status and I'm still getting a timeout. I've checked that there were no open transactions. Anyone have an idea? FYI the graph is running on a single server and has ~100K vertices and ~130k edges.

like image 782
Pomme.Verte Avatar asked Jan 06 '23 01:01

Pomme.Verte


2 Answers

So there are a few things that can be happening here:

  1. If both of those indices you describe were not created in the same transaction (and the problem index in question was created in after the name propertyKey was already defined) then you should issue a reindex, as per Titan docs:

    The name of a graph index must be unique. Graph indexes built against newly defined property keys, i.e. property keys that are defined in the same management transaction as the index, are immediately available. Graph indexes built against property keys that are already in use require the execution of a reindex procedure to ensure that the index contains all previously added elements. Until the reindex procedure has completed, the index will not be available. It is encouraged to define graph indexes in the same transaction as the initial schema.

  2. The index may be timing out the process that takes to move from REGISTERED to INSTALLED, in which case you want to use mgmt.awaitGraphIndexStatus(). You can even specify the amount of time you are willing to wait here.

  3. Make sure there are no open transactions on your graph or the index status will indeed not change, as described here.

  4. This is clearly not the case for you, but there is a bug in Titan (fixed in JanusGraph via this PR) such that if you create an index against a newly created propertyKey as well as a previously used propertyKey, the index will get stuck in the REGISTERED state

  5. Indexes will not move to REGISTERED unless every Titan/JanusGraph node in the cluster acknowledges the index creation. If your indexes are getting stuck in the INSTALLED state, there is a chance that the other nodes in the system are not acknowledging the indexes existence. This can be due to issues with another server in the cluster, backfill in the messaging queue Titan/JanusGraph uses to talk with each other, or most unexpectedly: the existence of phantom instances. These can occur every time your server is killed through non-normal JVM shutdown processes, i.e. kill -9 the server due to it being stuck in thrash the world garbage collection. If you expect backfill to be the problem, the comments in this class offer good insight to customizable configuration options that may help fix the problem. To check for the existence of phantom nodes, use this function and then this function to kill the phantom instances.

like image 155
David Avatar answered Jan 07 '23 13:01

David


I think you missed config to your graph. If you used backend is cassandra, you must config with elasticsearch. If you used backend is hbase, you must config with caching. Read more in link below: https://docs.janusgraph.org/0.2.0/configuration.html

like image 27
Nguyễn Viết Việt Avatar answered Jan 07 '23 15:01

Nguyễn Viết Việt