Our service runs many concurrent updates and deletions of vertices in Janusgraph.
Sometimes we get odd vertices that have a vertex label but not all of the related properties and edges. Sometimes these vertices have only one property (of the three mandatory ones), or one edge and none of the properties. From the point of view of our business logic, such a vertex is inconsistent and corrupted. Looking through the service logs, I can't see any specific errors or anything abnormal related to these vertex ids.
Trying to remove such vertices fails with the following error:
2020-12-23 15:57:09 ERROR StandardJanusGraph:750 - Could not commit transaction [2] due to storage exception in commit
org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
Searching the Janusgraph documentation, I found the notion of ghost vertices:
When the same vertex is concurrently removed in one transaction and modified in another, both transactions will successfully commit on eventually consistent storage backends and the vertex will still exist with only the modified properties or edges. This is referred to as a ghost vertex.
I also found that there is a GhostVertexRemover class in the Janusgraph repository that is intended to be run to remove such vertices.
I still have doubts whether the corrupted vertices that we have are the ghost vertices described in the documentation.
I have invested a decent amount of time in investigating this issue, and here are the conclusions.
First of all, we did face ghost vertices. Usually they don't have a label if your graph has static labels (the "vertex" label is provided dynamically by the Janusgraph code). A ghost vertex contains only those elements (properties or edges) that were updated during the collision with the concurrent delete of this vertex.
The original documentation states the same, but in a more concise way:
the vertex will still exist with only the modified properties or edges
Here is the test that easily reproduces this issue
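Separately from that test, the mechanism can be illustrated with a toy model in plain Java. This is only a sketch and not Janusgraph code: it assumes that a vertex is stored as a row of independent cells and that a commit writes or deletes only the cells the transaction touched. The class GhostVertexSketch, the Row type and the "exists" marker column are hypothetical names made up for the illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model, NOT Janusgraph code: one vertex is one row of independent cells,
// and a commit only writes or deletes the cells that the transaction touched.
class GhostVertexSketch {

    // Hypothetical stand-in for a storage row: column name -> value.
    static final class Row {
        final Map<String, String> cells = new TreeMap<>();
    }

    // Runs a colliding delete and update and returns the cells left in the row.
    static Map<String, String> simulate() {
        Row row = new Row();
        row.cells.put("exists", "true"); // hypothetical "vertex exists" marker
        row.cells.put("label", "SOURCE");
        row.cells.put("name", "a");
        row.cells.put("oid", "1");

        // Transaction A (delete) reads the row before transaction B commits,
        // so it only knows about the four cells above.
        Set<String> seenByDelete = new HashSet<>(row.cells.keySet());

        // Transaction B (update) adds a property that A never saw.
        String addedColumn = "status";
        String addedValue = "active";

        // Commit A: remove every cell that A read.
        for (String column : seenByDelete) {
            row.cells.remove(column);
        }
        // Commit B: write only the cell that B changed.
        row.cells.put(addedColumn, addedValue);

        // Nothing rejected either commit, so the row survives with B's cell only.
        return row.cells;
    }

    public static void main(String[] args) {
        System.out.println(simulate());
    }
}
```

In this model the order of the two commits does not matter: the delete never touches the "status" cell, so the row always ends up holding only what the update wrote, which matches the shape of the corrupted vertices described in the question.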
How to mitigate the issue?
Janusgraph documentation offers 2 options:
GhostVertexRemover job: A more scalable approach is to allow ghost vertices temporarily and clearing them out in regular time intervals.
Another option is to detect them at read-time using the option checkInternalVertexExistence() documented in Transaction
Neither of these options fits our needs.
The checkInternalVertexExistence() method is not accessible via TinkerPop transactions.
Solution
No more ghost vertices!
This workaround helped us resolve the issue with ghost vertices and didn't add any significant overhead in terms of code or performance. Add a version property with lock consistency to all vertices that could become ghosts (those that may be concurrently updated and deleted). That property should be updated each time the vertex is modified (a property or an edge is added).
Define schema for VERSION property and SOURCE vertex
PropertyKey versionPropertyKey = management.makePropertyKey(VERSION).dataType(Long.class).make();
management.setConsistency(versionPropertyKey, ConsistencyModifier.LOCK);
management.addProperties(management.getVertexLabel(SOURCE), versionPropertyKey);
Increment version property while adding edge
Edge edge = traversal.addE(RELATION).from(traversal.V(sourceVertexId).property(VERSION, sourceVersion++))
.to(traversal.V(targetVertexId).property(VERSION, targetVersion++)).next();
Increment version while updating vertex properties
// Both property() steps run inside the traversal, so both values are set on the vertex itself
traversal.V(vertexId).property(VERSION, version++).property(OID, oidPropertyValue).next();
Here you can find the above-mentioned examples that fix this issue
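As a rough illustration of why the version property helps, here is a toy model in plain Java. It is not Janusgraph code and does not reproduce how Janusgraph implements locking; the class VersionLockSketch, the exception LockConflict and all method names are hypothetical. The only assumption is that a commit touching the version property is rejected when the version it read is no longer the stored one.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, NOT Janusgraph code: a commit must prove that the version it read
// is still the stored version, which mimics a property with LOCK consistency.
class VersionLockSketch {

    // Hypothetical stand-in for the exception raised by a failed check.
    static final class LockConflict extends RuntimeException {
        LockConflict(String message) {
            super(message);
        }
    }

    private final Map<String, String> cells = new HashMap<>();
    private long version = 1;
    private boolean deleted = false;

    VersionLockSketch() {
        cells.put("oid", "1");
    }

    // Shared guard used by every write path.
    private void check(long expectedVersion) {
        if (deleted || version != expectedVersion) {
            throw new LockConflict("stale version " + expectedVersion);
        }
    }

    // An update bumps the version together with the changed property.
    synchronized void update(long expectedVersion, String key, String value) {
        check(expectedVersion);
        cells.put(key, value);
        version = expectedVersion + 1;
    }

    // A delete removes everything, but only if nobody changed the vertex meanwhile.
    synchronized void delete(long expectedVersion) {
        check(expectedVersion);
        cells.clear();
        deleted = true;
    }

    // Returns true when the second, conflicting commit was rejected.
    static boolean conflictDetected(boolean deleteFirst) {
        VersionLockSketch vertex = new VersionLockSketch();
        long readByDelete = 1; // both transactions read version 1
        long readByUpdate = 1;
        try {
            if (deleteFirst) {
                vertex.delete(readByDelete);
                vertex.update(readByUpdate, "status", "active");
            } else {
                vertex.update(readByUpdate, "status", "active");
                vertex.delete(readByDelete);
            }
            return false;
        } catch (LockConflict e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("delete first, update rejected: " + conflictDetected(true));
        System.out.println("update first, delete rejected: " + conflictDetected(false));
    }
}
```

Whichever transaction commits second fails in this model, so the vertex is either fully deleted or fully intact. The rejected transaction then has to be retried or dropped by the caller, which is a design decision outside the scope of this sketch.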