When should I choose Neo4j’s traversal framework over Cypher?
For example, for a friend-of-a-friend query I would write a Cypher query as follows:
MATCH (p:Person {pid:'56'})-[:FRIEND*2..2]->(fof)
WHERE NOT (p)-[:FRIEND]->(fof)
RETURN fof.pid
And the corresponding traversal implementation would require two traversals, one for friends_at_depth_1 and one for friends_at_depth_2 (or a core API call to get the relationships), and then finding the difference of these two sets using plain Java constructs, outside of the traversal description. Correct me if I’m wrong here.
Any thoughts?
The Neo4j Traversal framework Java API is a callback-based, lazily-executed way of specifying desired movements through a graph in Java. Some traversal examples are collected under Traversal. You can also use the Cypher query language as a powerful declarative way to query the graph.
Cypher is Neo4j's graph query language that lets you retrieve data from the graph. It is like SQL for graphs, and was inspired by SQL so it lets you focus on what data you want out of the graph (not how to go get it).
Neo4j has some upper bound limit for the graph size and can support tens of billions of nodes, properties, and relationships in a single graph. No security is provided at the data level and there is no data encryption. Security auditing is not available in Neo4j.
100 depth originates in the nomenclature of the neography ruby gem, where this is done using the abstract depth method. In neo4j, this is called variable length relationships, as can be seen here in the documentation: MATCH / Variable length relationships.
The key thing to remember about Cypher vs. the traversal API is that the traversal API is an imperative way of accessing a graph, and Cypher is a declarative way of accessing a graph. You can read more about that difference here, but the short version is that with imperative access, you're telling the database exactly how to go get the data (e.g. I want to do a depth-first search, prune these branches, stop when I hit certain nodes, etc.). In a declarative graph query, you're instead specifying what you want, and you're delegating all aspects of how to get it to the Cypher implementation.
I'd slightly revise your query:
MATCH (p:Person {pid:'56'})-[:FRIEND*2..2]->(fof)
WHERE NOT (p)-[:FRIEND]->(fof) AND
p <> fof
RETURN fof.pid
(I added the p <> fof check because friend links might lead back to the original person.)
To do this with a traverser, you wouldn't need two traversers, just one. You'd traverse only FRIEND relationships, stop at depth 2, and accumulate a set of results.
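To make the single-pass idea concrete, here is a minimal sketch in plain Java. It does not use the Neo4j API: the graph is a hypothetical in-memory map from a person's pid to the pids of their outgoing FRIEND links, and the class and method names are made up for illustration. A breadth-first walk that never revisits a node reports only nodes first reached at depth 2, which excludes both direct friends and the start person, matching the revised Cypher query. In the Traversal Framework this would presumably correspond to a breadth-first traversal with global node uniqueness, evaluated at depth 2.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FriendOfFriendSketch {

    // 'friends' is an assumed stand-in for outgoing FRIEND relationships: pid -> friend pids.
    public static Set<String> friendsOfFriends(Map<String, List<String>> friends, String startPid) {
        Map<String, Integer> depthOf = new HashMap<>(); // also serves as the visited set
        Deque<String> queue = new ArrayDeque<>();
        Set<String> result = new TreeSet<>();

        depthOf.put(startPid, 0);
        queue.add(startPid);

        while (!queue.isEmpty()) {
            String current = queue.remove();
            int depth = depthOf.get(current);
            if (depth == 2) {
                result.add(current); // first reached at depth 2: a friend of a friend
                continue;            // do not expand beyond depth 2
            }
            for (String friend : friends.getOrDefault(current, List.of())) {
                // Skip anything already seen (the start person or a closer friend).
                if (!depthOf.containsKey(friend)) {
                    depthOf.put(friend, depth + 1);
                    queue.add(friend);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> friends = Map.of(
                "56", List.of("1", "2"),
                "1", List.of("2", "3", "56"),
                "2", List.of("4"));
        System.out.println(friendsOfFriends(friends, "56")); // prints [3, 4]
    }
}
```

Person 2 is reachable in two hops via person 1 but is also a direct friend, so it is left out, and the link from 1 back to 56 is ignored because 56 was already visited.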
Now, I'm going to attempt to argue that you should almost always use Cypher, and never use the traversal API unless you have very specific circumstances. Here are my reasons:
OK so when should you use traversal? Two key cases that I know of (others may suggest others)
Core API, Traversal Framework or Cypher?
The Core API allows developers to fine-tune their queries so that they exhibit high affinity with the underlying graph. A well-written Core API query is often faster than any other approach. The downside is that such queries can be verbose, requiring considerable developer effort. Moreover, their high affinity with the underlying graph makes them tightly coupled to its structure. When the graph structure changes, they can often break. Cypher can be more tolerant of structural changes—things such as variable-length paths help mitigate variation and change.
The Traversal Framework is both more loosely coupled than the Core API (because it allows the developer to declare informational goals), and less verbose, and as a result a query written using the Traversal Framework typically requires less developer effort than the equivalent written using the Core API. Because it is a general-purpose framework, however, the Traversal Framework tends to perform marginally less well than a well-written Core API query.
If we find ourselves in the unusual situation of coding with the Core API or Traversal Framework (and thus eschewing Cypher and its affordances), it’s because we are working on an edge case where we need to finely craft an algorithm that cannot be expressed effectively using Cypher’s pattern matching. Choosing between the Core API and the Traversal Framework is a matter of deciding whether the higher abstraction/lower coupling of the Traversal Framework is sufficient, or whether the close-to-the-metal/higher coupling of the Core API is in fact necessary for implementing an algorithm correctly and in accordance with our performance requirements.
Ref: Graph Databases, New Opportunities for Connected Data, p161
The developer documentation defines it as follows: Cypher is a declarative, SQL-inspired language for describing patterns in graphs visually using an ASCII-art syntax.
You can find more about it here.
I found this page, which contains the following sentence:
Besides an object-oriented API to the graph database, working with Node, Relationship, and Path objects, it also offers highly customizable, high-speed traversal- and graph-algorithm implementations.
So, practically speaking, the Core API deals with basic objects such as Node and Relationship, which belong to the org.neo4j.graphdb package.
You can find more at its developer guide.
The Traversal API adds more interfaces on top of the Core API to help us perform traversals conveniently, instead of writing the whole traversal logic from scratch. These interfaces are contained in the org.neo4j.graphdb.traversal package.
You can find more at its developer guide.
According to this answer:
The Traversal API is built on the Core API, and Cypher is built on the Traversal API; so anything you can do in Cypher can be done with the other 2.
This tutorial from 2012 shows all three in action performing the same task, with the Core API being the fastest. It includes a quote from Andres Taylor:
Cypher is just over a year old. Since we are very constrained on developers, we have had to be very picky about what we work on. The focus in this first phase has been to explore the language, to learn about how our users use the query language, and to expand the feature set to a reasonable level.
I believe that Cypher is our future API. I know you can very easily outperform Cypher by handwriting queries. Like every language ever created, in the beginning you can always do better than the compiler by writing by hand, but eventually the compiler catches up.
Article's conclusion:
So far I was only using the Java Core API working with neo4j and I will continue to do so.
If you are in a high speed scenario (I believe every web application is one) you should really think about switching to the neo4j Java core API for writing your queries. It might not be as nice looking as Cypher or the traverser Framework but the gain in speed pays off.
Also I personally like the amount of control that you have when traversing over the core yourself.