Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Neo4j Traversal API vs. Cypher

Tags:

neo4j

cypher

When should I choose Neo4j’s traversal framework over Cypher?

For example, for a friend-of-a-friend query I would write a Cypher query as follows:

MATCH (p:Person {pid:'56'})-[:FRIEND*2..2]->(fof) 
WHERE NOT (p)-[:FRIEND]->(fof) 
RETURN fof.pid

And the corresponding Traversal implementation would require two traversals for friends_at_depth_1 and friends_at_depth_2 (or a core API call to get the relationships) and find the difference of these two sets using plain java constructs, outside of the traversal description. Correct me if I’m wrong here.

Any thoughts?

like image 467
Jay Avatar asked Feb 22 '15 11:02

Jay


People also ask

What is traversal Neo4j?

The Neo4j Traversal framework Java API is a callback-based, lazily-executed way of specifying desired movements through a graph in Java. Some traversal examples are collected under Traversal. You can also use the Cypher query language as a powerful declarative way to query the graph.

What is the purpose of the Neo4j Cypher tool?

Cypher is Neo4j's graph query language that lets you retrieve data from the graph. It is like SQL for graphs, and was inspired by SQL so it lets you focus on what data you want out of the graph (not how to go get it).

What are the limitations of Neo4j?

Neo4j has some upper bound limit for the graph size and can support tens of billions of nodes, properties, and relationships in a single graph. No security is provided at the data level and there is no data encryption. Security auditing is not available in Neo4j.

What is depth in Neo4j?

100 depth originates in the nomenclature of the neography ruby gem, where this is done using the abstract depth method. In neo4j, this is called variable length relationships, as can be seen here in the documentation: MATCH / Variable length relationships.


2 Answers

The key thing to remember about Cypher vs. the traversal API is that the traversal API is an imperative way of accessing a graph, and Cypher is a declarative way of accessing a graph. You can read more about that difference here but the short version is that in imperative access, you're telling the database exactly how to go get the graph. (E.g. I want to do a depth first search, prune these branches, stop when I hit certain nodes, etc). In declarative graph query, you're instead specifying what you want, and you're delegating all aspects of how to get it to the Cypher implementation.

In your query, I'd slightly revise it:

MATCH (p:Person {pid:'56'})-[:FRIEND*2..2]->(fof) 
WHERE NOT (p)-[:FRIEND]->(fof) AND
      p <> fof
RETURN fof.pid

(I added making sure that p<>fof because friend links might go back to the original person)

To do this in a traverser, you wouldn't need to have two traverser, just one. You'd traverse only FRIEND relationships, stop at depth 2, and accumulate a set of results.

Now, I'm going to attempt to argue that you should almost always use Cypher, and never use the traversal API unless you have very specific circumstances. Here are my reasons:

  1. Declarative query is very powerful, in that it frees you from thinking about the how. All you need to know is what you want. This means you spend more time focusing on what your code is supposed to do, and less time in implementation detail.
  2. The cypher query executor is getting better all the time (version 2.2 will have a cost based planner) and of course they put a lot of effort into making sure cypher exploits all available indexes. I'ts possible that for many queries, cypher would do a better job of finding your data than your traversal, unless you were very careful in coding the traversal.
  3. Cypher is just way less code than writing your own traversal, which will frequently require you to implement certain classes to do specialized stop conditions, etc.
  4. At present, cypher can run in embedded databases, or on the server. If you want to run a traversal, you can't send that remotely to a server to be executed; maybe at best you could write a server extension that did the traversal. So I think cypher is more flexible at present.

OK so when should you use traversal? Two key cases that I know of (others may suggest others)

  1. Sometimes you need to execute a complex custom java code operation on everything you traverse. In this case, you're using the traverser as a "visitor function" of sorts, and sometimes traversals are more convenient to use than cypher, depending on the nature of the java you're running on the nodes.
  2. Sometimes your performance requirements are so intense, you need to hand-traverse the graph, because there's some aspect of graph structure that you can exploit in the traverser to make it go faster that Cypher can't take advantage of. This does happen, but going to this first usually isn't a good idea.
like image 179
FrobberOfBits Avatar answered Oct 17 '22 13:10

FrobberOfBits


An excerpt from the book

Core API, Traversal Framework or Cypher?

The Core API allows developers to fine-tune their queries so that they exhibit high affinity with the underlying graph. A well-written Core API query is often faster than any other approach. The downside is that such queries can be verbose, requiring considerable developer effort. Moreover, their high affinity with the underlying graph makes them tightly coupled to its structure. When the graph structure changes, they can often break. Cypher can be more tolerant of structural changes—things such as variable-length paths help mitigate variation and change.

The Traversal Framework is both more loosely coupled than the Core API (because it allows the developer to declare informational goals), and less verbose, and as a result a query written using the Traversal Framework typically requires less developer effort than the equivalent written using the Core API. Because it is a general-purpose framework, however, the Traversal Framework tends to perform marginally less well than a well-written Core API query.

If we find ourselves in the unusual situation of coding with the Core API or Traversal Framework (and thus eschewing Cypher and its affordances), it’s because we are working on an edge case where we need to finely craft an algorithm that cannot be expressed effectively using Cypher’s pattern matching. Choosing between the Core API and the Traversal Framework is a matter of deciding whether the higher abstraction/ lower coupling of the Traversal Framework is sufficient, or whether the close-tothe- metal/higher coupling of the Core API is in fact necessary for implementing an algorithm correctly and in accordance with our performance requirements.

Ref: Graph Databases, New Opportunities for Connected Data, p161

What is cypher?

Definition goes in developer doc as follows: cypher is a declarative, SQL-inspired language for describing patterns in graphs visually using an ascii-art syntax.

You can find more about it here.

What is core API practically?

I found this page having following sentence:

Besides an object-oriented API to the graph database, working with Node, Relationship, and Path objects, it also offers highly customizable, high-speed traversal- and graph-algorithm implementations.

So practically speaking core API deals with basic objects such as Node, Relationship which belongs to org.neo4j.graphdb package.

You can find more at its developer guide.

What is traversal API practically?

Traversal API adds more interfaces to core API to help us conveniently perform traversal, instead of writing the whole traversal logic from scratch. These interfaces are contained in org.neo4j.graphdb.traversal package.

You can find more at its developer guide.

The relation between all three

According to this answer:

The Traversal API is built on the Core API, and Cypher is build on the Traversal API; So anything you can do in Cypher, can be done with the other 2.

Same example done with all three

This tutorial from 2012 shows all three in action for performing same task, with Core API being fastest. It includes a quote from Andres Taylor:

Cypher is just over a year old. Since we are very constrained on developers, we have had to be very picky about what we work on the focus in this first phase has been to explore the language, and learn about how our users use the query language, and to expand the feature set to a reasonable level.
I believe that Cypher is our future API. I know you can very easily outperform Cypher by handwriting queries. like every language ever created, in the beginning you can always do better than the compiler by writing by hand but eventually, the compiler catches up

Article's conclusion:

So far I was only using the Java Core API working with neo4j and I will continue to do so.
If you are in a high speed scenario (I believe every web application is one) you should really think about switching to the neo4j Java core API for writing your queries. It might not be as nice looking as Cypher or the traverser Framework but the gain in speed pays off.
Also I personally like the amount of control that you have when traversing over the core yourself.

like image 45
Mahesha999 Avatar answered Oct 17 '22 15:10

Mahesha999