How does Titan achieve constant time lookup using HBase / Cassandra?

Question

Chapter 6 of the O'Reilly book "Graph Databases", which covers how Neo4j stores a graph, says:

To understand why native graph processing is so much more efficient than graphs based on heavy indexing, consider the following. Depending on the implementation, index lookups could be O(log n) in algorithmic complexity versus O(1) for looking up immediate relationships. To traverse a network of m steps, the cost of the indexed approach, at O(m log n), dwarfs the cost of O(m) for an implementation that uses index-free adjacency.
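To make the quoted complexity argument concrete, here is a minimal Python sketch. It is an illustration only, not how Neo4j or any other database is implemented: the ring graph, the sorted edge list standing in for a "global index", and all function names are assumptions made for this example. One hop through the sorted index costs a binary search, O(log n), while one hop through the per-node adjacency list is a direct array access, O(1), so m hops cost O(m log n) versus O(m).

```python
import bisect

def ring_edges(n):
    """Hypothetical test graph: node i has one outgoing edge to (i + 1) % n."""
    return [(i, (i + 1) % n) for i in range(n)]

def hop_with_index(edge_index, node):
    """One hop via a global sorted index: binary search, O(log n)."""
    # Node IDs are non-negative, so (node, -1) sorts before every edge leaving `node`.
    i = bisect.bisect_left(edge_index, (node, -1))
    if i < len(edge_index) and edge_index[i][0] == node:
        return edge_index[i][1]
    raise KeyError(node)

def hop_with_adjacency(adjacency, node):
    """One hop via index-free adjacency: direct positional access, O(1)."""
    return adjacency[node][0]

def traverse(hop, store, start, m):
    """Follow the first outgoing edge m times, starting from `start`."""
    node = start
    for _ in range(m):
        node = hop(store, node)
    return node

n = 1000
edges = ring_edges(n)

# Indexed approach: every hop searches one index over all edges.
edge_index = sorted(edges)

# Index-free approach: every node holds its own neighbour list.
adjacency = [[] for _ in range(n)]
for source, target in edges:
    adjacency[source].append(target)

print(traverse(hop_with_index, edge_index, 0, 5))
print(traverse(hop_with_adjacency, adjacency, 0, 5))
```

Both calls reach the same node; only the cost per hop differs. Note that this compares in-memory structures only, which is exactly the setting the quoted O(1) claim assumes.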

It is then explained that Neo4j achieves this constant-time lookup by storing all nodes and relationships as fixed-size records:

With fixed sized records and pointer-like record IDs, traversals are implemented simply by chasing pointers around a data structure, which can be performed at very high speed. To traverse a particular relationship from one node to another, the database performs several cheap ID computations (these computations are much cheaper than searching global indexes, as we’d have to do if faking a graph in a non-graph native database)
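The "cheap ID computations" in the quote can be sketched as follows. This is a simplified illustration under stated assumptions, not Neo4j's actual store format: the two record layouts (`NODE`, `REL`), their fields, and the helper names are hypothetical. The point it shows is that with fixed-size records, the location of record `k` is simply `k * record_size`, so following a relationship needs no search at all.

```python
import struct

# Hypothetical fixed-size layouts (NOT Neo4j's real on-disk format).
NODE = struct.Struct("<i")    # first_rel_id, or -1 if the node has no relationships
REL = struct.Struct("<iii")   # start_node, end_node, next_rel_id (-1 ends the chain)

def build_store(layout, records):
    """Pack records back to back so that record k starts at byte k * layout.size."""
    store = bytearray(layout.size * len(records))
    for record_id, fields in enumerate(records):
        layout.pack_into(store, record_id * layout.size, *fields)
    return store

def read_record(store, layout, record_id):
    """Locate a record by pure arithmetic on its ID: no index lookup involved."""
    offset = record_id * layout.size
    return layout.unpack_from(store, offset)

def neighbors(node_store, rel_store, node_id):
    """Chase the relationship chain of one node and collect the end nodes."""
    (rel_id,) = read_record(node_store, NODE, node_id)
    result = []
    while rel_id != -1:
        _start, end, rel_id = read_record(rel_store, REL, rel_id)
        result.append(end)
    return result

# Three nodes; node 0 points at relationship 0, nodes 1 and 2 have none.
node_store = build_store(NODE, [(0,), (-1,), (-1,)])
# Relationship 0: 0 -> 1, then continue with relationship 1; relationship 1: 0 -> 2.
rel_store = build_store(REL, [(0, 1, 1), (0, 2, -1)])

print(neighbors(node_store, rel_store, 0))
```

Each step of the traversal is one multiplication and one read at a known offset. How fast that read is depends on whether the bytes sit in memory or on disk, which the sketch does not model.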

This last sentence triggers my question: how does Titan, which uses Cassandra or HBase as a storage backend, achieve these performance gains, or how does it make up for lacking them?

Marko A. Rodriguez · Accepted Answer

Neo4j only achieves O(1) when the data is in-memory in the same JVM. When the data is on disk, Neo4j is slow because of pointer chasing on disk (they have a poor disk representation).

Titan only achieves O(1) when the data is in-memory in the same JVM. When the data is on disk, Titan is faster than Neo4j because it has a better disk representation.

Please see the following blog post that explains the above quantitatively: http://thinkaurelius.com/2013/11/24/boutique-graph-data-with-titan/

Thus, it's important to understand which part of the memory hierarchy people are in when they say O(1). When you are in a single JVM (single machine), it's easy to be fast, as both Neo4j and Titan demonstrate with their respective caching engines. When you can't put the entire graph in memory, you have to rely on intelligent disk layouts, distributed caches, and the like.
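A rough way to picture the memory-hierarchy point is a small in-process cache sitting in front of a slower store. The sketch below is an assumption-laden toy, not Titan's or Neo4j's caching engine: the class name, the plain dict standing in for Cassandra, HBase, or disk, and the LRU eviction policy are all hypothetical. It shows that a lookup is only cheap while the vertex is in the cache; once the working set exceeds the cache, every miss pays for a backend read.

```python
from collections import OrderedDict

class CachedAdjacencyStore:
    """Toy LRU cache in front of a slow adjacency store (hypothetical design)."""

    def __init__(self, backend, capacity):
        self.backend = backend        # dict standing in for a remote or on-disk store
        self.capacity = capacity      # number of adjacency lists kept in memory
        self.cache = OrderedDict()
        self.backend_reads = 0        # counts the expensive lookups

    def neighbors(self, vertex):
        if vertex in self.cache:
            # Cache hit: in-memory lookup, mark as most recently used.
            self.cache.move_to_end(vertex)
            return self.cache[vertex]
        # Cache miss: pay for a read from the slower tier.
        self.backend_reads += 1
        adjacency = self.backend[vertex]
        self.cache[vertex] = adjacency
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used entry
        return adjacency

backend = {0: [1], 1: [2], 2: [0]}
store = CachedAdjacencyStore(backend, capacity=2)

store.neighbors(0)
store.neighbors(0)            # served from memory
print(store.backend_reads)

store.neighbors(1)
store.neighbors(2)            # evicts vertex 0
store.neighbors(0)            # vertex 0 must be fetched again
print(store.backend_reads)
```

The counter makes the trade-off visible: the same logical lookup costs nothing extra on a hit and a full backend read on a miss, which is why the layout of the slower tier matters once the graph no longer fits in memory.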

Please see the following two blog posts for more information:

http://thinkaurelius.com/2013/11/01/a-letter-regarding-native-graph-databases/
http://thinkaurelius.com/2013/07/22/scalable-graph-computing-der-gekrummte-graph/

Tags:

graph-databases

neo4j

titan

Lodewijk Bogaards

1 Answers

Marko A. Rodriguez
