Counting Super Nodes On Titan

Question

My system requires that the number of edges on a node be stored both as an internal property on the vertex and in a vertex-centric index on a specific outgoing edge. This means I have to count the edges on each node after all the data has finished loading. I do so as follows:

long edgeCount = graph.getGraph().traversal().V(vertexId).bothE().count().next();

However, when I scale up my tests to the point where some of my nodes are "super" nodes, I get the following exception on the line above:

Caused by: com.netflix.astyanax.connectionpool.exceptions.TransportException: TransportException: [host=127.0.0.1(127.0.0.1):9160, latency=4792(4792), attempts=1]org.apache.thrift.transport.TTransportException: Frame size (70936735) larger than max length (62914560)!
    at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:197) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:153) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:119) ~[astyanax-core-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:352) ~[astyanax-core-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4.execute(ThriftColumnFamilyQueryImpl.java:538) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:112) ~[titan-cassandra-1.0.0.jar!/:na]

What is the best way to fix this? Should I simply increase the frame size, or is there a better way to count the number of edges on the node?

Jason Plurad · Accepted Answer

Yes, you will need to increase the frame size. A supernode is stored as one very large row that has to be read out of the storage backend, and this is true even in the OLAP case. I agree that if you are planning to calculate this for every vertex in the graph, it would be best done as an OLAP operation.
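
To put numbers on "increase the frame size", here is a minimal sketch that converts the two byte counts from the exception in the question into megabytes. The class and method names are made up for illustration. The settings I would expect to be involved are `storage.cassandra.thrift.frame-size` on the Titan side and `thrift_framed_transport_size_in_mb` in cassandra.yaml, but treat both names as assumptions and check them against the configuration reference for your versions.

```java
public class FrameSizeEstimate {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    // Smallest whole number of megabytes that can hold a frame of the given size.
    static long requiredFrameSizeMb(long frameBytes) {
        return (frameBytes + BYTES_PER_MB - 1) / BYTES_PER_MB;
    }

    public static void main(String[] args) {
        long rejectedFrameBytes = 70_936_735L; // "Frame size" from the exception
        long currentMaxBytes = 62_914_560L;    // "max length" from the exception

        System.out.println("current limit (MB):  " + currentMaxBytes / BYTES_PER_MB);
        System.out.println("minimum needed (MB): " + requiredFrameSizeMb(rejectedFrameBytes));
    }
}
```

The result is only a lower bound for this one read. A supernode that keeps gaining edges will outgrow any fixed limit, so leave headroom.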

This and several other good tips can be found in this Titan mailing list thread. Keep in mind that the thread is fairly old: the concepts are still valid, but some of the Titan configuration property names may have changed.

imriqwe · Answer

Such a task, which is OLAP by nature, should be performed using a distributed system rather than a traversal.

There is a concept called GraphComputer in TinkerPop 3, which can be used to perform such a task.

It basically allows you to run Gremlin queries that are evaluated across multiple machines.

For example, you can use SparkGraphComputer to run your queries on top of Apache Spark.
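
To illustrate the difference in shape between the two approaches, here is a toy, single-JVM sketch of the bulk idea: instead of asking each vertex for its edges, make one parallel pass over all edges and accumulate a count per endpoint. This is not the GraphComputer or SparkGraphComputer API. The class name and the edge representation (a `long[]` holding the out-vertex id and the in-vertex id) are assumptions made purely for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BulkDegreeCount {

    // Each edge is a hypothetical {outVertexId, inVertexId} pair.
    // Every edge contributes one count to each of its endpoints,
    // so the result is the total (in + out) degree per vertex.
    static Map<Long, Long> degrees(List<long[]> edges) {
        return edges.parallelStream()
                .flatMap(e -> Stream.of(e[0], e[1]))
                .collect(Collectors.groupingByConcurrent(
                        Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<long[]> edges = Arrays.asList(
                new long[] {1, 2},
                new long[] {1, 3},
                new long[] {3, 1});

        // Print the degree of every vertex that appears in the edge list.
        degrees(edges).forEach((vertex, degree) ->
                System.out.println("vertex " + vertex + " has " + degree + " edges"));
    }
}
```

A distributed engine follows the same pattern but partitions the edges across machines, so no single request has to return all of a supernode's edges in one response.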

Tags:

titan

tinkerpop

Filipe Teixeira
