
Counting Super Nodes On Titan

In my system, the number of edges on a node must be stored as an internal property on the vertex, as well as in a vertex-centric index on a specific outgoing edge. This requires me to count the edges on each node after all the data has finished loading. I do so as follows:

long edgeCount = graph.getGraph().traversal().V(vertexId).bothE().count().next();

However, when I scale up my tests to the point where some of my nodes are "super" nodes, I get the following exception on the above line:

Caused by: com.netflix.astyanax.connectionpool.exceptions.TransportException: TransportException: [host=127.0.0.1(127.0.0.1):9160, latency=4792(4792), attempts=1]org.apache.thrift.transport.TTransportException: Frame size (70936735) larger than max length (62914560)!
    at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:197) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:153) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:119) ~[astyanax-core-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:352) ~[astyanax-core-3.8.0.jar!/:3.8.0]
    at com.netflix.astyanax.thrift.ThriftColumnFamilyQueryImpl$4.execute(ThriftColumnFamilyQueryImpl.java:538) ~[astyanax-thrift-3.8.0.jar!/:3.8.0]
    at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxKeyColumnValueStore.getNamesSlice(AstyanaxKeyColumnValueStore.java:112) ~[titan-cassandra-1.0.0.jar!/:na]

What is the best way to fix this? Should I simply increase the frame size, or is there a better way to count the number of edges on the node?

Filipe Teixeira asked Mar 24 '16 08:03



2 Answers

Yes, you will need to increase the frame size. A supernode is stored as one very large row that has to be read out of the storage backend, and this is true even in the OLAP case. I agree that if you plan to calculate this for every vertex in the graph, it would be best done as an OLAP operation.

This and several other good tips can be found in this Titan mailing list thread. Keep in mind that the thread is fairly old: the concepts are still valid, but some of the Titan configuration property names may be different.
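To gauge how far the frame size would have to grow, the two numbers in the exception from the question can be converted to megabytes. The following is only a small arithmetic sketch; the class and method names are made up for illustration and are not part of Titan or Astyanax.

```java
/** Works out the Thrift frame size implied by the exception in the question. */
final class FrameSizeCheck {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    /** Smallest whole number of megabytes that can hold a frame of the given size. */
    static long requiredMegabytes(long frameBytes) {
        return (frameBytes + BYTES_PER_MB - 1) / BYTES_PER_MB; // ceiling division
    }

    public static void main(String[] args) {
        long observedFrame = 70_936_735L; // "Frame size" reported in the exception
        long currentLimit = 62_914_560L;  // "max length" reported in the exception

        System.out.println("current limit (MB): " + currentLimit / BYTES_PER_MB);          // prints 60
        System.out.println("minimum needed (MB): " + requiredMegabytes(observedFrame));    // prints 68
    }
}
```

So the limit in that run was 60 MB, and this particular row needed at least 68 MB. A supernode that keeps growing will eventually exceed any fixed value, so some headroom is advisable. The setting names differ between versions; as an assumption to verify against your own configuration reference, the Cassandra side is usually `thrift_framed_transport_size_in_mb` in cassandra.yaml, and the Titan 1.0 side is probably `storage.cassandra.frame-size-mb`.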

Jason Plurad answered Sep 18 '22 00:09



Such a task is OLAP by nature and should be performed on a distributed system rather than with an ordinary traversal.

There is a concept called GraphComputer in TinkerPop 3, which can be used to perform such a task.

It basically allows you to run Gremlin queries that are evaluated across multiple machines.

For example, you can use SparkGraphComputer to run your queries on top of Apache Spark.

imriqwe answered Sep 22 '22 00:09
