Setup and configuration of JanusGraph for a Spark cluster and Cassandra

Tags: cassandra, apache-spark, hadoop, titan, janusgraph

I am running JanusGraph (0.1.0) with Spark (1.6.1) on a single machine. I did my configuration as described here. When I access the graph in the Gremlin Console with the SparkGraphComputer, it is always empty. I cannot find any error in the log files; it is just an empty graph.

Is anyone using JanusGraph with Spark who can share their configuration and properties?

Opening the graph directly with JanusGraphFactory, I get the expected output:

gremlin> graph=JanusGraphFactory.open('conf/test.properties')
==>standardjanusgraph[cassandrathrift:[127.0.0.1]]
gremlin> g=graph.traversal()
==>graphtraversalsource[standardjanusgraph[cassandrathrift:[127.0.0.1]], standard]
gremlin> g.V().count()
14:26:10 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [()]. For better performance, use indexes
==>1000001
gremlin>

Opening it as a HadoopGraph with SparkGraphComputer as the graph computer, the graph is empty:

gremlin> graph=GraphFactory.open('conf/test.properties')
==>hadoopgraph[cassandrainputformat->gryooutputformat]
gremlin> g=graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cassandrainputformat->gryooutputformat], sparkgraphcomputer]
gremlin> g.V().count()
==>0

My conf/test.properties:

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.cassandra.CassandraInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.deriveMemory=false
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

#
# Titan Cassandra InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=cassandrathrift
janusgraphmr.ioformat.conf.storage.hostname=127.0.0.1
janusgraphmr.ioformat.conf.storage.keyspace=janusgraph
storage.backend=cassandrathrift
storage.hostname=127.0.0.1
storage.keyspace=janusgraph

#
# Apache Cassandra InputFormat configuration
#
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.keyspace=janusgraph
cassandra.input.predicate=0c00020b0001000000000b000200000000020003000800047fffffff0000
cassandra.input.columnfamily=edgestore
cassandra.range.batch.size=2147483647

#
# SparkGraphComputer Configuration
#
spark.master=spark://127.0.0.1:7077
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.executor.memory=100g

gremlin.spark.persistContext=true
gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer
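One thing worth checking in a file like this: gremlin.hadoop.memoryOutputFormat is assigned twice. Depending on the loader, a repeated key is either silently overwritten (java.util.Properties keeps the last value) or, as far as I know for Apache Commons Configuration, which TinkerPop uses to read graph properties, turned into a multi-valued property. Below is a minimal sketch using only the JDK that lists repeated keys; the class and method names are hypothetical and not part of TinkerPop or JanusGraph.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper: lists keys that occur more than once in .properties text.
// Properties.load() calls put() once per parsed entry, so overriding put()
// sees every occurrence before a later value replaces an earlier one.
class DuplicateKeyCheck {

    static Set<String> findDuplicates(String propertiesText) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        Properties props = new Properties() {
            @Override
            public synchronized Object put(Object key, Object value) {
                if (!seen.add(key.toString())) {
                    duplicates.add(key.toString());
                }
                return super.put(key, value);
            }
        };
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // StringReader does no real I/O, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return duplicates;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat",
            "gremlin.hadoop.memoryOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat",
            "gremlin.hadoop.outputLocation=output");
        System.out.println(findDuplicates(text)); // [gremlin.hadoop.memoryOutputFormat]
    }
}
```

Reading the real file instead of a string only requires swapping the StringReader for a FileReader.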

HDFS seems to be configured correctly as described here:

gremlin> hdfs
==>storage[DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_178390072_1, ugi=cassandra (auth:SIMPLE)]]]

asked May 05 '17 12:05

Felix Hill

1 Answer

Try fixing these properties:

janusgraphmr.ioformat.conf.storage.keyspace=janusgraph
storage.keyspace=janusgraph

Replace with:

janusgraphmr.ioformat.conf.storage.cassandra.keyspace=janusgraph
storage.cassandra.keyspace=janusgraph
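If you keep several property files around, the renaming above can be applied mechanically. This is a minimal sketch using only java.util.Properties; the class name KeyspacePropertyFix is hypothetical, and the rename table holds just the two keys from this answer. If a file already sets the corrected key, that value is kept and the misnamed key is dropped.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper: rewrites the misnamed keyspace keys to the names
// suggested in this answer and copies every other key unchanged.
class KeyspacePropertyFix {

    // Misnamed key -> corrected key.
    static final Map<String, String> RENAMES = new LinkedHashMap<>();
    static {
        RENAMES.put("storage.keyspace", "storage.cassandra.keyspace");
        RENAMES.put("janusgraphmr.ioformat.conf.storage.keyspace",
                    "janusgraphmr.ioformat.conf.storage.cassandra.keyspace");
    }

    static Properties fix(Properties in) {
        Properties out = new Properties();
        for (String key : in.stringPropertyNames()) {
            String target = RENAMES.getOrDefault(key, key);
            if (!target.equals(key) && in.getProperty(target) != null) {
                continue; // an explicitly set corrected key wins; drop the misnamed one
            }
            out.setProperty(target, in.getProperty(key));
        }
        return out;
    }

    public static void main(String[] args) {
        Properties original = new Properties();
        original.setProperty("storage.keyspace", "janusgraph");
        original.setProperty("janusgraphmr.ioformat.conf.storage.keyspace", "janusgraph");
        original.setProperty("storage.hostname", "127.0.0.1");
        Properties fixed = fix(original);
        // Print in sorted order so the result is easy to compare with the file.
        for (String key : new TreeSet<>(fixed.stringPropertyNames())) {
            System.out.println(key + "=" + fixed.getProperty(key));
        }
    }
}
```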

Since the default keyspace name is janusgraph, I don't think the misnamed properties alone would have caused this problem, unless you loaded your data into a keyspace with a different name.

The storage.cassandra.keyspace property is described in the Configuration Reference. Also, keep an eye on this open issue to improve the documentation for HadoopGraph usage.
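The reasoning about the default keyspace can be made concrete with a small model. This is a hypothetical sketch, not JanusGraph's configuration code: it assumes, as the answer implies, that a misnamed key has no effect, so only storage.cassandra.keyspace (optionally under the janusgraphmr.ioformat.conf. prefix used for the input format) is consulted, and anything else falls back to janusgraph. The keyspace name mygraph is made up for illustration.

```java
import java.util.Properties;

// Hypothetical model of the answer's reasoning, not JanusGraph's actual code:
// only the correctly named key is consulted; otherwise the default is used.
class EffectiveKeyspace {
    static final String DEFAULT_KEYSPACE = "janusgraph";
    static final String KEY = "storage.cassandra.keyspace";

    // prefix is "" for the settings read by JanusGraphFactory and
    // "janusgraphmr.ioformat.conf." for the settings passed to CassandraInputFormat.
    static String resolve(Properties props, String prefix) {
        return props.getProperty(prefix + KEY, DEFAULT_KEYSPACE);
    }

    public static void main(String[] args) {
        String ioPrefix = "janusgraphmr.ioformat.conf.";
        Properties props = new Properties();
        // Misnamed key, as in the question: ignored under this model.
        props.setProperty(ioPrefix + "storage.keyspace", "mygraph");
        System.out.println(resolve(props, ioPrefix)); // janusgraph
        // Correctly named key: now the input format would read "mygraph".
        props.setProperty(ioPrefix + KEY, "mygraph");
        System.out.println(resolve(props, ioPrefix)); // mygraph
    }
}
```

With the data loaded into janusgraph, the misnamed and the corrected settings resolve to the same keyspace, which is why the typo alone would not explain an empty graph; with any other keyspace, only the corrected key points the input format at the right data.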

answered Oct 16 '22 12:10

Jason Plurad
