
How to load Spark Cassandra Connector in the shell?

I am trying to use the Spark Cassandra Connector with Spark 1.1.0.

I have successfully built the jar file from the master branch on GitHub and have gotten the included demos to work. However, when I try to load the jar file into spark-shell, I can't import any of the classes from the com.datastax.spark.connector package.

I have tried using the --jars option on spark-shell and adding the directory with the jar file to Java's CLASSPATH. Neither of these options works. In fact, when I use the --jars option, the logging output shows that the DataStax jar is being loaded, but I still cannot import anything from com.datastax.

I have been able to load the Tuplejump Calliope Cassandra connector into spark-shell using --jars, so I know that works. It's just the DataStax connector that is failing for me.

asked Sep 14 '14 by egerhard


People also ask

How do I connect Spark to Cassandra?

To connect Spark to a Cassandra cluster, the Cassandra Connector needs to be added to the Spark project. DataStax provides their own Cassandra Connector on GitHub, and we will use that. Building it (for example with sbt assembly) should output compiled jar files to the directory named "target". There will be two jar files, one for Scala and one for Java.

How does Spark work with Cassandra?

The fundamental idea is quite simple: Spark and Cassandra clusters are deployed to the same set of machines. Cassandra stores the data; Spark worker nodes are co-located with Cassandra and do the data processing. Spark is a batch-processing system, designed to deal with large amounts of data.

Can Cassandra and Spark run on the same cluster?

Yes. This is a setup similar to that used in Cassandra database clusters, so these types of clusters can run Spark and Cassandra on the same machines, using Cassandra instead of HDFS for storage.

What is Cassandra connector?

The Spark Cassandra Connector Java API allows you to create Java applications that use Spark to analyze database data. See the Spark Cassandra Connector Java Doc on GitHub.



2 Answers

I got it working. Below is what I did:

  $ git clone https://github.com/datastax/spark-cassandra-connector.git
  $ cd spark-cassandra-connector
  $ sbt/sbt assembly
  $ $SPARK_HOME/bin/spark-shell --jars ~/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/connector-assembly-1.2.0-SNAPSHOT.jar

At the Scala prompt:

  scala> sc.stop
  scala> import com.datastax.spark.connector._
  scala> import org.apache.spark.SparkContext
  scala> import org.apache.spark.SparkContext._
  scala> import org.apache.spark.SparkConf
  scala> val conf = new SparkConf(true).set("spark.cassandra.connection.host", "my cassandra host")
  scala> val sc = new SparkContext("spark://spark host:7077", "test", conf)
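
With the new context in place, the connector's methods are available on sc. A minimal check, assuming a keyspace and table that already exist on your cluster (the names below are placeholders):

  scala> val rdd = sc.cassandraTable("my_keyspace", "my_table")
  scala> rdd.count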
answered Sep 30 '22 by Lishu


Edit: Things are a bit easier now

For in-depth instructions, check out the project documentation: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/13_spark_shell.md

Or feel free to use Spark Packages to load the library (not all versions are published there): http://spark-packages.org/package/datastax/spark-cassandra-connector

> $SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.10:1.4.0-M3-s_2.10 
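
When the shell is launched this way, the connector classes are already on the classpath, so there is no need to stop and rebuild the context if the Cassandra host is also supplied at launch. A minimal sketch, assuming a Cassandra node reachable at 127.0.0.1 (the host, keyspace, and table values are placeholders):

  > $SPARK_HOME/bin/spark-shell \
      --packages com.datastax.spark:spark-cassandra-connector_2.10:1.4.0-M3-s_2.10 \
      --conf spark.cassandra.connection.host=127.0.0.1

  scala> import com.datastax.spark.connector._
  scala> sc.cassandraTable("my_keyspace", "my_table").count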

The following assumes you are running with OSS Apache Cassandra (C*).

You'll want to start the shell with --driver-class-path set to include all your connector libs.

I'll quote a blog post from the illustrious Amy Tobey

The easiest way I’ve found is to set the classpath, then restart the context in the REPL with the necessary classes imported to make sc.cassandraTable() visible. The newly loaded methods will not show up in tab completion. I don’t know why.

  /opt/spark/bin/spark-shell --driver-class-path $(echo /path/to/connector/*.jar |sed 's/ /:/g') 

It will print a bunch of log information, then present the scala> prompt.

scala> sc.stop 

Now that the context is stopped, it’s time to import the connector.

  scala> import com.datastax.spark.connector._
  scala> import org.apache.spark.{SparkConf, SparkContext}
  scala> val conf = new SparkConf()
  scala> conf.set("cassandra.connection.host", "node1.pc.datastax.com")
  scala> val sc = new SparkContext("local[2]", "Cassandra Connector Test", conf)
  scala> val table = sc.cassandraTable("keyspace", "table")
  scala> table.count
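
Writing back to Cassandra works the same way once the connector is imported. A minimal sketch, assuming a table test.words with columns word and count already exists (the keyspace, table, and column names here are placeholders, not from the answer above):

  scala> val words = sc.parallelize(Seq(("foo", 10), ("bar", 20)))
  scala> words.saveToCassandra("test", "words", SomeColumns("word", "count"))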

If you are running with DSE < 4.5.1

There is a slight issue with the DSE classloader and previous package naming conventions that will prevent you from finding the new spark-connector libraries. You should be able to get around this by removing the line specifying the DSE classloader in the scripts that start spark-shell.

answered Sep 30 '22 by RussS