I'm trying to use TwitterUtils in the Spark shell (where it is not available by default).
I've added the following to spark-env.sh:
SPARK_CLASSPATH="/disk.b/spark-master-2014-07-28/external/twitter/target/spark-streaming-twitter_2.10-1.1.0-SNAPSHOT.jar"
I can now execute
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._
without an error in the shell, which would not be possible without adding the jar to the classpath ("error: object twitter is not a member of package org.apache.spark.streaming"). However, I get an error when executing this in the Spark shell:
scala> val ssc = new StreamingContext(sc, Seconds(1))
ssc: org.apache.spark.streaming.StreamingContext =
org.apache.spark.streaming.StreamingContext@6e78177b
scala> val tweets = TwitterUtils.createStream(ssc, "twitter.txt")
error: bad symbolic reference. A signature in TwitterUtils.class refers to
term twitter4j in package <root> which is not available.
It may be completely missing from the current classpath, or the version on the classpath might be incompatible with the version used when compiling
TwitterUtils.class.
What am I missing? Do I have to import another jar?
Yep, you need the Twitter4J JARs in addition to the spark-streaming-twitter one you already have. Specifically, the Spark devs suggest using Twitter4J version 3.0.3.
After you download the correct JARs, you'll want to pass them to the shell via the --jars flag. I think you can also do this via SPARK_CLASSPATH as you've done.
Here's how I did it on a Spark EC2 cluster:
#!/bin/bash
cd /root/spark/lib
mkdir twitter4j
# Get the Spark Streaming JAR.
curl -O "http://search.maven.org/remotecontent?filepath=org/apache/spark/spark-streaming-twitter_2.10/1.0.0/spark-streaming-twitter_2.10-1.0.0.jar"
# Get the Twitter4J JARs. Check out http://twitter4j.org/archive/ for other versions.
TWITTER4J_SOURCE=twitter4j-3.0.3.zip
curl -O "http://twitter4j.org/archive/$TWITTER4J_SOURCE"
unzip -j ./$TWITTER4J_SOURCE "lib/*.jar" -d twitter4j/
rm $TWITTER4J_SOURCE
cd
# Point the shell to these JARs and go!
# ls -m separates names with ", " and wraps long output, so strip both spaces and newlines.
TWITTER4J_JARS=$(ls -m /root/spark/lib/twitter4j/*.jar | tr -d ' \n')
/root/spark/bin/spark-shell --jars /root/spark/lib/spark-streaming-twitter_2.10-1.0.0.jar,$TWITTER4J_JARS
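The last step of the script builds a comma-separated list of JARs for --jars. Here is a minimal sketch of building that list without parsing ls output, assuming all the Twitter4J JARs sit in one directory. join_jars is a hypothetical helper name of my own, not something Spark provides.

```shell
# Hypothetical helper: print every *.jar in a directory as one comma-separated line.
join_jars() {
  dir=$1
  list=""
  for jar in "$dir"/*.jar; do
    # When nothing matches, the glob stays literal; skip that non-existent entry.
    [ -e "$jar" ] || continue
    # Prepend a comma only when the list already has an entry.
    list="${list:+$list,}$jar"
  done
  printf '%s\n' "$list"
}

# Example use, with the paths from the script above:
#   /root/spark/bin/spark-shell --jars "/root/spark/lib/spark-streaming-twitter_2.10-1.0.0.jar,$(join_jars /root/spark/lib/twitter4j)"
```

Looping over the glob keeps the list free of stray spaces and line breaks, so the whole value can be quoted and passed to --jars as a single argument.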