Only one SparkContext may be running in this JVM - [SPARK]

I'm trying to run the following code to stream live Twitter data:

import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._
import twitter4j.auth.Authorization
import twitter4j.Status
import twitter4j.auth.AuthorizationFactory
import twitter4j.conf.ConfigurationBuilder
import org.apache.spark.streaming.api.java.JavaStreamingContext

import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.api.java.function.Function
import org.apache.spark.streaming.Duration
import org.apache.spark.streaming.api.java.JavaDStream
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream

val consumerKey = "xxx"
val consumerSecret = "xxx"
val accessToken = "xxx"
val accessTokenSecret = "xxx"
val url = "https://stream.twitter.com/1.1/statuses/filter.json"

val sparkConf = new SparkConf().setAppName("Twitter Streaming")
val sc = new SparkContext(sparkConf)

val documents: RDD[Seq[String]] = sc.textFile("").map(_.split(" ").toSeq)


// Twitter Streaming
val ssc = new JavaStreamingContext(sc,Seconds(2))

val conf = new ConfigurationBuilder()
conf.setOAuthAccessToken(accessToken)
conf.setOAuthAccessTokenSecret(accessTokenSecret)
conf.setOAuthConsumerKey(consumerKey)
conf.setOAuthConsumerSecret(consumerSecret)
conf.setStreamBaseURL(url)
conf.setSiteStreamBaseURL(url)

val filter = Array("Twitter", "Hadoop", "Big Data")

val auth = AuthorizationFactory.getInstance(conf.build())
val tweets : JavaReceiverInputDStream[twitter4j.Status] = TwitterUtils.createStream(ssc, auth, filter)

val statuses = tweets.dstream.map(status => status.getText)
statuses.print()
ssc.start()

But when execution reaches the line val sc = new SparkContext(sparkConf), the following error appears:

17/05/09 09:08:35 WARN SparkContext: Multiple running SparkContexts detected in the same JVM! org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true.

I have tried to add the following parameters to the sparkConf value, but the error still appears:

val sparkConf = new SparkConf().setAppName("Twitter Streaming").setMaster("local[4]").set("spark.driver.allowMultipleContexts", "true")

If I ignore the error and continue running commands, I get this other error:

17/05/09 09:15:44 WARN ReceiverSupervisorImpl: Restarting receiver with delay 2000 ms: Error receiving tweets 401: Authentication credentials (https://dev.twitter.com/pages/auth) were missing or incorrect. Ensure that you have set valid consumer key/secret, access token/secret, and the system clock is in sync.

Error 401 Unauthorized HTTP ERROR: 401
Problem accessing '/1.1/statuses/filter.json'. Reason: Unauthorized

Any help is appreciated. Regards, and have a good day.

asked May 10 '17 by trick15f

People also ask

What is Spark SparkContext?

A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster. Note: Only one SparkContext should be active per JVM. You must stop() the active SparkContext before creating a new one.
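
For example, a minimal sketch in Scala of stopping the active context before creating a new one (the app name and master below are placeholder assumptions; SparkContext.getOrCreate returns the running context if one exists):

import org.apache.spark.{SparkConf, SparkContext}

// Grab whatever context is currently active in this JVM and stop it.
val existing = SparkContext.getOrCreate()
existing.stop()

// Only now can a fresh context be created.
val conf = new SparkConf().setAppName("FreshApp").setMaster("local[*]")
val sc = new SparkContext(conf)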

Can we create multiple Spark context?

Note: we can have multiple Spark contexts by setting spark.driver.allowMultipleContexts to true. But having multiple Spark contexts in the same JVM is discouraged and not considered good practice, as it makes the application more unstable, and the crash of one Spark context can affect the others.

How do I start SparkContext?

To create a SparkContext you first need to build a SparkConf object that contains information about your application. The appName parameter is a name for your application to show on the cluster UI. master is a Spark, Mesos or YARN cluster URL, or a special “local” string to run in local mode.
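
A minimal sketch of that pattern (the app name and master URL below are just example values):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("My Application") // name shown on the cluster UI
  .setMaster("local[4]")        // local mode with 4 threads; could also be a cluster URL
val sc = new SparkContext(conf)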

Is SparkContext initialized?

SparkContext is an object which allows us to create the base RDDs. Every Spark application must contain this object to interact with Spark. It is also used to initialize StreamingContext, SQLContext and HiveContext.
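
For instance, a StreamingContext can be built on top of an existing SparkContext rather than a new one (a sketch assuming sc is the shell-provided context; the 2-second batch interval is arbitrary):

import org.apache.spark.streaming.{Seconds, StreamingContext}

// Reuse the existing SparkContext instead of creating a second one.
val ssc = new StreamingContext(sc, Seconds(2))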


1 Answer

The Spark shell already prepares a Spark session and Spark context for you, so you don't have to (and can't) initialize a new one. Usually, at the end of the spark-shell launch process, there is a line telling you under which variable the context is available (typically sc). allowMultipleContexts exists only for testing some Spark functionalities, and shouldn't be used in most cases.
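
Applied to the question's code, that means dropping the new SparkContext(sparkConf) line and building the streaming context from the shell-provided sc. A sketch, assuming the credential vals from the question (note that the 401 is a separate issue: as the log itself says, the credentials must be valid and the system clock in sync):

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter._
import twitter4j.auth.AuthorizationFactory
import twitter4j.conf.ConfigurationBuilder

// spark-shell already provides sc; reuse it instead of calling new SparkContext.
val ssc = new StreamingContext(sc, Seconds(2))

val cb = new ConfigurationBuilder()
cb.setOAuthConsumerKey(consumerKey)
cb.setOAuthConsumerSecret(consumerSecret)
cb.setOAuthAccessToken(accessToken)
cb.setOAuthAccessTokenSecret(accessTokenSecret)

val auth = AuthorizationFactory.getInstance(cb.build())

// The Scala API takes an Option[Authorization] and a Seq of filter keywords.
val tweets = TwitterUtils.createStream(ssc, Some(auth), Seq("Twitter", "Hadoop", "Big Data"))

tweets.map(_.getText).print()
ssc.start()
ssc.awaitTermination()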

answered Sep 22 '22 by Rick Moritz