I am trying to set up a Spark Streaming job that reads lines from a Kafka server but processes them using rules written in another local file. I am creating a StreamingContext for the streaming data and a SparkContext for everything else - string manipulation, reading local files, etc.
val sparkConf = new SparkConf().setMaster("local[*]").setAppName("ReadLine")
val ssc = new StreamingContext(sparkConf, Seconds(15))
ssc.checkpoint("checkpoint")
val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
val sentence = lines.toString
val conf = new SparkConf().setAppName("Bi Gram").setMaster("local[2]")
val sc = new SparkContext(conf)
val stringRDD = sc.parallelize(Array(sentence))
But this throws the following error:
Exception in thread "main" org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
org.apache.spark.SparkContext.<init>(SparkContext.scala:82)
org.apache.spark.streaming.StreamingContext$.createNewSparkContext(StreamingContext.scala:874)
org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:81)
Only one StreamingContext can be active in a JVM at the same time. stop() on StreamingContext also stops the SparkContext. To stop only the StreamingContext, set the optional parameter of stop() called stopSparkContext to false.
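As that note suggests, if you ever need to tear down the StreamingContext while keeping the shared SparkContext alive for batch work, a minimal sketch (assuming ssc is your StreamingContext) is:

ssc.stop(stopSparkContext = false, stopGracefully = true) // stops only the streaming side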
SparkContext is the entry point to the Spark environment. For every Spark application you need to create a SparkContext object. In Spark 2 you can use SparkSession instead of SparkContext. SparkConf is the class that gives you the various options for providing configuration parameters.
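For example, in Spark 2 a minimal SparkSession setup (the app name here is illustrative) would be:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("MyApp") // illustrative name
  .getOrCreate()
val sc = spark.sparkContext // the underlying SparkContext, created for you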
Spark Streaming is an extension of the core Spark API that allows data engineers and data scientists to process real-time data from various sources including (but not limited to) Kafka, Flume, and Amazon Kinesis. This processed data can be pushed out to file systems, databases, and live dashboards.
One application can have only ONE SparkContext. A StreamingContext is created on top of a SparkContext, so you just need to create the StreamingContext ssc from the existing SparkContext:
val sc = new SparkContext(conf)
val ssc = new StreamingContext(sc, Seconds(15))
If you use the following constructor:
StreamingContext(conf: SparkConf, batchDuration: Duration)
it internally creates another SparkContext:
this(StreamingContext.createNewSparkContext(conf), null, batchDuration)
The SparkContext can be obtained from the StreamingContext via
ssc.sparkContext
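Applied to the question's code, a sketch with a single shared SparkContext (assuming the same Kafka 0.8 receiver API and the topics, numThreads, zkQuorum, and group values from the question) would look like:

val sparkConf = new SparkConf().setMaster("local[*]").setAppName("ReadLine")
val sc = new SparkContext(sparkConf) // the one and only SparkContext
val ssc = new StreamingContext(sc, Seconds(15)) // reuses sc instead of creating a second one
ssc.checkpoint("checkpoint")

val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)

// A DStream is not a String, so process each micro-batch's RDD instead of
// calling toString on the stream itself.
lines.foreachRDD { rdd =>
  rdd.foreach(sentence => println(sentence)) // placeholder for the rule-based processing
}

ssc.start()
ssc.awaitTermination()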
Yes, you can do it: first start a SparkSession and then use its underlying SparkContext to create the StreamingContext (remember only one can be active at a time).
val spark = SparkSession.builder()
  .appName("someappname")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .getOrCreate()
val ssc = new StreamingContext(spark.sparkContext, Seconds(1))
Simple!!!
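And since spark and ssc now share one SparkContext, the same session can also load the local rules file for use inside the stream; a sketch (the rules.txt path and names are illustrative):

val rules = spark.sparkContext.textFile("rules.txt").collect() // read the rules once on the driver
val rulesBroadcast = spark.sparkContext.broadcast(rules) // ship them efficiently to executors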