I open the spark shell
spark-shell --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.0
Then I want to create a streaming context
import org.apache.spark._
import org.apache.spark.streaming._
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount").set("spark.driver.allowMultipleContexts", "true")
val ssc = new StreamingContext(conf, Seconds(1))
I run into a exception:
org.apache.spark.SparkException: Only one SparkContext may be running in this JVM (see SPARK-2243). To ignore this error, set spark.driver.allowMultipleContexts = true. The currently running SparkContext was created at:
flatMap is a DStream operation that creates a new DStream by generating multiple new records from each record in the source DStream. In this case, each line will be split into multiple words and the stream of words is represented as the words DStream.
spark.streaming.blockInterval : Interval at which data received by Spark Streaming receivers is chunked into blocks of data before storing them in Spark. This is when using receiver bases approach - Receiver-based Approach.
Spark Streaming is an extension of the core Spark API that allows data engineers and data scientists to process real-time data from various sources including (but not limited to) Kafka, Flume, and Amazon Kinesis. This processed data can be pushed out to file systems, databases, and live dashboards.
Spark streaming takes live data streams as input and provides as output batches by dividing them. These streams are then processed by the Spark engine and the final stream results in batches.
When you open the spark-shell, there is already a streaming context created. It is called sc, meaning you do not need to create a configure object. Simply use the existing sc object.
val ssc = new StreamingContext(sc,Seconds(1))
instead of var we will use val
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With