 

Spark Streaming StreamingContext active count

The Spark docs state:

Only one StreamingContext can be active in a JVM at the same time.

Imagine a case where I plan to read and process data from two Kafka topics: one job fetches data from the first Kafka topic, and another job fetches data from the second. Can I run these two jobs in parallel on the same Hadoop cluster simultaneously?

It also states:

Once a context has been stopped, it cannot be restarted.

So if I have to stop the Spark job for some reason, what is the way to get it restarted? Do I trigger it through Oozie or something?

asked Feb 16 '26 by sc so


1 Answer

Can I trigger these two jobs in parallel on the same hadoop cluster simultaneously?

For the sake of simplicity, let's clear up the terms. A StreamingContext is unique within a Spark job. If you want to read multiple streams inside the same job, you can do that by passing the same StreamingContext to multiple KafkaUtils.createStream calls.
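For example, here is a minimal sketch using the receiver-based KafkaUtils.createStream API from the spark-streaming-kafka 0.8 integration; the ZooKeeper quorum, consumer group IDs, and topic names are placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object TwoTopicsJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("two-topics-job")
    // A single StreamingContext for the whole job.
    val ssc = new StreamingContext(conf, Seconds(10))

    // Two input streams, one per topic, both created
    // from the same StreamingContext.
    val streamA = KafkaUtils.createStream(
      ssc, "zk-host:2181", "consumer-group-a", Map("topicA" -> 1))
    val streamB = KafkaUtils.createStream(
      ssc, "zk-host:2181", "consumer-group-b", Map("topicB" -> 1))

    // createStream yields (key, message) pairs; process each stream.
    streamA.map(_._2).print()
    streamB.map(_._2).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```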

If you have multiple jobs which you submit to Spark, then each can have its own StreamingContext, since each job runs in its own JVM instance.
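Concretely, that could look like two separate spark-submit invocations (the class names and jar names here are placeholders), each of which starts its own driver JVM and thus its own active StreamingContext:

```
spark-submit --master yarn --class com.example.TopicAJob topic-a-job.jar
spark-submit --master yarn --class com.example.TopicBJob topic-b-job.jar
```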

So if I have to stop the spark job due to some reason, what is the way to get it restarted?

One possible way of achieving what you want is to run your streaming job in Spark's cluster deploy mode and pass the --supervise flag when submitting it. The Spark Master will then make sure the driver is restarted on failure.
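For instance, with the standalone cluster manager (the master URL, class name, and jar name below are placeholders):

```
spark-submit \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.TopicAJob \
  topic-a-job.jar
```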

You can read more on that in Spark's "Submitting Applications" documentation.

answered Feb 18 '26 by Yuval Itzchakov


