I have already finished the Spark installation and executed a few test cases with master and worker nodes set up. That said, I am quite confused about what exactly a "job" means in the Spark context (not SparkContext). I have a few questions.
I have read the Spark documentation, but this is still not clear to me.
That said, my plan is to write Spark jobs programmatically and submit them via spark-submit.
Kindly help with an example if possible; it would be very helpful.
Note: kindly do not just post Spark links, because I have already tried them. Even though the questions sound naive, I still need more clarity in understanding this.
Spark is a general-purpose distributed data processing engine that is suitable for use in a wide range of circumstances. On top of the Spark core data processing engine, there are libraries for SQL, machine learning, graph computation, and stream processing, which can be used together in an application.
In Spark, when spark-submit is called, the user code is divided into smaller units called jobs, stages, and tasks. Job: a Job is a sequence of Stages, triggered by an Action such as count(), foreachRDD(), collect(), read(), or write().
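A minimal sketch of what that looks like when you write the driver program yourself (the app name and HDFS paths below are just placeholders): the transformations are lazy and do nothing on their own, while each action spawns a job that you can see in the driver log and the Spark UI.

    import org.apache.spark.sql.SparkSession

    object WordCountJob {
      def main(args: Array[String]): Unit = {
        // Entry point of the driver program launched with spark-submit
        val spark = SparkSession.builder().appName("word-count-example").getOrCreate()
        val sc = spark.sparkContext

        val words = sc.textFile("hdfs:///data/input.txt")   // transformation: lazy, no job yet
          .flatMap(_.split("\\s+"))                          // transformation: still no job
          .filter(_.nonEmpty)                                // transformation: still no job

        val total = words.count()                            // action: spawns job 0
        words.saveAsTextFile("hdfs:///data/output")          // action: spawns job 1

        println(s"total words: $total")
        spark.stop()
      }
    }

Packaged into a jar and launched with spark-submit, this single program therefore produces two jobs, one per action.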
The career outlook for Spark developers is good. Demand for entry-level positions such as software developer has been increasing rapidly across global organizations.
Well, terminology can always be difficult since it depends on context. In many cases, you may be used to "submitting a job to a cluster", which for Spark would mean submitting a driver program.
That said, Spark has its own definition of "job", taken directly from the glossary:
Job: A parallel computation consisting of multiple tasks that gets spawned in response to a Spark action (e.g. save, collect); you'll see this term used in the driver's logs.
So in this context, let's say you need to load a file, transform it, and then run a couple of actions on the result. Each of those actions spawns its own job, as the sketch below shows.
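A minimal sketch, assuming you are in spark-shell (so sc is already a SparkContext) and using made-up HDFS paths:

    val lines  = sc.textFile("hdfs:///logs/events.txt")       // transformation: lazy
    val counts = lines
      .map(line => (line.split(",")(0), 1))                    // narrow transformation, stays in one stage
      .reduceByKey(_ + _)                                      // shuffle: introduces a stage boundary

    counts.collect()                                           // action: spawns job 0
    counts.saveAsTextFile("hdfs:///logs/out")                  // action: spawns job 1

collect() and saveAsTextFile() are the actions here, so the driver log reports two jobs; each job is split into stages at the reduceByKey shuffle, and each stage runs one task per partition.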
Hope it makes things clearer ;-)