Triggering spark jobs with REST

Lately I have been trying out Apache Spark. My question is specifically about triggering Spark jobs. I had previously posted a question on understanding Spark jobs; after getting my hands dirty with jobs, I moved on to my requirement.

I have a REST endpoint where I expose an API to trigger jobs; I have used Spring 4.0 for the REST implementation. Going ahead, I thought of implementing the jobs as a service in Spring, where I would submit a job programmatically: when the endpoint is triggered with the given parameters, I would trigger the job. I now have a few design options.

  • Similar to the job written below, maintain several jobs invoked through an abstract class, maybe a JobScheduler (see the controller sketch after this list).

    /* Can this code be abstracted from the application and written as a
       separate job? My understanding is that the application code itself
       has to have the addJars embedded, which the SparkContext takes
       care of internally. */
    SparkConf sparkConf = new SparkConf()
            .setAppName("MyApp")
            .setJars(new String[] { "/path/to/jar/submit/cluster" })
            .setMaster("/url/of/master/node");
    sparkConf.setSparkHome("/path/to/spark/");
    sparkConf.set("spark.scheduler.mode", "FAIR");

    JavaSparkContext sc = new JavaSparkContext(sparkConf);
    sc.setLocalProperty("spark.scheduler.pool", "test");

    // Application with algorithm, transformations
  • Extending the above point: have multiple versions of the job handled by the service.

  • Or use a Spark Job Server to do this.
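
Here is a minimal sketch of what the first option could look like as a Spring 4 REST endpoint. The controller name, the /jobs/trigger route, and the input parameter are hypothetical, not from the post; also note that only one SparkContext can be active per JVM, so a real service would share a single context rather than build one per request:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;
    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class JobController {

        @RequestMapping(value = "/jobs/trigger", method = RequestMethod.POST)
        public String triggerJob(@RequestParam("input") String inputPath) {
            // Same programmatic setup as in option 1 above; paths are placeholders.
            SparkConf sparkConf = new SparkConf()
                    .setAppName("MyApp")
                    .setJars(new String[] { "/path/to/jar/submit/cluster" })
                    .setMaster("/url/of/master/node");
            sparkConf.set("spark.scheduler.mode", "FAIR");

            JavaSparkContext sc = new JavaSparkContext(sparkConf);
            try {
                sc.setLocalProperty("spark.scheduler.pool", "test");
                // Placeholder for the actual algorithm / transformations:
                long count = sc.textFile(inputPath).count();
                return "Job finished, count=" + count;
            } finally {
                sc.stop(); // release the cluster resources held by this job
            }
        }
    }

One scaling caveat with this design: the Spark driver runs inside the web application's JVM, so every triggered job competes for the same process's memory and CPU.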

First, I would like to know what the best solution is in this case, both execution-wise and scaling-wise.

Note: I am using a Spark standalone cluster. Kindly help.

asked Mar 11 '15 by chaosguru




1 Answer

It turns out Spark has a hidden REST API to submit a job, check its status, and kill it.

Check out the full example here: http://arturmkrtchyan.com/apache-spark-hidden-rest-api
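
For reference, a minimal sketch of that submission call in plain Java, assuming a standalone master with the default REST port 6066; the host, jar path, main class, and Spark version are placeholders that must match your cluster, and the request fields follow the post linked above:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Scanner;

    public class HiddenRestSubmit {
        public static void main(String[] args) throws Exception {
            // CreateSubmissionRequest body; jar, class, and version are placeholders.
            String body = "{\n"
                + "  \"action\": \"CreateSubmissionRequest\",\n"
                + "  \"appResource\": \"file:/path/to/my-job.jar\",\n"
                + "  \"mainClass\": \"com.example.MyJob\",\n"
                + "  \"appArgs\": [\"arg1\"],\n"
                + "  \"clientSparkVersion\": \"1.3.0\",\n"
                + "  \"environmentVariables\": {\"SPARK_ENV_LOADED\": \"1\"},\n"
                + "  \"sparkProperties\": {\n"
                + "    \"spark.jars\": \"file:/path/to/my-job.jar\",\n"
                + "    \"spark.app.name\": \"MyJob\",\n"
                + "    \"spark.master\": \"spark://master-host:6066\",\n"
                + "    \"spark.submit.deployMode\": \"cluster\",\n"
                + "    \"spark.driver.supervise\": \"false\"\n"
                + "  }\n"
                + "}";

            URL url = new URL("http://master-host:6066/v1/submissions/create");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json;charset=UTF-8");
            conn.setDoOutput(true);
            try (OutputStream os = conn.getOutputStream()) {
                os.write(body.getBytes(StandardCharsets.UTF_8));
            }

            // The JSON response carries a submissionId, which can then be used
            // with GET /v1/submissions/status/<id> to poll the job and
            // POST /v1/submissions/kill/<id> to cancel it.
            try (Scanner s = new Scanner(conn.getInputStream(), "UTF-8").useDelimiter("\\A")) {
                System.out.println(s.hasNext() ? s.next() : "");
            }
        }
    }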

answered Sep 17 '22 by Artur Mkrtchyan