Passing multiple system properties to google dataproc cluster job

I am trying to submit a Spark job on a Dataproc cluster. The job needs multiple system properties. I am able to pass just one, as follows:

gcloud dataproc jobs submit spark \
    --cluster <cluster_name> \
    --class <class_name> \
    --properties spark.driver.extraJavaOptions=-Dhost=127.0.0.1  \
    --jars spark_job.jar

How do I pass multiple properties? I tried the following, but even this didn't work:

--properties ^#^spark.driver.extraJavaOptions=-Dhost=127.0.0.1,-Dlimit=10
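
My understanding from gcloud topic escaping is that the leading ^#^ only changes the character separating whole key=value entries in --properties, e.g. (spark.executor.memory=4g is just an illustrative second property, not one I actually need):

--properties '^#^spark.driver.extraJavaOptions=-Dhost=127.0.0.1#spark.executor.memory=4g'

so it doesn't seem to control how the individual -D flags inside spark.driver.extraJavaOptions are separated.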

asked Mar 03 '18 by Sagar Rakshe

People also ask

What types of jobs can be run on Google Dataproc?

What type of jobs can I run? Dataproc provides out-of-the-box and end-to-end support for many of the most popular job types, including Spark, Spark SQL, PySpark, MapReduce, Hive, and Pig jobs.

Is Dataproc fully managed?

Dataproc is a fully managed and highly scalable service for running Apache Spark, Apache Flink, Presto, and 30+ open source tools and frameworks.

Is Dataproc a PaaS?

Like most of the clouds out there, the Google Cloud offers all service levels, but for the rest of this article I'll be focusing on their data processing PaaS offer: Google Dataproc.


1 Answer

I figured it out.

gcloud dataproc jobs submit spark \
    --cluster <cluster_name> \
    --class <class_name> \
    --properties spark.driver.extraJavaOptions='-Dhost=127.0.0.1 -Dlimit=10 -Dproperty_name=property_value' \
    --jars spark_job.jar
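
If the job also needs ordinary Spark properties in addition to the JVM system properties, --properties takes a comma-separated list of key=value pairs, so something like the following should work (spark.executor.memory=4g is just an illustrative property, not one the job above needs):

gcloud dataproc jobs submit spark \
    --cluster <cluster_name> \
    --class <class_name> \
    --properties spark.executor.memory=4g,spark.driver.extraJavaOptions='-Dhost=127.0.0.1 -Dlimit=10' \
    --jars spark_job.jar

The quotes keep the space-separated -D flags together as a single value, while the comma only separates whole key=value pairs.
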
answered Oct 19 '22 by Sagar Rakshe