I am trying to submit a Spark job on a Dataproc cluster. The job needs multiple system properties. I can pass a single one as follows:
gcloud dataproc jobs submit spark \
--cluster <cluster_name> \
--class <class_name> \
--properties spark.driver.extraJavaOptions=-Dhost=127.0.0.1 \
--jars spark_job.jar
How do I pass multiple properties? I tried the following, but even this didn't work:
--properties ^#^spark.driver.extraJavaOptions=-Dhost=127.0.0.1,-Dlimit=10
I figured it out. The trick is to quote the value and separate the individual -D flags with spaces, so spark.driver.extraJavaOptions receives them as one string:
gcloud dataproc jobs submit spark \
--cluster <cluster_name> \
--class <class_name> \
--properties spark.driver.extraJavaOptions='-Dhost=127.0.0.1 -Dlimit=10 -Dproperty_name=property_value' \
--jars spark_job.jar
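If you also need to set other Spark properties alongside extraJavaOptions, gcloud's delimiter-escaping syntax (see gcloud topic escaping) should work as well: prefixing the value with ^#^ changes the separator between key=value pairs from a comma to #, so the spaces inside the extraJavaOptions value survive. A sketch, with spark.executor.memory added purely for illustration:

gcloud dataproc jobs submit spark \
--cluster <cluster_name> \
--class <class_name> \
--properties '^#^spark.driver.extraJavaOptions=-Dhost=127.0.0.1 -Dlimit=10#spark.executor.memory=4g' \
--jars spark_job.jar

Inside the job, the -D flags arrive as ordinary JVM system properties. A minimal Scala sketch of reading them on the driver (the default values here are assumptions, not part of the original command):

object Main {
  def main(args: Array[String]): Unit = {
    // -Dhost and -Dlimit were set via spark.driver.extraJavaOptions
    val host  = sys.props.getOrElse("host", "localhost") // assumed default
    val limit = sys.props.getOrElse("limit", "0").toInt  // assumed default
    println(s"host=$host limit=$limit")
  }
}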