
OSS supported by Google Cloud Dataproc

When I go to https://cloud.google.com/dataproc, I see this:

"Dataproc is a fully managed and highly scalable service for running Apache Spark, Apache Flink, Presto, and 30+ open source tools and frameworks."

But gcloud dataproc jobs submit doesn't list all of them. It lists only 8 (hadoop, hive, pig, presto, pyspark, spark, spark-r, spark-sql). Any idea why?

~ gcloud dataproc jobs submit
ERROR: (gcloud.dataproc.jobs.submit) Command name argument expected.

Available commands for gcloud dataproc jobs submit:

      hadoop                  Submit a Hadoop job to a cluster.
      hive                    Submit a Hive job to a cluster.
      pig                     Submit a Pig job to a cluster.
      presto                  Submit a Presto job to a cluster.
      pyspark                 Submit a PySpark job to a cluster.
      spark                   Submit a Spark job to a cluster.
      spark-r                 Submit a SparkR job to a cluster.
      spark-sql               Submit a Spark SQL job to a cluster.

For detailed information on this command and its flags, run:
  gcloud dataproc jobs submit --help
Naga Vijayapuram asked Jan 22 '26 07:01


1 Answer

Some OSS components are offered as Dataproc Optional Components. Not all of them have a job submit API: some (e.g., Anaconda, Jupyter) don't need one, and others (e.g., Flink, Druid) might get one in the future.

Some other OSS components are offered as libraries, e.g., GCS connector, BigQuery connector, Apache Parquet.
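
As a rough illustration of the optional-components route described above, such components are selected when the cluster is created, not through gcloud dataproc jobs submit. The sketch below only builds and prints the command. The cluster name and region are hypothetical placeholders, and which components are available depends on the Dataproc image version, so treat the component list as an assumption to check against the current documentation.

```shell
#!/bin/sh
# Hypothetical placeholders, not values taken from the question.
CLUSTER_NAME="example-cluster"
REGION="us-central1"

# Optional components are chosen at cluster creation time.
# Availability of each component depends on the image version (assumption: both exist on yours).
COMPONENTS="FLINK,JUPYTER"

# Build the command as a string and print it instead of running it,
# so no cluster is created by accident. Run the printed line manually if it looks right.
CREATE_CMD="gcloud dataproc clusters create ${CLUSTER_NAME} --region=${REGION} --optional-components=${COMPONENTS} --enable-component-gateway"
echo "${CREATE_CMD}"
```

Components enabled this way are used through their own interfaces (for example, a notebook UI reached via the component gateway) and do not appear as job types under gcloud dataproc jobs submit.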

Dagang answered Jan 24 '26 22:01


