Automatically shutdown Google Dataproc cluster after all jobs are completed

How can I programmatically shut down a Google Dataproc cluster automatically after all jobs have completed?

Dataproc provides APIs for cluster creation, monitoring, and management, but I can't find a way to delete the cluster automatically once all submitted jobs have finished.

Sreenath Chothar asked May 08 '17

2 Answers

The gcloud dataproc CLI offers the --max-idle option, which automatically deletes the Dataproc cluster after a given period of inactivity (i.e. no running jobs). It can be used as follows:

gcloud dataproc clusters create test-cluster \
    --project my-test-project \
    --zone europe-west1-b \
    --master-machine-type n1-standard-4 \
    --master-boot-disk-size 100 \
    --num-workers 2 \
    --worker-machine-type n1-standard-4 \
    --worker-boot-disk-size 100 \
    --max-idle 1h
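
If you prefer to set this up from code rather than the CLI, the Dataproc API exposes the same behavior through the cluster's lifecycle config (idle_delete_ttl). Here is a minimal sketch using the google-cloud-dataproc Python client; the project, region, and cluster names are placeholders:

from google.cloud import dataproc_v1
from google.protobuf import duration_pb2

region = "europe-west1"  # placeholder region
cluster_client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": "my-test-project",  # placeholder project
    "cluster_name": "test-cluster",   # placeholder cluster name
    "config": {
        # idle_delete_ttl is the API-side equivalent of --max-idle
        "lifecycle_config": {
            "idle_delete_ttl": duration_pb2.Duration(seconds=3600),  # 1h idle
        },
    },
}

operation = cluster_client.create_cluster(
    request={"project_id": "my-test-project", "region": region, "cluster": cluster}
)
operation.result()  # blocks until the cluster is created

With this config the cluster deletes itself after one hour with no running jobs, just like the --max-idle flag above.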
Martijn Van de Grift answered Oct 03 '22

It depends on the language. Personally, I use Python (pyspark), and the code provided here worked fine for me:

https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/dataproc/submit_job_to_cluster.py

You may need to adapt the code to your purpose and follow the prerequisite steps in the README file (https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/dataproc), such as enabling the API and installing the packages from requirements.txt.

Basically, wait_for_job blocks until the job has finished, and delete_cluster, as the name says, deletes the cluster you previously created. I hope this helps. A condensed sketch of the same pattern follows below.
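
The linked sample targets an older API version; with the current google-cloud-dataproc client library the same wait-then-delete pattern looks roughly like this. The project, region, cluster name, and the GCS path of the job file are placeholders:

from google.cloud import dataproc_v1

project = "my-test-project"    # placeholder project
region = "europe-west1"        # placeholder region
cluster_name = "test-cluster"  # placeholder cluster name

endpoint = {"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
job_client = dataproc_v1.JobControllerClient(client_options=endpoint)
cluster_client = dataproc_v1.ClusterControllerClient(client_options=endpoint)

job = {
    "placement": {"cluster_name": cluster_name},
    "pyspark_job": {"main_python_file_uri": "gs://my-bucket/my_job.py"},  # placeholder
}

# Submit the job and wait until it finishes (raises on job failure),
# mirroring the sample's wait_for_job helper
operation = job_client.submit_job_as_operation(
    request={"project_id": project, "region": region, "job": job}
)
operation.result()

# The job is done, so delete the cluster, mirroring delete_cluster
delete_op = cluster_client.delete_cluster(
    request={"project_id": project, "region": region, "cluster_name": cluster_name}
)
delete_op.result()  # blocks until the cluster is gone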

Gaspar Avit Ferrero answered Oct 03 '22