I want to use Jupyter/iPython on Cloud Dataproc. How can I automatically install and configure it when I create new clusters?
The Cloud Dataproc team maintains a GitHub repository of sample and commonly used initialization actions. The repository includes one for iPython, which you can use to install and configure iPython automatically. The initialization action page has more details on how to use the scripts when creating a new cluster.
The tl;dr process:
Create a new cluster with the Google Cloud SDK using the --initialization-actions flag:
gcloud beta dataproc clusters create <my-dataproc-cluster> --initialization-actions gs://<my-bucket>/ipython.sh
Create an SSH tunnel and SOCKS proxy to the cluster
Open the notebook through the proxy at:
http://<my-dataproc-cluster>-m:8123
In the example above, replace <my-bucket> with the name of your Cloud Storage bucket and <my-dataproc-cluster> with the name of your cluster. Also note that the URL in the last step appends -m to the name of your cluster so that you reach the master node.
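The steps above can be sketched as a small dry-run script. It only prints the commands and the notebook URL instead of running anything, so you can check the values first. The cluster name, bucket name, zone, and local SOCKS port below are placeholders I made up, and the `gcloud compute ssh ... -- -D <port> -N` form of the tunnel command is an assumption based on how SSH dynamic port forwarding is usually set up; check it against your SDK version.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands for each step, runs nothing.
# All four values are hypothetical placeholders, not from the original answer.
CLUSTER="my-dataproc-cluster"
BUCKET="my-bucket"
ZONE="us-central1-a"   # assumption: use the zone your cluster lives in
SOCKS_PORT=1080        # assumption: any free local port works

# The master node's hostname is the cluster name with "-m" appended.
master_host() {
  printf '%s-m' "$1"
}

# Cluster creation command with the iPython initialization action.
create_cmd() {
  printf 'gcloud beta dataproc clusters create %s --initialization-actions gs://%s/ipython.sh\n' "$1" "$2"
}

# SSH tunnel to the master node; -D opens a local SOCKS proxy, -N runs no remote command.
tunnel_cmd() {
  printf 'gcloud compute ssh %s --zone=%s -- -D %s -N\n' "$(master_host "$1")" "$2" "$3"
}

# URL of the notebook on the master node (port 8123, as in the answer above).
notebook_url() {
  printf 'http://%s:8123\n' "$(master_host "$1")"
}

create_cmd "$CLUSTER" "$BUCKET"
tunnel_cmd "$CLUSTER" "$ZONE" "$SOCKS_PORT"
notebook_url "$CLUSTER"
```

Once the tunnel is running, point a browser at the local SOCKS proxy (localhost and the port you chose) before opening the printed URL, since the -m hostname only resolves from inside the cluster's network.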