
How to run a Zeppelin notebook from the command line (automatically)

  1. How do we run the notebook from the command line?

  2. Following on from 1, how would I pass command-line arguments into the notebook, i.e. access the command-line args from within the notebook code?

thousif ahmed asked Mar 28 '16 05:03



2 Answers

I had the same issue and managed to work out how to use the REST API to run a notebook with curl. As for passing in command-line arguments, I think there is simply no way to do that directly. You will have to use some sort of shared state on the server (e.g. have the notebook read from a file, and modify that file before each run).
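
A minimal sketch of that shared-state workaround, assuming the wrapper script runs on the Zeppelin server itself and the notebook is written to read the same file. The `write_notebook_args` name, the `ARGS_FILE` variable and the `/tmp/notebook_args` default are all hypothetical, not anything Zeppelin provides:

```shell
# Hypothetical location of the shared file; the notebook must read the same path.
args_file="${ARGS_FILE:-/tmp/notebook_args}"

# Write each command-line argument on its own line, replacing any old contents.
write_notebook_args() {
    : > "${args_file}"
    for arg in "$@"; do
        printf '%s\n' "${arg}" >> "${args_file}"
    done
}
```

You would call `write_notebook_args "$@"` in the wrapper script just before triggering the notebook run, so the notebook sees the arguments of the current invocation.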

Anyway, this is how I managed to run a notebook. It assumes jq is installed and that ${ip} and ${notebook_id} are set to the Zeppelin host and the notebook's ID. Pretty involved :(

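# List the IDs of all interpreter settings (just to inspect them).
curl -XGET "http://${ip}:8080/api/interpreter/setting" | jq '.body[] | .id'

# Capture the IDs; jq prints each one as a quoted JSON string.
interpreter_settings_ids=$(curl -XGET "http://${ip}:8080/api/interpreter/setting" | jq '.body[] | .id')

# Join the IDs into a JSON array (the variable is unquoted on purpose so newlines collapse to spaces).
id_array="[$(echo ${interpreter_settings_ids} | tr ' ' ',')]"

# Bind those interpreter settings to the notebook.
curl -XPUT -d "${id_array}" "http://${ip}:8080/api/notebook/interpreter/bind/${notebook_id}"

# Run all paragraphs of the notebook.
curl -XPOST "http://${ip}:8080/api/notebook/job/${notebook_id}"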

If someone has manually clicked the "save" button for the interpreter binding then only the last command is required.

UPDATE:

OK, I think you can poll the status of the running notebook in a loop to determine whether it has finished or failed, see: https://github.com/eBay/Zeppelin/blob/master/docs/rest-api/rest-notebook.md

For example

# True when the number of FINISHED paragraphs equals the total number of paragraphs.
function job_success {
    num_cells=$(curl -XGET "http://${ip}:8080/api/notebook/job/${notebook_id}" 2>/dev/null | jq '.body[] | .status' | wc -l)
    num_successes=$(curl -XGET "http://${ip}:8080/api/notebook/job/${notebook_id}" 2>/dev/null | jq '.body[] | .status' | grep -c FINISHED)
    test "${num_cells}" -eq "${num_successes}"
}

# True when at least one paragraph reports ERROR (the matching lines are printed).
function job_fail {
    curl -XGET "http://${ip}:8080/api/notebook/job/${notebook_id}" 2>/dev/null | jq '.body[] | .status' | grep ERROR
}

# Poll every 10 seconds until the run has either finished or failed.
until job_success || job_fail
do
    sleep 10
done
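
The loop above stops on either outcome without saying which one occurred. One possible way to tell them apart is sketched below, assuming the status lines have the same shape as the jq output above. The `classify_statuses` name and the three result words are illustrative, not part of Zeppelin:

```shell
# Read paragraph statuses (one per line) on stdin and print one word:
#   failed  - at least one paragraph reports ERROR
#   success - there is at least one paragraph and all report FINISHED
#   running - anything else (still pending or running, or no data yet)
classify_statuses() {
    total=0 finished=0 failed=0
    while IFS= read -r status; do
        [ -n "${status}" ] || continue
        total=$((total + 1))
        case "${status}" in
            *ERROR*)    failed=$((failed + 1)) ;;
            *FINISHED*) finished=$((finished + 1)) ;;
        esac
    done
    if [ "${failed}" -gt 0 ]; then
        echo failed
    elif [ "${total}" -gt 0 ] && [ "${finished}" -eq "${total}" ]; then
        echo success
    else
        echo running
    fi
}

# Possible use inside the polling loop (commented out, needs a live server):
#   state=$(curl -s "http://${ip}:8080/api/notebook/job/${notebook_id}" \
#             | jq '.body[] | .status' | classify_statuses)
```

Polling until `state` is no longer `running` and then testing it against `success` lets the calling script exit with a meaningful status. Treating an empty response as `running` also avoids reporting success when the server returns no paragraphs at all.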
samthebest answered Oct 21 '22 10:10


As of version 0.7.3 and perhaps earlier, Zeppelin has a REST API that lets you run notebooks. Your shell script can use curl to access the API.

The API includes methods to delete a paragraph and to insert a paragraph at a particular index. This allows you to express all your "parameters" as variables in paragraph 0 and then use them in later paragraphs. Make 3 calls to the REST API in this order:

  1. Delete the notebook's current paragraph 0.
  2. Insert a new paragraph containing variable assignments at index 0.
  3. Run the notebook.
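
The three calls could be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe: the endpoint paths and the `title`/`text`/`index` payload fields reflect my reading of the Zeppelin notebook REST API and should be checked against the docs for your version, `${ip}` and `${notebook_id}` are assumed to be set as in the other answer, jq is assumed to be installed, and the `params_to_scala` and `run_with_params` helpers are hypothetical names. The parameters are emitted as Scala `val` assignments on the assumption that the notebook uses the Spark interpreter.

```shell
# Turn key=value arguments into Scala assignments, one per line.
# Values are inserted verbatim, so they must not contain double quotes.
params_to_scala() {
    for pair in "$@"; do
        printf 'val %s = "%s"\n' "${pair%%=*}" "${pair#*=}"
    done
}

# Replace paragraph 0 with the given parameters, then run the notebook.
run_with_params() {
    base="http://${ip}:8080/api/notebook"

    # 1. Delete the notebook's current paragraph 0.
    first_id=$(curl -s "${base}/${notebook_id}" | jq -r '.body.paragraphs[0].id')
    curl -s -XDELETE "${base}/${notebook_id}/paragraph/${first_id}"

    # 2. Insert a new paragraph containing the assignments at index 0.
    payload=$(jq -n --arg text "$(params_to_scala "$@")" \
        '{title: "params", text: $text, index: 0}')
    curl -s -XPOST -d "${payload}" "${base}/${notebook_id}/paragraph"

    # 3. Run the notebook.
    curl -s -XPOST "${base}/job/${notebook_id}"
}
```

A call such as `run_with_params date=2016-03-28 mode=full` would then make `date` and `mode` available to later paragraphs. Building the payload with `jq --arg` rather than string concatenation keeps the JSON valid when the paragraph text contains quotes or newlines.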
ChrisFal answered Oct 21 '22 09:10