Is it possible to add a Presto interpreter to Zeppelin on AWS EMR 4.3, and if so, could someone please post the instructions? I have Presto-Sandbox and Zeppelin-Sandbox running on EMR.
There's no official Presto interpreter for Zeppelin. The conclusion of the Jira ticket raised for one is that it isn't necessary, because you can just use the JDBC interpreter:
https://issues.apache.org/jira/browse/ZEPPELIN-27
I'm running a later EMR release with Presto and Zeppelin. The default set of interpreters doesn't include jdbc, but you can install it by SSHing into the master node and running:
sudo /usr/lib/zeppelin/bin/install-interpreter.sh --name jdbc
Even better is to use that as a bootstrap script.
Then you can add a new interpreter in Zeppelin.
Give it a name such as presto and choose jdbc as the interpreter group. You then use %presto as the directive on the first line of a Zeppelin paragraph, or set it as the default interpreter.
The settings you need here are:
default.driver = com.facebook.presto.jdbc.PrestoDriver
default.url = jdbc:presto://<YOUR EMR CLUSTER MASTER DNS>:8889
default.user = hadoop
Note that no password is set, because the EMR environment should be relying on IAM roles and key pairs (ppk keys etc.) for authentication.
You will also need a dependency on the Presto JDBC driver jar. There are several ways to add dependencies in Zeppelin; an easy one is a Maven groupId:artifactId:version reference in the interpreter settings under Dependencies, e.g. in the artifact field:
com.facebook.presto:presto-jdbc:0.170
Note that 0.170 matches the Presto version EMR was deploying at the time of writing, and this will change. The AWS EMR release settings show which Presto version is deployed on your cluster.
You can also have Zeppelin connect directly to a catalog, or to a catalog and schema, by appending them to the default.url setting, as described in the Presto docs for the JDBC driver: https://prestodb.io/docs/current/installation/jdbc.html
For example, using Presto with a Hive metastore that has a database called datakeep:
jdbc:presto://<YOUR EMR CLUSTER MASTER DNS>:8889/hive
or
jdbc:presto://<YOUR EMR CLUSTER MASTER DNS>:8889/hive/datakeep
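If you generate these settings from a script, the URL pattern above fits in a small shell function. This is a hypothetical helper, not part of Zeppelin or Presto; it assumes the EMR Presto port 8889 used above and only appends /catalog or /catalog/schema as in the examples.

```shell
# Hypothetical helper (not part of Zeppelin or Presto): prints a Presto JDBC URL
#   jdbc:presto://HOST:8889[/CATALOG[/SCHEMA]]
# 8889 is the EMR Presto port used in the settings above (assumption).
presto_jdbc_url() {
  local host="$1" catalog="${2:-}" schema="${3:-}" url
  url="jdbc:presto://${host}:8889"
  if [ -n "$catalog" ]; then
    url="${url}/${catalog}"
    # A schema can only be appended after a catalog
    if [ -n "$schema" ]; then
      url="${url}/${schema}"
    fi
  elif [ -n "$schema" ]; then
    echo "presto_jdbc_url: a schema needs a catalog" >&2
    return 1
  fi
  printf '%s\n' "$url"
}
```

For example, presto_jdbc_url "$MASTER_DNS" hive datakeep (with MASTER_DNS a hypothetical variable holding your master node's DNS name) prints the second URL above.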
UPDATE Feb 2018
EMR 5.11.1 ships Presto 0.187, and there is an issue with the way the Zeppelin interpreter passes properties to the Presto driver, causing an error along the lines of: Unrecognized connection property 'url'
Currently the only workarounds appear to be referencing an older driver version in the artifact, or manually uploading a patched Presto driver. See https://github.com/prestodb/presto/issues/9254 and https://issues.apache.org/jira/browse/ZEPPELIN-2891
In my case, referencing an older driver (apparently it must be older than 0.180), e.g. com.facebook.presto:presto-jdbc:0.179, did not work: Zeppelin reported that it couldn't download the dependencies. Odd error, probably something to do with Zeppelin's local Maven repo not containing that version. I'm not sure, as I gave up on that approach.
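If you pick the artifact version in a script, the apparent 0.180 cut-off mentioned above can be encoded as a check. A rough sketch: the function name is hypothetical, it assumes versions of the form 0.NNN, and the 0.180 boundary is only the apparent cut-off from this answer, not something verified against every release.

```shell
# Hypothetical helper: returns 0 (true) if a presto-jdbc version is at or
# above the apparent 0.180 cut-off where Zeppelin's extra properties
# (e.g. 'url') are rejected. Expects versions like "0.187".
presto_jdbc_affected() {
  local major rest minor
  major="${1%%.*}"     # "0" from "0.187"
  rest="${1#*.}"       # "187" from "0.187"
  minor="${rest%%.*}"  # drop any further ".N" suffix
  # Any higher major version is also treated as affected (assumption)
  if [ "$major" -gt 0 ] || [ "$minor" -ge 180 ]; then
    return 0
  fi
  return 1
}
```

For example, presto_jdbc_affected 0.187 && echo "needs a patched driver" prints the message, while 0.179 would not.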
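I can confirm that patching the driver works. In the Presto source, check out the tag matching your cluster's version, apply the patch, and rebuild:
git checkout 0.187
mvn clean package
Then, with the patched driver jar in place, use a URL that passes the user as a parameter:
jdbc:presto://<YOUR EMR CLUSTER MASTER DNS>:8889?user=hadoop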
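The patched setup passes the user in the URL query string rather than as a separate property. If you build these URLs in a script, a small hypothetical helper can append parameters, using ? for the first one and & for any later ones. It does no URL-encoding, so it assumes plain values such as hadoop.

```shell
# Hypothetical helper: appends KEY=VALUE to a JDBC URL.
# Uses '?' if the URL has no query string yet, '&' otherwise.
# Values are NOT URL-encoded (assumes plain values like "hadoop").
jdbc_url_add_param() {
  local url="$1" key="$2" value="$3"
  case "$url" in
    *\?*) printf '%s&%s=%s\n' "$url" "$key" "$value" ;;
    *)    printf '%s?%s=%s\n' "$url" "$key" "$value" ;;
  esac
}
```

For example, jdbc_url_add_param "jdbc:presto://$MASTER_DNS:8889" user hadoop (MASTER_DNS again being a hypothetical variable) produces the URL above.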
That got it up and running.
Meanwhile, it may be worth considering Athena as an alternative to Presto, given that it's serverless and is effectively a fork of Presto. It is limited to external Hive tables, which must be created in Athena's own catalog (or now in the AWS Glue catalog, which is also restricted to external tables).