 

sparklyr livy connection with Kerberos

I'm able to connect to a non-Kerberized Spark cluster through the Livy service without problems from a remote RStudio Desktop (Windows).

However, if Kerberos security is enabled, the connection fails:

library(sparklyr)
sc <- spark_connect("http://host:8998", method = "livy")

This returns:

Error in livy_validate_http_response("Failed to create livy session",  : 
Livy operation is unauthorized. Try spark_connect with config = livy_config()

I'm using sparklyr_0.5.6-9002 and MIT Kerberos for Windows for authentication.

On the other hand, from within the cluster (i.e. using curl) the connection succeeds.

What am I doing wrong? What additional settings are required for such connection?

The livy_config(..., username, password) option seems to produce only an Authorization: Basic ... header, whereas here I suspect a Negotiate (Kerberos?) header is required instead.

Are there any other possible configurations I'm missing?

NB: the same error is returned from RStudio Server (web) after running kinit from the shell as an authorized user.

asked by runr, Nov 07 '22 20:11



1 Answer

I'm coming late to the party, but I had the same problem and was finally able to solve it. This could be useful to others.

Of course, this may depend a lot on your cluster configuration. I'm using sparklyr 1.5.0 and MIT Kerberos for Windows, with a direct connection to Livy (no Knox proxy) running in a Cloudera HDP cluster (Spark 2.3.0). In my case an extra HTTP header was required; see below.

If your cluster doesn't allow outgoing internet connections, you should also first save the sparklyr server-side jar on HDFS (by default, it is downloaded automatically from GitHub).
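A minimal sketch of that upload step, assuming the Hadoop `hdfs` command-line client is on the PATH of a machine that can reach the cluster, and that your sparklyr installation ships its server-side jars in the package's `java/` directory. The helper name `upload_sparklyr_jar` is hypothetical, and `/path/to/` is the same placeholder path used in the configuration.

```r
# Hypothetical helper: copy the sparklyr jar bundled with the installed
# package to HDFS, so Livy does not need to download it from GitHub.
# Assumes the `hdfs` CLI is available and the jars live in sparklyr's java/ dir.
upload_sparklyr_jar <- function(hdfs_dir = "/path/to/",
                                jar_name = "sparklyr-2.3-2.11.jar") {
  jar_dir <- system.file("java", package = "sparklyr")
  if (!nzchar(jar_dir)) stop("sparklyr does not appear to be installed")
  local_jar <- file.path(jar_dir, jar_name)
  if (!file.exists(local_jar)) stop("jar not found: ", local_jar)
  # -f overwrites any existing copy of the jar on HDFS
  status <- system2("hdfs", c("dfs", "-put", "-f", shQuote(local_jar), hdfs_dir))
  if (status != 0) stop("hdfs dfs -put failed with status ", status)
  # Return the HDFS URI, e.g. "hdfs:///path/to/sparklyr-2.3-2.11.jar"
  paste0("hdfs://", sub("/?$", "/", hdfs_dir), jar_name)
}
```

The returned URI can then be assigned to `lcfg$sparklyr.livy.jar`. Pick the jar whose Spark/Scala versions match your cluster.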

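library(sparklyr)
SPARK_VERSION = "2.3.0"

# negotiate = TRUE makes the HTTP client authenticate with SPNEGO (Negotiate),
# i.e. Kerberos; X-Requested-By was additionally required by this Livy server
lcfg = livy_config(
  negotiate = TRUE, 
  custom_headers = list("X-Requested-By"="<user_name>"))
# Use the sparklyr jar stored on HDFS instead of downloading it from GitHub
lcfg$sparklyr.livy.jar = "hdfs:///path/to/sparklyr-2.3-2.11.jar"

sc = spark_connect(
  master = "http://livyserver:8999", method = "livy", 
  version = SPARK_VERSION,
  config = lcfg)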

For debugging, a first step might be to test your Livy setup from outside the cluster, but without R: see https://livy.apache.org/examples/
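Once such a test works, a similar request can be sent from R while still bypassing sparklyr, which helps separate Kerberos problems from sparklyr configuration problems. This is a sketch assuming the httr package is installed and valid Kerberos credentials are available to libcurl; `check_livy_negotiate` is a hypothetical helper, and the URL and header value are the placeholders from the configuration.

```r
# Hypothetical helper: list Livy sessions using Negotiate (SPNEGO) auth,
# the scheme requested by livy_config(negotiate = TRUE).
check_livy_negotiate <- function(livy_url = "http://livyserver:8999",
                                 user_name = "<user_name>") {
  resp <- httr::GET(
    paste0(livy_url, "/sessions"),
    # Empty user and password: libcurl uses the current Kerberos credentials
    httr::authenticate("", "", type = "gssnegotiate"),
    httr::add_headers("X-Requested-By" = user_name)
  )
  # 200 means authentication succeeded; 401 means Negotiate was rejected
  message("HTTP status: ", httr::status_code(resp))
  invisible(resp)
}
```

If this returns 401 while the same request succeeds from inside the cluster, the problem is likely on the Kerberos/client side rather than in sparklyr.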

answered by Pierre Gramme, Nov 15 '22 07:11
