
Issue with Spark Java API, Kerberos, and Hive

I'm trying to run a Spark SQL test against a Hive table using the Spark Java API. The problem I'm having is with Kerberos. Whenever I attempt to run the program, I get this error message:

Caused by: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS];
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
    at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
    at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
    at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
    at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
    at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
    at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
    at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.<init>(HiveSessionStateBuilder.scala:69)
    at org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:69)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
    at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)
    at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
    at tester.SparkSample.lambda$0(SparkSample.java:62)
    ... 5 more

on this line of code:

    ss.sql("select count(*) from entps_pma.baraccount").show();

Now when I run the code, I log into Kerberos just fine and get this message:

18/05/01 11:21:03 INFO security.UserGroupInformation: Login successful for user <kerberos user> using keytab file /root/hdfs.keytab
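For context, a keytab login that produces a "Login successful" line like this is typically done through Hadoop's `UserGroupInformation`. A minimal sketch (the principal and keytab path are placeholders taken from the log line above; the class and method names here are my own, not part of the question's code):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Hedged sketch of the keytab login step that emits the
// "Login successful for user ... using keytab file ..." log line.
public class KerberosLogin {
    public static UserGroupInformation login(String principal, String keytabPath)
            throws IOException {
        Configuration conf = new Configuration();
        // Hadoop must be told to use Kerberos before logging in.
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Performs the actual keytab login and logs the success message.
        UserGroupInformation.loginUserFromKeytab(principal, keytabPath);
        return UserGroupInformation.getLoginUser();
    }
}
```
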

I even connect to the Hive Metastore:

18/05/01 11:21:06 INFO hive.metastore: Trying to connect to metastore with URI thrift://<hiveserver>:9083
18/05/01 11:21:06 INFO hive.metastore: Connected to metastore.

But right after that I get the error. I'd appreciate any direction here. Here is my code:

public static void runSample(String fullPrincipal) throws IOException {

    System.setProperty("hive.metastore.sasl.enabled", "true");
    System.setProperty("hive.security.authorization.enabled", "true");
    System.setProperty("hive.metastore.kerberos.principal", fullPrincipal);
    System.setProperty("hive.metastore.execute.setugi", "true");
    System.setProperty("hadoop.security.authentication", "kerberos");

    Configuration conf = setSecurity(fullPrincipal);

    loginUser = UserGroupInformation.getLoginUser();
    loginUser.doAs((PrivilegedAction<Void>) () -> {

        SparkConf sparkConf = new SparkConf().setMaster("local");
        sparkConf.set("spark.sql.warehouse.dir", "hdfs:///user/hive/warehouse");
        sparkConf.set("hive.metastore.uris", "thrift://<hive server>:9083");
        sparkConf.set("hadoop.security.authentication", "kerberos");
        sparkConf.set("hadoop.rpc.protection", "privacy");
        sparkConf.set("spark.driver.extraClassPath",
                "/opt/cloudera/parcels/CDH/jars/*.jar:/opt/cloudera/parcels/CDH/lib/hive/conf:/opt/cloudera/parcels/CDH/lib/hive/lib/*.jar");
        sparkConf.set("spark.executor.extraClassPath",
                "/opt/cloudera/parcels/CDH/jars/*.jar:/opt/cloudera/parcels/CDH/lib/hive/conf:/opt/cloudera/parcels/CDH/lib/hive/lib/*.jar");
        sparkConf.set("spark.eventLog.enabled", "false");

        SparkSession ss = SparkSession
              .builder()
              .enableHiveSupport()
              .config(sparkConf)
              .appName("Jim Test Spark App")
              .getOrCreate();

        ss.sparkContext()
            .hadoopConfiguration()
            .addResource(conf);

        ss.sql("select count(*) from entps_pma.baraccount").show();
        return null;
    });
}
asked by jymbo on Dec 19 '25

1 Answer

I guess you are running Spark on YARN. In that case you need to specify the spark.yarn.principal and spark.yarn.keytab parameters. Please check the "Running Spark on YARN" documentation.
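In practice that means setting the two properties on the `SparkConf` (or passing `--principal` and `--keytab` to `spark-submit`). A minimal sketch, assuming the keytab path from the question; the principal value and class name here are placeholders:

```java
import org.apache.spark.SparkConf;

// Hedged sketch: supplying the Kerberos principal and keytab to Spark on YARN,
// as the answer suggests. Values shown are placeholders, not from the question.
public class YarnKerberosConf {
    public static SparkConf build(String principal, String keytabPath) {
        return new SparkConf()
                .setMaster("yarn")
                .set("spark.yarn.principal", principal)  // e.g. "hdfs@EXAMPLE.COM"
                .set("spark.yarn.keytab", keytabPath);   // e.g. "/root/hdfs.keytab"
    }
}
```

The `spark-submit` equivalent is `--principal <principal> --keytab <path>`; with these set, Spark obtains and renews the delegation tokens needed to talk to a Kerberized Hive metastore on the application's behalf.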

answered by carl on Dec 21 '25

