
Issue with Spark Java API, Kerberos, and Hive

I'm trying to run a Spark SQL test against a Hive table using the Spark Java API. The problem I'm having is with Kerberos. Whenever I attempt to run the program, I get this error message:

Caused by: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS];
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
    at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:194)
    at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:114)
    at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
    at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
    at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
    at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
    at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anon$1.<init>(HiveSessionStateBuilder.scala:69)
    at org.apache.spark.sql.hive.HiveSessionStateBuilder.analyzer(HiveSessionStateBuilder.scala:69)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder$$anonfun$build$2.apply(BaseSessionStateBuilder.scala:293)
    at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:79)
    at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:79)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
    at tester.SparkSample.lambda$0(SparkSample.java:62)
    ... 5 more

on this line of code:

    ss.sql("select count(*) from entps_pma.baraccount").show();

Now when I run the code, I log into Kerberos just fine and get this message:

18/05/01 11:21:03 INFO security.UserGroupInformation: Login successful for user <kerberos user> using keytab file /root/hdfs.keytab
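For context, a keytab login that produces a "Login successful" line like this is typically done through Hadoop's `UserGroupInformation`. A minimal sketch (the principal and keytab path are placeholders taken from the log line above; the class and method names here are my own, not part of the question's code):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Hedged sketch of the keytab login step that emits the
// "Login successful for user ... using keytab file ..." log line.
public class KerberosLogin {
    public static UserGroupInformation login(String principal, String keytabPath)
            throws IOException {
        Configuration conf = new Configuration();
        // Hadoop must be told to use Kerberos before logging in.
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Performs the actual keytab login and logs the success message.
        UserGroupInformation.loginUserFromKeytab(principal, keytabPath);
        return UserGroupInformation.getLoginUser();
    }
}
```
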

I even connect to the Hive Metastore:

18/05/01 11:21:06 INFO hive.metastore: Trying to connect to metastore with URI thrift://<hiveserver>:9083
18/05/01 11:21:06 INFO hive.metastore: Connected to metastore.

But right after that I get the error. I'd appreciate any direction here. Here is my code:

public static void runSample(String fullPrincipal) throws IOException {

    System.setProperty("hive.metastore.sasl.enabled", "true");
    System.setProperty("hive.security.authorization.enabled", "true");
    System.setProperty("hive.metastore.kerberos.principal", fullPrincipal);
    System.setProperty("hive.metastore.execute.setugi", "true");
    System.setProperty("hadoop.security.authentication", "kerberos");

    Configuration conf = setSecurity(fullPrincipal);

    loginUser = UserGroupInformation.getLoginUser();
    loginUser.doAs((PrivilegedAction<Void>) () -> {

        SparkConf sparkConf = new SparkConf().setMaster("local");
        sparkConf.set("spark.sql.warehouse.dir", "hdfs:///user/hive/warehouse");
        sparkConf.set("hive.metastore.uris", "thrift://<hive server>:9083");
        sparkConf.set("hadoop.security.authentication", "kerberos");
        sparkConf.set("hadoop.rpc.protection", "privacy");
        sparkConf.set("spark.driver.extraClassPath",
                "/opt/cloudera/parcels/CDH/jars/*.jar:/opt/cloudera/parcels/CDH/lib/hive/conf:/opt/cloudera/parcels/CDH/lib/hive/lib/*.jar");
        sparkConf.set("spark.executor.extraClassPath",
                "/opt/cloudera/parcels/CDH/jars/*.jar:/opt/cloudera/parcels/CDH/lib/hive/conf:/opt/cloudera/parcels/CDH/lib/hive/lib/*.jar");
        sparkConf.set("spark.eventLog.enabled", "false");

        SparkSession ss = SparkSession
              .builder()
              .enableHiveSupport()
              .config(sparkConf)
              .appName("Jim Test Spark App")
              .getOrCreate();

        ss.sparkContext()
            .hadoopConfiguration()
            .addResource(conf);

        ss.sql("select count(*) from entps_pma.baraccount").show();
        return null;
    });
}
asked by jymbo on Dec 19 '25

1 Answer

I guess you are running Spark on YARN. In that case you need to specify the spark.yarn.principal and spark.yarn.keytab parameters. Please check the "Running Spark on YARN" documentation.
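In practice that means setting the two properties on the `SparkConf` (or passing `--principal` and `--keytab` to `spark-submit`). A minimal sketch, assuming the keytab path from the question; the principal value and class name here are placeholders:

```java
import org.apache.spark.SparkConf;

// Hedged sketch: supplying the Kerberos principal and keytab to Spark on YARN,
// as the answer suggests. Values shown are placeholders, not from the question.
public class YarnKerberosConf {
    public static SparkConf build(String principal, String keytabPath) {
        return new SparkConf()
                .setMaster("yarn")
                .set("spark.yarn.principal", principal)  // e.g. "hdfs@EXAMPLE.COM"
                .set("spark.yarn.keytab", keytabPath);   // e.g. "/root/hdfs.keytab"
    }
}
```

The `spark-submit` equivalent is `--principal <principal> --keytab <path>`; with these set, Spark obtains and renews the delegation tokens needed to talk to a Kerberized Hive metastore on the application's behalf.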

answered by carl on Dec 21 '25

