 

Access AWS Glue from local Spark

Is there any way to run Spark SQL queries with a local master against the AWS Glue Data Catalog?

I run this code on my local PC:

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
    .master("local")
    .enableHiveSupport()
    .config("hive.metastore.client.factory.class", "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory")
    .getOrCreate();

spark.sql("show databases").show(); // this query isn't running against AWS Glue

EDIT: Based on some examples, it appears that the hive.metastore.uris configuration key should allow specifying a particular metastore URL. However, it's not clear how to get the relevant value for Glue:

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
    .master("local")
    .enableHiveSupport()
    .config("hive.metastore.client.factory.class", "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory")
    .config("hive.metastore.uris", "thrift://???:9083")
    .getOrCreate();

spark.sql("show databases").show(); // this query isn't running against AWS Glue
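A note on the EDIT: with AWSGlueDataCatalogHiveClientFactory, the Hive client is meant to call the Glue Data Catalog through the AWS API instead of connecting to a Thrift metastore, so there is no thrift://...:9083 address to look up for Glue. Spark's documented way to pass Hive settings is a hive-site.xml in its conf/ directory. The sketch below only renders such a file containing the factory class and deliberately leaves hive.metastore.uris unset. HiveSiteXml and render are hypothetical helper names, and the result still assumes the Glue client jar (and a Spark/Hive build that actually honors the factory class) is on the local classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HiveSiteXml {

    // Renders Hadoop-style configuration XML (<configuration><property>...) for the given properties.
    static String render(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n");
        sb.append("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n");
            sb.append("    <name>").append(escape(e.getKey())).append("</name>\n");
            sb.append("    <value>").append(escape(e.getValue())).append("</value>\n");
            sb.append("  </property>\n");
        }
        sb.append("</configuration>\n");
        return sb.toString();
    }

    // Minimal XML escaping for element text ('&' first so later entities are not double-escaped).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // Replaces the Thrift metastore client with the Glue Data Catalog client.
        props.put("hive.metastore.client.factory.class",
                "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory");
        // hive.metastore.uris is intentionally not set: Glue is not reached through a Thrift URI.
        System.out.print(render(props));
    }
}
```

The printed output could be saved as conf/hive-site.xml in the local Spark installation; AWS credentials and region would still have to be available to the AWS SDK in the usual ways.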
asked Sep 15 '18 at 12:09 by VB_


1 Answer

Amazon provides this client, which should solve the problem (I haven't tried it yet):

https://github.com/awslabs/aws-glue-data-catalog-client-for-apache-hive-metastore
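When using that client locally, a first sanity check is whether the factory class is visible on the classpath at all; if it is not, the Hive client has nothing to instantiate. As I understand that repository's README, it also expects a Hive build that includes the client-factory hook (HIVE-12679) and a Spark built against it, so a stock local Spark may ignore the setting even when the jar is present. A minimal, stdlib-only sketch of the classpath check (GlueClientCheck and isOnClasspath are hypothetical names):

```java
public class GlueClientCheck {

    // Returns true if the named class can be loaded, without running its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, GlueClientCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String factory = "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory";
        System.out.println(factory + " on classpath: " + isOnClasspath(factory));
    }
}
```

Running this with the same classpath as the Spark job (for example, the jars passed via --jars) shows whether the built client was picked up.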

answered Oct 01 '22 at 05:10 by Ophir Yoktan