Setup Standalone Hive Metastore Service For Presto and AWS S3

I'm working in an environment where an S3 service is used as a data lake, but AWS Athena is not available. I'm trying to set up Presto to query the data in S3, and I know I need to define the data structure as Hive tables through the Hive Metastore service. I'm deploying each component in Docker, so I'd like to keep the container size as small as possible. Which Hive components do I need in order to run just the Metastore service? I don't care about running Hive itself, only the Metastore. Can I trim down what's needed, or is there already a pre-configured package for just that? I haven't been able to find anything online that doesn't involve downloading all of Hadoop and Hive. Is what I'm trying to do possible?

asked Feb 22 '18 16:02

mhaken

1 Answer

There is a workaround: you do not need Hive to run Presto. I haven't tried it with a distributed file system such as S3, but the code suggests it should work (at least with HDFS). In my opinion it is worth trying, because you would not need a separate Docker image for Hive at all.

The idea is to use the built-in FileHiveMetastore. It is neither documented nor advised for production use, but you can experiment with it. Schema information is stored next to the data in the file system. Obviously, this has its pros and cons. I don't know the details of your use case, so I can't say whether it fits your needs.

Configuration:

connector.name=hive-hadoop2
hive.metastore=file
hive.metastore.catalog.dir=file:///tmp/hive_catalog
hive.metastore.user=cox
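
For the S3 case from the question, the same catalog file might look roughly like the sketch below. This is untested, in line with the caveat above. The bucket name, path, credentials, and endpoint are hypothetical placeholders, and pointing hive.metastore.catalog.dir at an S3 URI is an assumption. The hive.s3.* settings are Hive connector options, and are only needed if credentials or a custom endpoint are not supplied some other way.

```properties
# Hypothetical catalog file; bucket, path, keys, and endpoint are placeholders.
connector.name=hive-hadoop2
hive.metastore=file
# Assumption (untested): keep the file metastore catalog in S3 instead of on local disk.
hive.metastore.catalog.dir=s3://example-bucket/hive_catalog
hive.metastore.user=cox
# S3 access settings for the Hive connector; omit them if the environment
# already provides credentials (for example, an instance role).
hive.s3.aws-access-key=EXAMPLE_ACCESS_KEY
hive.s3.aws-secret-key=EXAMPLE_SECRET_KEY
# Only relevant for a non-AWS, S3-compatible service.
hive.s3.endpoint=https://s3.example.com
```

If this works, the .prestoSchema and .prestoPermissions entries would presumably be written under the S3 prefix rather than under /tmp.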

Demo:

presto:tiny> create schema hive.default;
CREATE SCHEMA
presto:tiny> use hive.default;
USE
presto:default> create table t (t bigint);
CREATE TABLE
presto:default> show tables;
 Table
-------
 t
(1 row)

Query 20180223_202609_00009_iuchi, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:00 [1 rows, 18B] [11 rows/s, 201B/s]

presto:default> insert into t (values 1);
INSERT: 1 row

Query 20180223_202616_00010_iuchi, FINISHED, 1 node
Splits: 51 total, 51 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

presto:default> select * from t;
 t
---
 1
(1 row)

After the above I was able to find the following on my machine:

/tmp/hive_catalog/
/tmp/hive_catalog/default
/tmp/hive_catalog/default/t
/tmp/hive_catalog/default/t/.prestoPermissions
/tmp/hive_catalog/default/t/.prestoPermissions/user_cox
/tmp/hive_catalog/default/t/.prestoPermissions/.user_cox.crc
/tmp/hive_catalog/default/t/.20180223_202616_00010_iuchi_79dee041-58a3-45ce-b86c-9f14e6260278.crc
/tmp/hive_catalog/default/t/.prestoSchema
/tmp/hive_catalog/default/t/20180223_202616_00010_iuchi_79dee041-58a3-45ce-b86c-9f14e6260278
/tmp/hive_catalog/default/t/..prestoSchema.crc
/tmp/hive_catalog/default/.prestoSchema
/tmp/hive_catalog/default/..prestoSchema.crc

answered Sep 22 '22 01:09

kokosing

Tags: hive, presto, hive-metastore
