Presto on Amazon S3

Tags: amazon-web-services, amazon-s3, amazon-ec2, presto

I'm trying to use Presto with data in an Amazon S3 bucket, but I haven't found much related information online.

I've installed Presto on a micro instance, but I can't figure out how to connect it to S3. There is a bucket with files in it. I have a running Hive metastore server and have configured it in Presto's hive.properties. But when I try to use the LOCATION clause in Hive, it doesn't work.

It throws an error saying it cannot find the file scheme type s3.

Also, I don't understand why we need to run Hadoop, but without Hadoop, Hive doesn't run. Is there an explanation for this?

I followed this and this documentation during setup.

asked May 09 '16 06:05

Codex

1 Answer

Presto uses the Hive metastore to map database tables to their underlying files. These files can live on S3 and can be stored in a number of formats: CSV, ORC, Parquet, SequenceFile, etc.

The Hive metastore is usually populated through HQL (Hive Query Language) by issuing DDL statements like CREATE EXTERNAL TABLE ... with a LOCATION ... clause referencing the underlying files that hold the data.
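A minimal sketch of such a statement, generated from Python so the individual clauses are easy to see. The table name, columns and the `s3://my-bucket/events/` location are hypothetical placeholders. Which URI scheme your Hive/Hadoop install accepts (`s3://`, `s3a://`, ...) depends on its filesystem configuration; a scheme with no registered filesystem implementation is a likely cause of a "file scheme" error like the one in the question.

```python
def external_table_ddl(table, columns, location, stored_as="ORC"):
    """Build a HiveQL CREATE EXTERNAL TABLE statement (illustrative helper)."""
    # Render "name type" pairs, one column per line.
    cols = ",\n  ".join(f"{name} {col_type}" for name, col_type in columns)
    lines = [f"CREATE EXTERNAL TABLE {table} (", f"  {cols}", ")"]
    if stored_as.upper() == "TEXTFILE":
        # Plain-text data such as CSV needs a field delimiter declaration.
        lines.append("ROW FORMAT DELIMITED FIELDS TERMINATED BY ','")
    lines.append(f"STORED AS {stored_as.upper()}")
    # LOCATION names the directory that holds the data files, not a single file.
    lines.append(f"LOCATION '{location}'")
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    # Hypothetical CSV table stored under an S3 prefix.
    print(external_table_ddl(
        "events",
        [("event_id", "BIGINT"), ("event_type", "STRING")],
        "s3://my-bucket/events/",
        stored_as="TEXTFILE",
    ))
```

The printed statement would be run from the Hive CLI or Beeline; once the table exists in the metastore, Presto can query it through its Hive catalog.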

To get Presto to connect to a Hive metastore, edit the hive.properties catalog file (EMR puts it in /etc/presto/conf.dist/catalog/) and set the hive.metastore.uri parameter to the Thrift endpoint of your Hive metastore service.
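As a sketch, the catalog file is a plain Java-properties file; the Python helper below only renders one. `metastore-host` is a placeholder, `9083` is the usual default metastore Thrift port, and `hive-hadoop2` is the Hive connector name used by Presto releases of that era. The optional `hive.s3.aws-access-key` / `hive.s3.aws-secret-key` properties are one way to give the Presto Hive connector S3 credentials when you aren't relying on an EC2 instance role; check the Hive connector documentation for your Presto version before relying on any of these names.

```python
def render_hive_properties(metastore_host, metastore_port=9083,
                           aws_access_key=None, aws_secret_key=None):
    """Render a Presto Hive catalog file such as catalog/hive.properties."""
    props = {
        # Connector name for Presto's Hive connector (Hadoop 2 build).
        "connector.name": "hive-hadoop2",
        # Thrift endpoint of the Hive metastore service.
        "hive.metastore.uri": f"thrift://{metastore_host}:{metastore_port}",
    }
    # Assumption: static keys, only needed without EC2 instance credentials.
    if aws_access_key and aws_secret_key:
        props["hive.s3.aws-access-key"] = aws_access_key
        props["hive.s3.aws-secret-key"] = aws_secret_key
    # One key=value pair per line, in insertion order.
    return "".join(f"{key}={value}\n" for key, value in props.items())
```

Writing `render_hive_properties("metastore-host")` to the catalog directory and restarting Presto would register a catalog named after the file (`hive`).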

If you launch an Amazon EMR cluster with Hive and Presto selected, EMR configures this for you automatically, so it's a good place to start.
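Such a cluster can also be requested programmatically. The sketch below only builds the keyword arguments for boto3's EMR `run_job_flow` call and does not contact AWS; the cluster name, release label, instance type and count are placeholders, and the role names are the EMR default roles, which must already exist in your account.

```python
def emr_hive_presto_request(name="presto-on-s3", release_label="emr-5.36.0",
                            instance_type="m5.xlarge", instance_count=3):
    """Build run_job_flow kwargs for an EMR cluster with Hive and Presto."""
    return {
        "Name": name,
        # Placeholder release label; pick one that ships the Presto version you want.
        "ReleaseLabel": release_label,
        # Selecting both applications lets EMR wire Presto to the Hive metastore.
        "Applications": [{"Name": "Hive"}, {"Name": "Presto"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster running after start-up for interactive queries.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

Passing the result as `boto3.client("emr").run_job_flow(**emr_hive_presto_request())` would submit the request.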

If you want to test this on a standalone EC2 instance, I'd suggest first getting a functional Hive service working on top of the Hadoop infrastructure; you should be able to define tables that reside on the local HDFS filesystem. Presto complements Hive but requires a functioning Hive setup. Presto's native DDL statements are not as feature-complete as Hive's, so you'll do most table creation from Hive directly.

Alternatively, you can define Presto connectors for a MySQL or PostgreSQL database, but these are just a JDBC pass-through, so I don't think you'll gain much.

answered Oct 11 '22 06:10

Euan
