Exporting a Hive table to an S3 bucket

Tags: amazon-s3, hive, emr, elastic-map-reduce

I've created a Hive table through an Elastic MapReduce interactive session and populated it from a CSV file like this:

CREATE TABLE csvimport(id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

LOAD DATA LOCAL INPATH '/home/hadoop/file.csv' OVERWRITE INTO TABLE csvimport;

I now want to store the Hive table in an S3 bucket so that the table is preserved once I terminate the MapReduce instance.

Does anyone know how to do this?

asked Feb 28 '12 20:02

seedhead

2 Answers

Yes, you have to import your data at the start of your Hive session and export it at the end.

To do this, create a table that is mapped onto an S3 bucket and directory:

CREATE TABLE csvexport (id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://bucket/directory/';

Insert the data into the S3 table. When the insert is complete, the directory will contain the rows as comma-separated text files:

INSERT OVERWRITE TABLE csvexport
SELECT id, time, log FROM csvimport;

Your table is now preserved, and when you start a new Hive instance you can reimport your data.
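For the reimport step, a minimal sketch (not part of the answer above) is to declare an external table over the same S3 directory in the new session, using the same delimiter the export used, so no LOAD DATA step is needed. The path is the placeholder from the answer; on current EMR releases the s3:// scheme is the usual choice, and s3n:// is kept here only to match the answer.

```sql
-- Run in a NEW Hive session (e.g. on a fresh EMR cluster).
-- 's3n://bucket/directory/' is the placeholder export location; use your own bucket.
-- The delimiters must match what csvexport wrote: ',' between fields, '\n' between rows.
CREATE EXTERNAL TABLE csvimport (id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://bucket/directory/';

-- The files already in S3 are queryable straight away.
SELECT COUNT(*) FROM csvimport;
```

Because the table is EXTERNAL, dropping it at the end of the session removes only the table definition and leaves the files in S3.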

Your table can be stored in a few different formats depending on where you want to use it.
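As an illustration of another storage format, this sketch writes the same rows as a Hadoop SequenceFile. The table name csvexport_seq and the directory_seq path are hypothetical; a separate directory is used so the text export is not overwritten. Newer Hive versions also accept columnar formats such as STORED AS ORC or STORED AS PARQUET in the same position.

```sql
-- Hypothetical table and path; same columns as csvexport.
CREATE EXTERNAL TABLE csvexport_seq (id BIGINT, time STRING, log STRING)
STORED AS SEQUENCEFILE
LOCATION 's3n://bucket/directory_seq/';

-- Copy the rows from the session table into the SequenceFile-backed table.
INSERT OVERWRITE TABLE csvexport_seq
SELECT id, time, log FROM csvimport;
```

SequenceFile is a binary Hadoop format, so it suits data that will only be read back by Hive or other Hadoop tools; plain text is easier to consume anywhere else.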

answered Oct 02 '22 01:10

user495732 Why Me

The above query needs to use the EXTERNAL keyword (dropping an external table removes only its metadata and leaves the files in S3), i.e.:

CREATE EXTERNAL TABLE csvexport (id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://bucket/directory/';

INSERT OVERWRITE TABLE csvexport
SELECT id, time, log FROM csvimport;

Another alternative is to use this query:

INSERT OVERWRITE DIRECTORY 's3n://bucket/directory/'
SELECT id, time, log FROM csvimport;

The table data is written to the S3 directory using Hive's default delimiters (Ctrl-A, '\001', between fields and a newline between rows).
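To read that output back later, the table definition has to declare the Ctrl-A field delimiter. A minimal sketch, with csvrestore as a hypothetical table name; the second statement shows the alternative of choosing the delimiter at export time, which needs a newer Hive release (as far as I know, 0.14 or later for non-local directories) and writes to a hypothetical directory_csv path.

```sql
-- Read back files written by INSERT OVERWRITE DIRECTORY with default delimiters.
CREATE EXTERNAL TABLE csvrestore (id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE
LOCATION 's3n://bucket/directory/';

-- Newer Hive only: pick the field delimiter when exporting instead.
INSERT OVERWRITE DIRECTORY 's3n://bucket/directory_csv/'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT id, time, log FROM csvimport;
```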

answered Oct 02 '22 01:10

Thejas

Related questions

How can I roll back the Terraform state of my config in S3?
File metadata not kept in S3 after a CLI copy
Cloud Formation: S3 linked to Lambda gives The ARN is not well formed
Athena and S3 Inventory. HIVE_BAD_DATA: Field size's type LONG in ORC is incompatible with type varchar defined in table schema
(AWS) Athena: Query Results seem too short
React Deploying to AWS S3 production using npm - index.html file as last
Boto3 not uploading zip file to S3 python
AWS S3 doesObjectExist costs
Upload zip archive files to S3 with node
AWS S3 presigned urls with boto3 - Signature mismatch
Airflow won't write logs to s3
How to use dynamic bucket name in Multer-s3 for file upload
Using Athena Terraform Scripts
Import failure of s3fs library in AWS Glue
S3 Select retrieve headers in the CSV
How do I do an S3 copy between regions using aws cli?
How to define the principal for an AWS policy statement?
Alternative to Amazon S3 for the data center?
Best practices for storing references to AWS S3 objects in a database?
AWS S3 - CORS OPTIONS Preflight throwing 400 Bad Request during DELETE w/ VersionId
