
Exporting a Hive table to an S3 bucket

I've created a Hive table through an Elastic MapReduce interactive session and populated it from a CSV file like this:

CREATE TABLE csvimport(id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

LOAD DATA LOCAL INPATH '/home/hadoop/file.csv' OVERWRITE INTO TABLE csvimport;

I now want to store the Hive table in an S3 bucket so the table is preserved once I terminate the MapReduce instance.

Does anyone know how to do this?

asked Feb 28 '12 by seedhead


2 Answers

Yes, you have to export your data at the end of your Hive session and import it at the start of the next one.

To do this, you need to create a table that is mapped onto an S3 bucket and directory:

CREATE TABLE csvexport (id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://bucket/directory/';

Insert the data into the S3 table; when the insert is complete, the directory will contain a CSV file:

INSERT OVERWRITE TABLE csvexport
SELECT id, time, log FROM csvimport;

Your table is now preserved, and when you create a new Hive instance you can reimport your data.
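As a minimal sketch of that reimport on a fresh cluster (the bucket/directory path is a placeholder, and the EXTERNAL keyword is my addition so that dropping the table won't delete the S3 files): since the data already lives in S3, you only re-declare the table over the same location; no LOAD DATA step is needed.

-- On the new cluster, re-declare the table over the existing S3 data.
-- EXTERNAL keeps Hive from deleting the S3 files if the table is dropped.
CREATE EXTERNAL TABLE csvexport (id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://bucket/directory/';

-- The rows written by the earlier INSERT are immediately queryable.
SELECT COUNT(*) FROM csvexport;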

Your table can be stored in a few different formats depending on where you want to use it.
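As a rough illustration of that point (the table name and location below are hypothetical), the same export could be declared with a different storage format, e.g. SequenceFile for consumption by other Hadoop jobs rather than as plain CSV:

-- Hypothetical variant: same columns, stored as a SequenceFile instead of text.
CREATE EXTERNAL TABLE csvexport_seq (id BIGINT, time STRING, log STRING)
STORED AS SEQUENCEFILE
LOCATION 's3n://bucket/directory-seq/';

INSERT OVERWRITE TABLE csvexport_seq
SELECT id, time, log FROM csvimport;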

answered Oct 02 '22 by user495732 Why Me


The above query needs to use the EXTERNAL keyword, i.e.:

CREATE EXTERNAL TABLE csvexport (id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://bucket/directory/';

INSERT OVERWRITE TABLE csvexport
SELECT id, time, log FROM csvimport;

Another alternative is to use the query

INSERT OVERWRITE DIRECTORY 's3n://bucket/directory/'
SELECT id, time, log FROM csvimport;

The data is then stored in the S3 directory with Hive's default delimiters.
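Note that Hive's default field delimiter is Ctrl-A ('\001'), so files written this way are not comma-separated. A sketch of reading them back (the table name csvreload is hypothetical):

-- Declare a table matching Hive's default field delimiter ('\001')
-- over the same S3 directory to read the exported files back.
CREATE EXTERNAL TABLE csvreload (id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE
LOCATION 's3n://bucket/directory/';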

answered Oct 02 '22 by Thejas