
Create Hive table from tab-separated file in S3 using interactive mode

I've loaded tab-separated files into S3 with this folder structure under the bucket: bucket --> se --> y=2013 --> m=07 --> d=14 --> h=00

Each subfolder has one file that represents one hour of my traffic.

I then created an EMR workflow to run in interactive mode with Hive.

When I log in to the master node and start Hive, I run this command:

CREATE EXTERNAL TABLE se (
id bigint,
oc_date timestamp)
partitioned by (y string, m string, d string, h string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://bi_data';

I get this error message:

FAILED: Error in metadata: java.lang.IllegalArgumentException: The bucket name parameter must be specified when listing objects in a bucket

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

Can anybody help?

UPDATE: Even if I use only string fields, I get the same error. The CREATE TABLE statement with strings:

CREATE EXTERNAL TABLE se (
id string,
oc_date string)
partitioned by (y string, m string, d string, h string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://bi_data';

Hive version 0.8.1.8

Gluz asked Jul 14 '13 13:07




1 Answer

It turned out I had made two mistakes:

  1. When the location is only the bucket name, the S3 path must end with a trailing slash (reference here).

  2. The underscore is also a problem: the bucket name should be DNS-compliant.
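
With both fixes applied, the statement from the question might look like the sketch below. The bucket name bi-data is hypothetical: it stands in for a DNS-compliant replacement of bi_data, and since S3 buckets cannot be renamed this assumes the files were copied to a new bucket. The columns and partition keys are unchanged from the question.

```sql
-- Sketch only: 'bi-data' is an assumed, DNS-compliant bucket name
-- (no underscore); substitute your own bucket.
-- Note the trailing slash after the bucket name in LOCATION.
CREATE EXTERNAL TABLE se (
  id bigint,
  oc_date timestamp)
PARTITIONED BY (y string, m string, d string, h string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://bi-data/';
```

The only changes from the failing statement are the hyphen in place of the underscore and the trailing slash in the LOCATION path.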

Hope I helped someone with this.

Gluz answered Sep 27 '22 19:09