I've loaded tab-separated files into S3 with this folder structure under the bucket: bucket --> se --> y=2013 --> m=07 --> d=14 --> h=00
Each leaf folder contains one file representing one hour of my traffic.
I then created an EMR job flow running Hive in interactive mode.
When I log in to the master node and open the Hive shell, I run this command:
CREATE EXTERNAL TABLE se (
id bigint,
oc_date timestamp)
partitioned by (y string, m string, d string, h string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://bi_data';
I get this error message:
FAILED: Error in metadata: java.lang.IllegalArgumentException: The bucket name parameter must be specified when listing objects in a bucket
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Can anybody help?
UPDATE: Even if I use string columns only, I get the same error. Create table with strings:
CREATE EXTERNAL TABLE se (
id string,
oc_date string)
partitioned by (y string, m string, d string, h string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://bi_data';
Hive version 0.8.1.8
To create a Hive table on top of those files, you have to specify the structure of the files by giving column names and types:
CREATE EXTERNAL TABLE posts (title STRING, comment_count INT)
LOCATION 's3://my-bucket/files/';
Here is a list of all allowed types.
So, the solution: I had made two mistakes:
When specifying only a bucket name, the S3 path must end with a trailing slash. reference here
The underscore is also a problem: the bucket name must be DNS-compliant (lowercase letters, digits, and hyphens only), so bi_data is invalid.
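Putting both fixes together, the DDL would look something like the sketch below. The bucket name bi-data is illustrative (assuming the bucket was re-created with a DNS-compliant name), and because the table is partitioned, each hourly partition also has to be registered before queries will see any data:

```sql
-- Assumes a re-created, DNS-compliant bucket named "bi-data" (illustrative name).
-- Note the trailing slash on the LOCATION path.
CREATE EXTERNAL TABLE se (
  id BIGINT,
  oc_date TIMESTAMP)
PARTITIONED BY (y STRING, m STRING, d STRING, h STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://bi-data/se/';

-- Partitions are not discovered automatically; add each hour explicitly, e.g.:
ALTER TABLE se ADD PARTITION (y='2013', m='07', d='14', h='00')
  LOCATION 's3://bi-data/se/y=2013/m=07/d=14/h=00/';
```

Since the folder names already follow the y=/m=/d=/h= convention, later Hive versions can also discover them in bulk with MSCK REPAIR TABLE se; (or ALTER TABLE se RECOVER PARTITIONS; on EMR).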
Hope this helps someone.