Issue when executing a show create table
and then executing the resulting create table
statement if the table is ORC.
Using show create table
, you get this:
STORED AS INPUTFORMAT
‘org.apache.hadoop.hive.ql.io.orc.OrcInputFormat’
OUTPUTFORMAT
‘org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat’
But if you create the table with those clauses, you will then get the casting error when selecting. Error likes:
Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to org.apache.hadoop.io.BinaryComparable
To fix this, just change create table
statement to STORED AS ORC
But, as the answer said in the similar question:
What is the difference between 'InputFormat, OutputFormat' & 'Stored as' in Hive? .
I can't figure out the reason.
InputFormat and OutputFormat - allows you to describe you the original data structure so that Hive could properly map it to the table view. SerDe - represents the class which performs actual translation of data from table view to the low level input-output format structures and opposite.
¶ Qubole Hive uses CombineHiveInputFormat by default and treats underlying files as text-files by default. During table definition – users can indicate that the file type of of an alternative format (such as a Sequence or RC File).
The TBLPROPERTIES clause allows you to tag the table definition with your own metadata key/value pairs. Some predefined table properties also exist, such as last_modified_user and last_modified_time which are automatically added and managed by Hive.
STORED AS
implies 3 things:
You have defined only the last 2, leaving the SERDE to be defined by hive.default.serde
hive.default.serde
Default Value: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Added in: Hive 0.14 with HIVE-5976
The default SerDe Hive will use for storage formats that do not specify a SerDe.
Storage formats that currently do not specify a SerDe include 'TextFile, RcFile'.
hive.default.serde
set hive.default.serde;
hive.default.serde=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
STORED AS ORC
create table mytable (i int)
stored as orc;
show create table mytable;
Note that the SERDE is 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
CREATE TABLE `mytable`(
`i` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'file:/home/cloudera/local_db/mytable'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1496982059')
STORED AS INPUTFORMAT ... OUTPUTFORMAT ...
create table mytable2 (i int)
STORED AS
INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
;
show create table mytable2
;
Note that the SERDE is 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
CREATE TABLE `mytable2`(
`i` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'file:/home/cloudera/local_db/mytable2'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1496982426')
You сan specify INPUTFORMAT
, OUTPUTFORMAT
, SERDE
in STORED AS
when creating table. Hive allows you to separate your record format from your file format. You can provide custom classes for INPUTFORMAT
, OUTPUTFORMAT
, SERDE
. See details: http://www.dummies.com/programming/big-data/hadoop/defining-table-record-formats-in-hive/
Alternatively you can write simply STORED AS ORC
or STORED AS TEXTFILE
for example.
STORED AS ORC statement already takes care about INPUTFORMAT
, OUTPUTFORMAT
and SERDE
. This allows you not to write those long fully qualified Java class names for INPUTFORMAT
, OUTPUTFORMAT
, SERDE
. Just STORED AS ORC
instead.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With