So i have external tables of tab delimited data. A simple table looks like this:
create external table if not exists categories
(id string, tag string, legid string, image string, parent string, created_date string, time_stamp int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://somewhere/';
Now I'm adding another field to the end, it will be a comma separated list of values.
Is there a way to specify this in the same way that I specify a field terminator, or do I have to rely on one of the serdes?
eg:
...list_of_names ARRAY<String>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ARRAY ELEMENTS SEPARATED BY ','
...
(I'm assuming I'll need to use a serde for this, but I figured there wasn't any harm in asking)
So adding new columns into a table is a relatively cheap metadata-only operation as Hive does not modify the existing data files. Then when you retrieve data from the table Hive sets NULL values for columns that do not exist in old data files.
When you create external table with out location , the data will be stored in the hive default location. Usually /apps/hive/warehouse/<database_name>. db/<table_name> . If you drop the table , you can find the data in this location.
one would prefer to choose external tables if the data is stored outside HDFS like S3.
An external table is stored on HDFS or any storage compatible with HDFS, because we want to use the data outside of Hive. Thus, Hive is not responsible for managing the storage of the external table. Tables can be stored on an external location for instance on a cloud platform like google cloud or AWS.
I don't know how to update an existing table to do that, but for creating a table; what you are looking for can be found, in depth, at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL. A snippet from there
row_format
: DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
An example from our table creation is
CREATE TABLE IF NOT EXISTS visits
(
... Columns Removed...
)
PARTITIONED BY (userdate STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
COLLECTION ITEMS TERMINATED BY '\002'
MAP KEYS TERMINATED BY '\003'
STORED AS TEXTFILE
;
The line from that you'd be looking for is the COLLECTION ITEMS TERMINATED BY char
for an array.
hth
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With