Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

HIVE insert overwrite directory with json format

How can I insert overwrite directory with json schema?

There is raw hive avro table; (this is actually has many fields)

tb_test--------
name string
kickname string
-----------------

then I want to save query result into some directory in hdfs by jsonserde.

I tried this.

insert overwrite directory '/json/'
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH SERDEPROPERTIES (
 "stat_name"="$._col0",
 "stat_interval"="$._col1"
)
STORED AS TEXTFILE 
select name, nickname
from tb_test limit 100

But written json in /json/ has _colXX field name instead of the origin field name.

{"_col0":"basic_qv"," _col1":"h"}
{"_col0":"basic_qv","_col1 ":"h"}
{"_col0":"basic_qv","_col1 ":"h"}
{"_col0":"basic_qv"," _col1":"h"}
{"_col0":"basic_qv","_col1 ":"h"}

I expected

{"name":"basic_qv","nickname":"h"}
{"name":"basic_qv","nickname":"h"}
{"name":"basic_qv","nickname":"h"}
{"name":"basic_qv","nickname":"h"}
{"name":"basic_qv","nickname":"h"}

What will help this?

Thanks!!

like image 281
Jihun No Avatar asked Apr 22 '16 13:04

Jihun No


2 Answers

I ran into this today. In case you don't want/can't use UDFs, using this guideline: https://docs.aws.amazon.com/athena/latest/ug/json-serde.html, this worked for me (order of column should be different also using the mapping prefix:

insert overwrite directory '/json/'
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
 "case.insensitive" = "false",
 "mapping._col0" = "stat_name",
 "mapping._col1" = "stat_interval"
)
select name, nickname
from tb_test limit 100
like image 59
Shervin Avatar answered Oct 12 '22 20:10

Shervin


It seems your issue with a workaround (using JsonUDF with named_struct) is described here: https://github.com/rcongiu/Hive-JSON-Serde/issues/151

extract.hql:
add jar /home/myuser/lib/json-udf-1.3.8-SNAPSHOT-jar-with-dependencies.jar;
create temporary function tjson as 'org.openx.data.udf.JsonUDF';

insert overwrite local directory '/json/'
select
tjson(named_struct("name", t.name,"nickname", t.nickname))
from tb_test t
;

Also you can create JsonSerDe-based table with columns defined, insert overwrite it and use table location instead of directory.

like image 20
leftjoin Avatar answered Oct 12 '22 19:10

leftjoin