I'm trying JSON-SerDe from below link http://code.google.com/p/hive-json-serde/wiki/GettingStarted.
CREATE TABLE my_table (field1 string, field2 int,
field3 string, field4 double)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde' ;
I've added Json-SerDe jar as
ADD JAR /path-to/hive-json-serde.jar;
And loaded data as
LOAD DATA LOCAL INPATH '/home/hduser/pradi/Test.json' INTO TABLE my_table;
and it loads data successfully.
But when query data as
Select * from my_table;
I get only one row from table as
data1 100 more data1 123.001
Test.json contains
{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}
{"field1":"data2","field2":200,"field3":"more data2","field4":123.002}
{"field1":"data3","field2":300,"field3":"more data3","field4":123.003}
{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}
Where is the problem? why only one row is coming instead of 4 rows when i query the table. And in /user/hive/warehouse/my_table contains all the 4 rows!!
hive> add jar /home/hduser/pradeep/hive-json-serde-0.2.jar;
Added /home/hduser/pradeep/hive-json-serde-0.2.jar to class path
Added resource: /home/hduser/pradeep/hive-json-serde-0.2.jar
hive> CREATE EXTERNAL TABLE my_table (field1 string, field2 int,
> field3 string, field4 double)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
> WITH SERDEPROPERTIES (
> "field1"="$.field1",
> "field2"="$.field2",
> "field3"="$.field3",
> "field4"="$.field4"
> );
OK
Time taken: 0.088 seconds
hive> LOAD DATA LOCAL INPATH '/home/hduser/pradi/test.json' INTO TABLE my_table;
Copying data from file:/home/hduser/pradi/test.json
Copying file: file:/home/hduser/pradi/test.json
Loading data to table default.my_table
OK
Time taken: 0.426 seconds
hive> select * from my_table;
OK
data1 100 more data1 123.001
Time taken: 0.17 seconds
I've already posted the contents of test.json file. so you can see that query is resulting only one line as
data1 100 more data1 123.001
I've changed the json file to employee.json which contains
{ "firstName" : "Mike", "lastName" : "Chepesky", "employeeNumber" : 1840192 }
and changed table also but it showing a null values when i query the table
hive> add jar /home/hduser/pradi/hive-json-serde-0.2.jar;
Added /home/hduser/pradi/hive-json-serde-0.2.jar to class path
Added resource: /home/hduser/pradi/hive-json-serde-0.2.jar
hive> create EXTERNAL table employees_json (firstName string, lastName string, employeeNumber int )
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
OK
Time taken: 0.297 seconds
hive> load data local inpath '/home/hduser/pradi/employees.json' into table employees_json;
Copying data from file:/home/hduser/pradi/employees.json
Copying file: file:/home/hduser/pradi/employees.json
Loading data to table default.employees_json
OK
Time taken: 0.293 seconds
hive>select * from employees_json;
OK
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
Time taken: 0.194 seconds
A bit hard to tell what's going on without the logs (see Getting Started) in case of doubt. Just a quick thought - can you try if it works with WITH SERDEPROPERTIES
as so:
CREATE EXTERNAL TABLE my_table (field1 string, field2 int,
field3 string, field4 double)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
WITH SERDEPROPERTIES (
"field1"="$.field1",
"field2"="$.field2",
"field3"="$.field3",
"field4"="$.field4"
);
There is also a fork you might want to give a try from ThinkBigAnalytics.
UPDATE: Turns out the input in Test.json is invalid JSON hence the records get collapsed.
See answer https://stackoverflow.com/a/11707993/396567 for further details.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With