
Loading XML data into a Hive table: org.apache.hadoop.hive.ql.metadata.HiveException

I'm trying to load XML data into Hive, but I'm getting an error:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":""}

The XML file I used is:

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>

The Hive queries I used are:

1) CREATE TABLE xmltable(xmldata string) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/user/xmlfile.xml' OVERWRITE INTO TABLE xmltable;

2) CREATE VIEW xmlview (id,genre,price)
AS SELECT
xpath(xmldata, '/catalog[1]/book[1]/id'),
xpath(xmldata, '/catalog[1]/book[1]/genre'),
xpath(xmldata, '/catalog[1]/book[1]/price')
FROM xmltable;

3) CREATE TABLE xmlfinal AS SELECT * FROM xmlview;

4) SELECT * FROM xmlfinal WHERE id = '11';

The first two queries run fine, but when I execute the third query I get this error:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675)
    at org.apache.hadoop.hive.ql.exec

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

So where is it going wrong? I'm sure the XML file itself is correct.

Thanks, Shree

asked Jul 07 '14 at 09:07 by shree11


1 Answer

Reason for the error:

1) Case 1 (your case): the XML content is fed to Hive line by line, because a TEXTFILE table stores each line of the file as a separate row.

Input XML:

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>  

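Check in Hive:

select count(*) from xmltable;  -- returns 13: each line of the file is a separate row in column xmldata

Why this fails:

The XML is read as 13 separate pieces rather than as one document, so the XPath UDF is handed fragments instead of a complete XML document. The first row, for example, contains only the XML declaration, which is exactly the row named in the error message.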
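To see the same problem outside Hive, here is a small Python sketch. It is only an illustration: it uses the standard library's xml.etree.ElementTree, not Hive's XPath UDF, and the helper name parses is made up for this example. It splits the file one row per line, the way the TEXTFILE table does, and checks whether each piece is a well-formed XML document on its own:

```python
import xml.etree.ElementTree as ET

# The XML from the question, exactly as it sits in the file (13 lines).
XML_FILE = """<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>"""


def parses(fragment: str) -> bool:
    """Return True if the fragment is a well-formed XML document by itself."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# A TEXTFILE table with a single string column stores one line per row.
rows = XML_FILE.splitlines()
print(len(rows))          # 13, the same as select count(*)
print(parses(rows[0]))    # False: the declaration alone has no root element
print(parses(XML_FILE))   # True: the complete file is valid XML
```

The point is that the file is fine as a whole; it only breaks once it has been cut into rows.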

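2) Case 2: the XML content should be fed to Hive as a single string, because the XPath UDFs expect one complete XML document per value. Refer to the syntax: "All functions follow the form: xpath_*(xml_string, xpath_expression_string)." (source)

Input XML (the same content on a single line):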

<?xml version="1.0" encoding="UTF-8"?><catalog><book><id>11</id><genre>Computer</genre><price>44</price></book><book><id>44</id><genre>Fantasy</genre><price>5</price></book></catalog>
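One way to get a pretty-printed file into this single-line form before running LOAD DATA is to preprocess it. The Python sketch below is an assumption on my part, not something Hive does for you, and flatten_xml is a hypothetical helper name. It strips each line and joins them, which is safe for this file because no text value spans lines or depends on surrounding whitespace:

```python
import xml.etree.ElementTree as ET


def flatten_xml(text: str) -> str:
    """Join a pretty-printed XML document into one line (hypothetical helper).

    Leading and trailing whitespace on every line is dropped, so this is only
    safe when no element's text content depends on that whitespace.
    """
    flat = "".join(line.strip() for line in text.splitlines())
    ET.fromstring(flat)  # sanity check: raises ET.ParseError if not well-formed
    return flat


pretty = """<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>"""

one_line = flatten_xml(pretty)
print(one_line)
print(len(one_line.splitlines()))  # 1, so the table gets a single row
```

In practice you would read /home/user/xmlfile.xml, write the flattened string (followed by a newline) to a new file of your choosing, and point LOAD DATA LOCAL INPATH at that file instead.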

Check in Hive:

select count(*) from xmltable; -- returns 1: the whole file is read as one complete XML document

That means:

xmldata   = <?xml version="1.0" encoding="UTF-8"?><catalog><book> ...... </catalog>

Then apply your XPath UDF like this:

select xpath(xmldata, 'xpath_expression_string') from xmltable;
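As a rough analogue of what the XPath UDF works with once the row holds the whole document, the Python sketch below evaluates the question's paths with xml.etree.ElementTree. ElementTree supports only a subset of XPath and is not Hive's implementation, so this illustrates why a complete document is needed rather than reproducing Hive's exact output:

```python
import xml.etree.ElementTree as ET

# The single-row value of xmldata from case 2.
xmldata = ('<?xml version="1.0" encoding="UTF-8"?><catalog>'
           '<book><id>11</id><genre>Computer</genre><price>44</price></book>'
           '<book><id>44</id><genre>Fantasy</genre><price>5</price></book>'
           '</catalog>')

catalog = ET.fromstring(xmldata)  # the root element is <catalog>

# ElementTree paths are relative to the root, so the question's
# '/catalog[1]/book[1]/id' corresponds to 'book[1]/id' here.
first_book = {field: catalog.find(f"book[1]/{field}").text
              for field in ("id", "genre", "price")}
print(first_book)  # {'id': '11', 'genre': 'Computer', 'price': '44'}

# The ids of every book, not just the first one.
all_ids = [e.text for e in catalog.findall("book/id")]
print(all_ids)  # ['11', '44']
```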
answered Oct 27 '22 at 15:10 by vijay kumar