
Apache Hive regEx serde: data types

Tags: regex, apache, hive

For processing logs I want to use the Apache Hive RegexSerDe, but I have only found examples that use STRING as the data type for the table's columns.

Now my question is: are date-based types, integers, and arrays supported, or is it just strings?

This example (and others) only uses strings:

CREATE TABLE access_log (
  remote_ip STRING,
  request_date STRING,
  method STRING,
  request STRING,
  protocol STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^ ]*) . . \\[([^\\]]+)\\] \"([^ ]*) ([^ ]*) ([^ \"]*)\" *",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
)
STORED AS TEXTFILE;

darkownage asked Mar 20 '23 19:03


2 Answers

  • See the source code of the SerDe (the RegexSerDe code, also available on GitHub). A comment in the code states: "All columns have to be of type STRING."
  • If you want to change that, write a custom SerDe (if you are comfortable with Java) and add it as a custom SerDe JAR, as in this CSV custom SerDe example.
  • Otherwise, keep the column types as STRING, and when you need to operate on a column as another type, cast it in the query with Hive's cast() function.
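
The casting approach from the last bullet might look like the query below. This is only a sketch against the access_log table from the question. The timestamp pattern and the 'HTTP/1.1' shape of the protocol column are assumptions about the log lines, not something the question states.

```sql
-- Sketch only. Assumes request_date looks like '10/Oct/2000:13:55:36 -0700'
-- and protocol looks like 'HTTP/1.1'; adjust the pattern to your logs.
SELECT
  remote_ip,
  -- parse the string, then cast the normalized 'yyyy-MM-dd HH:mm:ss' text
  CAST(from_unixtime(unix_timestamp(request_date, 'dd/MMM/yyyy:HH:mm:ss Z')) AS TIMESTAMP) AS request_ts,
  method,
  -- split() returns array<string>, so arrays can be derived at query time
  split(request, '/') AS request_parts,
  -- element 1 of ['HTTP', '1.1'] cast to a numeric type
  CAST(split(protocol, '/')[1] AS DOUBLE) AS http_version
FROM access_log;
```

Wrapping a query like this in a view keeps the raw table all-STRING while giving downstream queries typed columns.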

hope this helps :)

vijay kumar answered Apr 05 '23 21:04


I haven't used the RegexSerDe personally, but I do notice that there are two classes for it:

  • serde/src/java/org/apache/hadoop/hive/serde2/RegexSerDe.java
  • contrib/src/java/org/apache/hadoop/hive/contrib/serde2/RegexSerDe.java

The second one, which you are referring to, does indeed appear to be restricted to strings. The other appears to support primitive types.

For whatever reason I only see the second one referenced in the API docs.
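
If the serde2 class does accept primitive column types, as it appears to, a typed table could be declared roughly as follows. This is an untested sketch, not a verified configuration. The table name access_log_typed, the extra status column, and the extended regex are hypothetical additions for illustration. The output.format.string property is left out on the assumption that this class is only used for reading.

```sql
-- Sketch only: uses the serde2 class instead of the contrib one.
-- access_log_typed and the status column are hypothetical.
CREATE TABLE access_log_typed (
  remote_ip STRING,
  request_date STRING,
  method STRING,
  request STRING,
  protocol STRING,
  status INT          -- non-STRING primitive column
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- same five groups as the question, plus a sixth group for the status code;
  -- the trailing .* absorbs whatever follows on the line
  "input.regex" = "([^ ]*) . . \\[([^\\]]+)\\] \"([^ ]*) ([^ ]*) ([^ \"]*)\" ([0-9]*).*"
)
STORED AS TEXTFILE;
```

Whether this works depends on the Hive version in use, so it is worth checking the RegexSerDe source for that version before relying on it.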

ahains answered Apr 05 '23 21:04