Use the split() function. You can read about it (and all other Hive functions) in the documentation. You'll want to replace "116:151:1" with the name of the column in your table.
Splitting a string using strtok() in C In C, the strtok() function is used to split a string into a series of tokens based on a particular delimiter.
Use the Split method when the substrings you want are separated by a known delimiting character (or characters). Regular expressions are useful when the string conforms to a fixed pattern. Use the IndexOf and Substring methods in conjunction when you don't want to extract all of the substrings in a string.
Hive has three main functions: data summarization, query and analysis.
There does exist a split function based on regular expressions. It's not listed in the tutorial, but it is listed on the language manual on the wiki:
split(string str, string pat)
Split str around pat (pat is a regular expression)
In your case, the delimiter "|
" has a special meaning as a regular expression, so it should be referred to as "\\|
".
Another interesting usecase for split in Hive is when, for example, a column ipname
in the table has a value "abc11.def.ghft.com" and you want to pull "abc11" out:
SELECT split(ipname,'[\.]')[0] FROM tablename;
Just a clarification on the answer given by Bkkbrad.
I tried this suggestion and it did not work for me.
For example,
split('aa|bb','\\|')
produced:
["","a","a","|","b","b",""]
But,
split('aa|bb','[|]')
produced the desired result:
["aa","bb"]
Including the metacharacter '|' inside the square brackets causes it to be interpreted literally, as intended, rather than as a metacharacter.
For elaboration of this behaviour of regexp, see: http://www.regular-expressions.info/charclass.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With