I need to split a tag that looks something like "B1/AHU/_1/RoomTemp", "B1/AHU/_1/109/Temp", so with a variable with a variable number of fields. I am interested in getting the last field, or sometimes the last but one. I was disappointed to find that negative indexes do not count from the right and allow me to select the last element of an array in Hive as they do in Python.
select tag,split(tag,'[/]')[ -1] from sensor
I was more surprised when this did not work either:
select tag,split(tag,'[/]')[ size(split(tag,'[\]'))-1 ] from sensor
Both times giving me an error along the lines of this:
FAILED: SemanticException 1:27 Non-constant expressions for array indexes not supported.
Error encountered near token '1'
So any ideas? I am kind of new to Hive. Regex's maybe? Or is there some syntactic sugar I am not aware of?
Split function in Hive Hive providing many string functions to manipulate the strings. Split is one of the string function in Hive that used to split the string depending on the pattern and returns the array of strings.
Lateral view explodes the array data into multiple rows. In other words, lateral view expands the array into rows. When you use a lateral view along with the explode function, you will get the result something like below.
This question is getting a lot of views (over a thousand now), so I think it needs a proper answer. In the event I solved it with this:
select tag,reverse(split(reverse(tag),'[/]')[0]) from sensor
which is not actually stated in the other suggested answers - I got the idea from a suggestion in the comments.
This:
Also note that the second-to-last element can be retrieved by substituting 1 for the 0, and so on for the others.
There is a great library of Hive UDFs here. One of them is LastIndexUDF()
. It's pretty self-explainatory, it retrieves the last element of an array. There are instructions to build and use the jar on the main page. Hope this helps.
This seem to work for me, this returns the last element from the SPLIT array
SELECT SPLIT(INPUT__FILE__NAME,'/')[SIZE(SPLIT(INPUT__FILE__NAME,'/')) -1 ] from test_table limit 10;
After reading the LanguageManual UDF a while, I luckily found the function substring_index
exactly meets your requirement, dosen't need any additional calculations at all.
The manual says:
substring_index(string A, string delim, int count)
returns the substring from string A before count occurrences of the delimiter delim (as of Hive 1.3.0). If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. Substring_index performs a case-sensitive match when searching for delim. Example: substring_index('www.apache.org', '.', 2) = 'www.apache'.
Use cases:
SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
--www.mysql
SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1);
--com
See here for more information.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With