Accessing from the end of a split array in Hive

Tags:

hadoop

hive

I need to split a tag that looks something like "B1/AHU/_1/RoomTemp" or "B1/AHU/_1/109/Temp", so with a variable number of fields. I am interested in getting the last field, or sometimes the last but one. I was disappointed to find that negative indexes do not count from the right and let me select the last element of an array in Hive, as they do in Python.

select tag,split(tag,'[/]')[ -1] from sensor

I was more surprised when this did not work either:

select tag,split(tag,'[/]')[ size(split(tag,'[/]'))-1 ] from sensor

Both times giving me an error along the lines of this:

FAILED: SemanticException 1:27 Non-constant expressions for array indexes not supported. 
Error encountered near token '1'

So any ideas? I am kind of new to Hive. Regexes, maybe? Or is there some syntactic sugar I am not aware of?

426

asked Sep 05 '15 16:09

Mike Wise

4 Answers

This question is getting a lot of views (over a thousand now), so I think it needs a proper answer. In the event I solved it with this:

select tag,reverse(split(reverse(tag),'[/]')[0]) from sensor

which is not actually stated in the other suggested answers - I got the idea from a suggestion in the comments.

This:

reverses the string (so "abcd/efgh" is now "hgfe/dcba")
splits it on "/" into an array (so we have "hgfe" and "dcba")
extracts the first element (which is "hgfe")
then finally re-reverses (giving us the desired "efgh")

Also note that the second-to-last element can be retrieved by substituting 1 for the 0, and so on for the others.
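
As a minimal sketch of that substitution, assuming the same sensor table and tag column as in the question, the query below pulls both the last and the second-to-last field in one pass. The column aliases last_field and second_last_field are made up for illustration.

```sql
-- Sketch only: the aliases are illustrative and not part of the original query.
select tag,
       reverse(split(reverse(tag), '[/]')[0]) as last_field,        -- "Temp" for "B1/AHU/_1/109/Temp"
       reverse(split(reverse(tag), '[/]')[1]) as second_last_field  -- "109" for "B1/AHU/_1/109/Temp"
from sensor
```

Both index values are constants, so this sidesteps the non-constant index expression that the SemanticException in the question complains about.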

144

answered Oct 14 '22 04:10

Mike Wise

There is a great library of Hive UDFs here. One of them is LastIndexUDF(). It's pretty self-explanatory: it retrieves the last element of an array. There are instructions to build and use the jar on the main page. Hope this helps.

answered Oct 14 '22 03:10

o-90

This seems to work for me; it returns the last element of the SPLIT array:

SELECT SPLIT(INPUT__FILE__NAME,'/')[SIZE(SPLIT(INPUT__FILE__NAME,'/')) -1 ] from test_table limit 10;
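
The same pattern might be applied to the tag column from the question, roughly as sketched below. This assumes a Hive setup that accepts a computed array index, as this answer reports; the setup in the question rejected it with a SemanticException. The aliases are made up for illustration.

```sql
-- Sketch only: assumes the Hive version in use accepts a computed array index.
-- tag and sensor are the column and table from the question; the aliases are illustrative.
SELECT tag,
       SPLIT(tag, '/')[SIZE(SPLIT(tag, '/')) - 1] AS last_field,
       SPLIT(tag, '/')[SIZE(SPLIT(tag, '/')) - 2] AS second_last_field
FROM sensor
LIMIT 10;
```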

answered Oct 14 '22 04:10

dedricF

After reading the LanguageManual UDF page for a while, I found that the function substring_index meets your requirement exactly and doesn't need any additional calculations at all.

The manual says:

substring_index(string A, string delim, int count) returns the substring from string A before count occurrences of the delimiter delim (as of Hive 1.3.0). If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. Substring_index performs a case-sensitive match when searching for delim. Example: substring_index('www.apache.org', '.', 2) = 'www.apache'.

Use cases:

SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
--www.mysql

SELECT SUBSTRING_INDEX('www.mysql.com', '.', -1);
--com

See here for more information.
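
Applied to the tags from the question, a minimal sketch could look like the following. It assumes the sensor table and tag column from the question and a Hive version that ships substring_index (1.3.0 according to the manual text quoted above). The aliases are made up for illustration.

```sql
-- Sketch only: requires a Hive version that provides substring_index.
select tag,
       -- everything to the right of the last '/'
       substring_index(tag, '/', -1) as last_field,                                -- "Temp" for "B1/AHU/_1/109/Temp"
       -- keep the last two fields, then take the part before their '/'
       substring_index(substring_index(tag, '/', -2), '/', 1) as second_last_field -- "109" for "B1/AHU/_1/109/Temp"
from sensor
```

The nested call is one way to get the last-but-one field: the inner call trims the tag down to its final two fields, and the outer call keeps the first of those.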

answered Oct 14 '22 04:10

henry zhu
