
Hive Explode / Lateral View multiple arrays

I have a Hive table with the following schema:

COOKIE  | PRODUCT_ID | CAT_ID     | QTY
1234123 | [1,2,3]    | [r,t,null] | [2,1,null]

How can I normalize the arrays so that I get the following result?

COOKIE  | PRODUCT_ID | CAT_ID | QTY
1234123 | [1]        | [r]    | [2]
1234123 | [2]        | [t]    | [1]
1234123 | [3]        | null   | null

I have tried the following:

select concat_ws('|', visid_high, visid_low) as cookie,
       pid,
       catid,
       qty
from table
lateral view explode(productid) ptable as pid
lateral view explode(catalogId) ptable2 as catid
lateral view explode(qty) ptable3 as qty

However, the result comes out as a Cartesian product.
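The multiplication is easy to reproduce on a tiny input. The sketch below is only an illustration: the one-row subquery and its values are made up, and it assumes a Hive version that allows SELECT without a FROM clause (0.13 or later). Each LATERAL VIEW joins its exploded rows to every row produced so far, so chained explodes multiply rather than line up by position.

```sql
-- Hypothetical one-row input built inline; no real table is needed.
SELECT pid, catid
FROM (
  SELECT array(1, 2, 3)       AS productid,
         array('r', 't', 'u') AS catalogid
) t
LATERAL VIEW explode(productid) ptable  AS pid
LATERAL VIEW explode(catalogid) ptable2 AS catid;
-- Each of the 3 pid rows is paired with all 3 catid rows: 3 x 3 = 9 rows.
```

Adding the third explode over a three-element array would give 27 rows for this single input row.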

user2726995 asked Dec 18 '13 20:12




2 Answers

I found a good solution to this problem that does not need any UDF: posexplode. It returns each element together with its position, so the three exploded arrays can be matched on position:

SELECT COOKIE, ePRODUCT_ID, eCAT_ID, eQTY
FROM TABLE
LATERAL VIEW posexplode(PRODUCT_ID) ePRODUCT_ID AS seqp, ePRODUCT_ID
LATERAL VIEW posexplode(CAT_ID) eCAT_ID AS seqc, eCAT_ID
LATERAL VIEW posexplode(QTY) eQTY AS seqq, eQTY
WHERE seqp = seqc AND seqc = seqq;
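A variation on the same idea, offered as a sketch rather than a tested replacement: explode only one array with posexplode and look the other two up by position. This avoids the position filter in the WHERE clause. It assumes that PRODUCT_ID is at least as long as the other arrays, and that an out-of-range array index yields NULL (which is how I understand Hive to behave; worth verifying on your version). The table name `my_table` and the alias `product_id_item` are placeholders, not names from the question.

```sql
-- Sketch only: `my_table` is a placeholder; columns follow the question's schema.
SELECT cookie,
       product_id_item AS product_id,
       cat_id[pos]     AS cat_id,   -- element at the same position as the product
       qty[pos]        AS qty       -- element at the same position as the product
FROM my_table
LATERAL VIEW posexplode(product_id) p AS pos, product_id_item;
```

Under those assumptions, a position that is missing from a shorter array shows up as NULL instead of the row being dropped, which the position-equality filter would do.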
Ahmed Abdellatif answered Oct 12 '22 19:10


You can use the numeric_range and array_index UDFs from Brickhouse (http://github.com/klout/brickhouse) to solve this problem. There is an informative blog post describing this in detail at http://brickhouseconfessions.wordpress.com/2013/03/07/exploding-multiple-arrays-at-the-same-time-with-numeric_range/

Using those UDFs, the query would be something like:

select cookie,
       array_index( product_id_arr, n ) as product_id,
       array_index( catalog_id_arr, n ) as catalog_id,
       array_index( qty_id_arr, n ) as qty
from table
lateral view numeric_range( size( product_id_arr )) n1 as n;
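Before that query can run, the two functions have to be registered in the Hive session. The lines below are a rough sketch: the jar path is a placeholder, and the class names are given as I recall them from the brickhouse.hql script shipped with Brickhouse, so check them against the version you build or download.

```sql
-- Assumption: the jar path is hypothetical; point it at your own Brickhouse build.
ADD JAR /path/to/brickhouse.jar;

-- Assumption: class names as recalled from Brickhouse's brickhouse.hql; verify before use.
CREATE TEMPORARY FUNCTION numeric_range AS 'brickhouse.udf.collect.NumericRange';
CREATE TEMPORARY FUNCTION array_index   AS 'brickhouse.udf.collect.ArrayIndexUDF';
```

Temporary functions last only for the current session, so these statements need to be repeated (or put in an init script) for each new session.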
Jerome Banks answered Oct 12 '22 21:10