
Hive: casting array<string> to array<int> in a query

I have two tables:

create table a (
`1` array<string>);

create table b (
`1` array<int>);

and I want to copy the contents of table a into table b (table b is empty):

insert into table b
select * from a;

When doing so, I get the following error:

FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into
target table because column number/types are different 'b': Cannot
convert column 0 from array<string> to array<int>.

I would not get this error if the columns were plain string and int rather than arrays.

Is there a way to do the cast with arrays?

asked Sep 30 '15 16:09 by Pierre Galland



2 Answers

Re-assemble the array using explode() and collect_list().

Initial string array example:

hive> select array('1','2','3') string_array;
OK
string_array
["1","2","3"]
Time taken: 1.109 seconds, Fetched: 1 row(s)

Convert array:

hive> select collect_list(cast(array_element as int)) int_array --cast and collect array
       from( select explode(string_array) array_element         --explode array
               from (select array('1','2','3') string_array     --initial array
                    )s 
           )s;

Result:

OK
int_array
[1,2,3]
Time taken: 44.668 seconds, Fetched: 1 row(s)

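If you need to carry other columns through your insert + select query, use lateral view [outer] and group by those columns (they need to identify each row uniquely, otherwise arrays from different rows are merged into one):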

select col1, col2,
       collect_list(cast(array_element as int)) int_array  --cast and re-collect per group
  from (
        select col1, col2, array_element
          from table                                        --placeholder for your table name
               lateral view outer explode(string_array) a as array_element
       ) s
 group by col1, col2;
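Applied to the tables from the question, the same pattern might look like the sketch below. It assumes table a has an extra unique key column, here hypothetically named id, because the rows need something to group by when the arrays are re-assembled; the table a shown in the question has no such column. The key is used only for grouping and is not inserted, so the select still matches the single array<int> column of table b.

```sql
-- Sketch only: `id` is a hypothetical unique key column on table a.
insert into table b
select collect_list(cast(s.array_element as int))   -- one int array per source row
  from (
        select id, array_element
          from a
               lateral view outer explode(`1`) e as array_element
       ) s
 group by s.id;
```

As far as I know, collect_list skips NULLs, so elements that cannot be cast to int would be dropped rather than kept as NULL, and the element order of the rebuilt array is not guaranteed to match the original.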
answered Oct 16 '22 10:10 by leftjoin


The Brickhouse jar will do this a lot faster than casting the elements and collecting them back into a list. Download the Brickhouse jar, add it to an HDFS location, and register the UDF:

add jar hdfs://hadoop-/pathtojar/brickhouse-0.7.1.jar;
create temporary function cast_array as 'brickhouse.udf.collect.CastArrayUDF';
select cast_array(columns, 'int') AS columname from table;     --cast elements to int
select cast_array(columns, 'string') AS columname from table;  --cast elements to string
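With the temporary function registered as above, the insert from the question could be written along these lines. This is a sketch based only on the call shape shown in this answer, cast_array(column, 'int'); it has not been checked against a particular Brickhouse version.

```sql
-- Assumes cast_array was registered with "create temporary function" as shown above.
insert into table b
select cast_array(`1`, 'int')   -- array<string> column `1` of a, cast element-wise
  from a;
```

Unlike the explode() and collect_list() approach, this needs no grouping key, since the UDF works on each row's array directly.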
answered Oct 16 '22 11:10 by Harsha Vardhan