Suppose my table looks something like:
Col1 Col2 Col3.....Col20 Col21
Now I want to select all but Col21. I want to change it to unix_timestamp() before I insert into some other table. So the trivial approach is to do something like:
INSERT INTO newtable partition(Col21)
SELECT Col1, Col2, Col3.....Col20, unix_timestamp() AS Col21
FROM oldTable
Is there a way I can achieve this in hive? Thanks a lot for your help!
The easiest way to select specific columns in the Hive query is by specifying the column name in the select statement. SELECT col1, col3, col4 .... FROM Table1; But imagine your table contains many columns (i.e : more than 100 columns) and you need to only exclude a few columns in the select statement.
You can use different mathematical functions , collection functions, type conversion functions, date functions, conditional functions or string functions. In order to limit the number of rows given in result, you can use the LIMIT keyword.
In SQL, MINUS is also called EXCEPT. The MINUS operator finds the difference between two tables or sub-queries and return results from only first SELECT statement. Apache Hive does not support MINUS set operator.
Try to setup the below property
set hive.support.quoted.identifiers=none;
Then select all columns except col_21:
select `(col_21)?+.+` from <table_name>;
For more info refer to this link.
Then insert statement will be
insert into <tablename> partition (col21)
select `(col_21)?+.+` from ( --select all columns from subquery except col21
select *, unix_timestamp() AS alias_col21 from table_name --select *, create new col based on col21
)a;
By using this approach you are going to have alias_col21 as last column in your select statement so that you can partition based on that column.
In Case of joins:
We cannot refer individual columns((t1.id)?+.+
..etc) from each table, so drop the unnecessary columns in select statement.
hive>insert into <tablename> partition (col21)
select * from (
select t1.* from
(--drop col21 and create new alias_col21 by using col21
select `(col21)?+.+`, unix_timestamp() AS alias_col21 from table1
) t1
join table2 t2
on t1.<col-name>=t2.<col-name>)a;
In case you want to drop multiple columns on which you are joining
select
tb1.*,
tb2.`(col1|col2)?+.+`
from
tb1 left join tb2 on
tb1.col1 = tb2.col1
and tb1.col2 = tb2.col2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With