Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hive How to select all but one column?

Tags:

hive

hiveql

Suppose my table looks something like:

Col1 Col2 Col3.....Col20 Col21

Now I want to select all but Col21. I want to change it to unix_timestamp() before I insert into some other table. So the trivial approach is to do something like:

INSERT INTO newtable partition(Col21) 
SELECT Col1, Col2, Col3.....Col20, unix_timestamp() AS Col21
FROM oldTable

Is there a way I can achieve this in hive? Thanks a lot for your help!

like image 885
Rocking chief Avatar asked Jul 08 '18 00:07

Rocking chief


People also ask

How do I SELECT all columns except one column in Hive?

The easiest way to select specific columns in the Hive query is by specifying the column name in the select statement. SELECT col1, col3, col4 .... FROM Table1; But imagine your table contains many columns (i.e : more than 100 columns) and you need to only exclude a few columns in the select statement.

How do I SELECT only few rows in Hive?

You can use different mathematical functions , collection functions, type conversion functions, date functions, conditional functions or string functions. In order to limit the number of rows given in result, you can use the LIMIT keyword.

Can we use except in Hive?

In SQL, MINUS is also called EXCEPT. The MINUS operator finds the difference between two tables or sub-queries and return results from only first SELECT statement. Apache Hive does not support MINUS set operator.


2 Answers

Try to setup the below property

set hive.support.quoted.identifiers=none;

Then select all columns except col_21:

select `(col_21)?+.+` from <table_name>; 

For more info refer to this link.

Then insert statement will be

insert into <tablename> partition (col21) 
select `(col_21)?+.+` from ( --select all columns from subquery except col21
select *, unix_timestamp() AS alias_col21 from table_name --select *, create new col based on col21
)a;

By using this approach you are going to have alias_col21 as last column in your select statement so that you can partition based on that column.

In Case of joins:

We cannot refer individual columns((t1.id)?+.+..etc) from each table, so drop the unnecessary columns in select statement.

hive>insert into <tablename> partition (col21)
select * from (
       select t1.* from
         (--drop col21 and create new alias_col21 by using col21
          select `(col21)?+.+`, unix_timestamp() AS alias_col21 from table1
         ) t1 
    join table2 t2 
  on t1.<col-name>=t2.<col-name>)a;
like image 172
notNull Avatar answered Sep 21 '22 09:09

notNull


In case you want to drop multiple columns on which you are joining

select
    tb1.*,
    tb2.`(col1|col2)?+.+`
from
     tb1 left join tb2 on
    tb1.col1 = tb2.col1
    and tb1.col2 = tb2.col2
like image 41
Sanjay Sadasivan Avatar answered Sep 20 '22 09:09

Sanjay Sadasivan