
Query Hive partitioned table over date/time range

My Hive table is partitioned on year, month, day, and hour.

Now I want to fetch data from 2014-05-27 to 2014-06-05. How can I do that?

I know one option is to create a partition on epoch (or yyyy-mm-dd-hh) and pass the epoch time in the query. Can I do it without losing the date hierarchy?

Table Structure

CREATE TABLE IF NOT EXISTS table1 (col1 int, col2 int)
PARTITIONED BY (year int, month int, day int, hour int) 
STORED AS TEXTFILE;
banjara asked Jun 27 '14 06:06


2 Answers

We face a similar scenario every day when querying tables in Hive. We partition our tables much as you describe, and it has helped a lot in querying. This is how we partition:

CREATE TABLE IF NOT EXISTS table1 (col1 int, col2 int)
PARTITIONED BY (year bigint, month bigint, day bigint, hour int) 
STORED AS TEXTFILE;

For partitions we assign values like this:

year = 2014, month = 201409, day = 20140924, hour = 01
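
As an illustration of that scheme, here is a minimal Python sketch that derives the four partition values from a timestamp. The function name `partition_values` is hypothetical, and it assumes `month` and `day` are encoded as yyyymm and yyyymmdd integers, as in the example values above.

```python
from datetime import datetime


def partition_values(ts: datetime) -> dict:
    """Return partition values for a timestamp (hypothetical helper).

    Assumes month is encoded as yyyymm and day as yyyymmdd, so each
    column on its own sorts chronologically as a plain integer.
    """
    return {
        "year": ts.year,
        "month": ts.year * 100 + ts.month,
        "day": ts.year * 10000 + ts.month * 100 + ts.day,
        "hour": ts.hour,
    }


print(partition_values(datetime(2014, 9, 24, 1)))
# {'year': 2014, 'month': 201409, 'day': 20140924, 'hour': 1}
```

Because every `day` value carries its year and month, a single integer comparison on `day` is enough to express a date range.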

This way querying becomes really simple, and you can query directly:

select * from table1 where day >= 20140527 and day < 20140605 
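
For comparison, if the table keeps the plain year, month, and day values from the question (month = 5, day = 27), the same range needs one clause per month. The Python sketch below generates such a predicate; `hierarchy_predicate` is a hypothetical helper, not part of Hive, and it treats both ends of the range as inclusive. The query above uses `day < 20140605`, so it stops before June 5; use `<=` if that day should be included.

```python
from datetime import date


def hierarchy_predicate(start: date, end: date) -> str:
    """Build a WHERE predicate over plain year/month/day partition columns.

    Hypothetical helper: emits one clause per calendar month in the range,
    with both the start and end dates treated as inclusive.
    """
    if start > end:
        raise ValueError("start must not be after end")
    clauses = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        parts = [f"year = {y}", f"month = {m}"]
        if (y, m) == (start.year, start.month):
            parts.append(f"day >= {start.day}")  # first month: lower bound
        if (y, m) == (end.year, end.month):
            parts.append(f"day <= {end.day}")  # last month: upper bound
        clauses.append("(" + " AND ".join(parts) + ")")
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return " OR ".join(clauses)


print(hierarchy_predicate(date(2014, 5, 27), date(2014, 6, 5)))
# (year = 2014 AND month = 5 AND day >= 27) OR (year = 2014 AND month = 6 AND day <= 5)
```

The generated string can be pasted after `WHERE` in a query on the original table. It only references the partition columns, so the date hierarchy is kept as it is.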

Hope this helps

Amar answered Sep 21 '22 00:09


You can query like this:

  WHERE st_date > '2014-05-27-00' and end_date < '2014-06-05-24' 

This should give you the desired result because, even though the values are strings, they are compared lexicographically, i.e. '2014-04-04' will always be greater than '2014-04-03'.

I ran it on my sample tables and it works perfectly fine.
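
The lexicographic argument can be checked with a short Python sketch. It assumes the column holds zero-padded yyyy-mm-dd-hh strings (the answer does not show how `st_date` and `end_date` are stored), and `hour_key` is a hypothetical helper that produces that format.

```python
from datetime import datetime, timedelta


def hour_key(ts: datetime) -> str:
    """Format a timestamp as a zero-padded yyyy-mm-dd-hh string."""
    return ts.strftime("%Y-%m-%d-%H")


# Every hour from 2014-05-27 00:00 through 2014-06-05 23:00.
first = datetime(2014, 5, 27, 0)
stamps = [first + timedelta(hours=h) for h in range(10 * 24)]
keys = [hour_key(ts) for ts in stamps]

# Zero-padded keys sort the same way as the timestamps they encode.
padded_ok = keys == sorted(keys)

# Without zero padding the ordering breaks: June would sort after October.
unpadded_ok = "2014-6-5" < "2014-10-1"

print(padded_ok, unpadded_ok)  # True False
```

The comparison is only reliable when every field is zero-padded to a fixed width. Also note that a strict `>` against '2014-05-27-00' leaves out the 00 hour itself; `>=` would include it.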

dpsdce answered Sep 18 '22 00:09