My hive table is partitioned on year, month, day, Hour
Now I want to fetch data from 2014-05-27 to 2014-06-05 How can I do that??
I know one option is create partition on epoch(or yyyy-mm-dd-hh) and in query pass epoch time. Can I do it without loosing date hierarchy??
Table Structure
CREATE TABLE IF NOT EXISTS table1 (col1 int, col2 int)
PARTITIONED BY (year int, month int, day int, hour int)
STORED AS TEXTFILE;
This is a similar scenario we face everyday while querying tables in hive. We have partitioned our tables similar to the way you explained and it has helped a lot if querying. This is how we partition:
CREATE TABLE IF NOT EXISTS table1 (col1 int, col2 int)
PARTITIONED BY (year bigint, month bigint, day bigint, hour int)
STORED AS TEXTFILE;
For partitions we assign values like this:
year = 2014, month = 201409, day = 20140924, hour = 01
This way the querying becomes really simple and you can directly query:
select * from table1 where day >= 20140527 and day < 20140605
Hope this helps
you can query like this
WHERE st_date > '2014-05-27-00' and end_date < '2014-06-05-24'
should give you desired result because even if it is a sting a it will be compared lexicographically i.e '2014-04-04' will be always greater '2014-04-03'.
I ran it on my sample tables and it works perfectly fine.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With