Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hadoop HIVE - How to query part of rows

Tags:

hive

If i have below table;

table name : mytable
columns : id, name, sex, age, score
row1 : 1,Albert,M,30,70
row2 : 2,Scott,M,34,60
row3 : 3,Amilie,F,29,75
...
row100 : 100,Jim,M,35,80

I want to select them five times.

1st iteration : row1 ~ row20
2nd iteration : row21 ~ row40
...
5th iteration : row81 ~ row100

How can I query Hive? Is there any known query? Below returns all 100 rows.

SELECT * FROM mytable;

But I really want to see only 20 rows each time.

like image 736
Dorr Avatar asked Dec 10 '14 23:12

Dorr


2 Answers

Update on this one. Just in case someone else tries this solution now.

For me it only worked with a brackets after the row number and a new SELECT statement around the query with the where clause as the alias 'rowid' wasn't available in the inner SELECT. Took me a while to figure it out.

SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER(Order by id) as rowid FROM mytable
) t1
WHERE rowid > 0 and rowid <= 20;
like image 113
Luke P Avatar answered Sep 26 '22 08:09

Luke P


It can easily do this by Limit Offset in Myqsl. Hive support limit but not Offset(not 100% sure) But you can limit your output by

SELECT * FROM mytable
LIMIT 20;

it will give only 20 records, but not 20-40;

You can do by ROW_NUMBER in hive

SELECT *,ROW_NUMBER over (Order by id)  as rowid FROM mytable
where rowid > 0 and rowid <=20;

next time you have to change the condition in where clause.

SELECT *,ROW_NUMBER over (Order by id)  as rowid FROM mytable
    where rowid > 20 and rowid <=40;

You can also pass the rowid variable using text file or set the variable;run os command and set out put to hive variable

like image 39
Kishore Avatar answered Sep 24 '22 08:09

Kishore