How can I add row numbers for rows in PIG or HIVE?

I am having trouble adding row numbers in Apache Pig. I have a STR_ID column, and I want to add a ROW_NUM column that holds the row number of each STR_ID value.

For example, here is the input:

STR_ID
------------
3D64B18BC842
BAECEFA8EFB6
346B13E4E240
6D8A9D0249B4
9FD024AA52BA

How do I get the output like:

   STR_ID    |   ROW_NUM
----------------------------
3D64B18BC842 |     1
BAECEFA8EFB6 |     2
346B13E4E240 |     3
6D8A9D0249B4 |     4
9FD024AA52BA |     5

Answers using Pig or Hive are acceptable. Thank you.

Breakinen asked Feb 15 '12 05:02


2 Answers

In Hive (0.11 or later, which supports windowing functions):

Query

select str_id, row_number() over () from tabledata;

Output

3D64B18BC842      1
BAECEFA8EFB6      2
346B13E4E240      3
6D8A9D0249B4      4
9FD024AA52BA      5
Keshav Pradeep Ramanath answered Sep 24 '22 18:09


Facebook published a number of Hive UDFs, including NumberRows. Depending on your Hive version (I believe 0.8), you may need to mark the class as stateful by setting stateful = true on its @UDFType annotation.
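
To illustrate what "stateful" means here, the following is a minimal sketch of the counting logic only, not Facebook's actual NumberRows code. It is plain Java with no Hive dependency so that it runs on its own; the class name RowCounterSketch is hypothetical. In a real UDF the class would presumably extend org.apache.hadoop.hive.ql.exec.UDF and carry @UDFType(deterministic = false, stateful = true), as noted in the comments.

```java
// Plain-Java model of a stateful row-numbering function (no Hive dependency).
// Assumption: in an actual Hive UDF this class would extend
// org.apache.hadoop.hive.ql.exec.UDF and be annotated with
// @UDFType(deterministic = false, stateful = true).
class RowCounterSketch {
    // State kept between calls; each instance counts independently.
    private long counter = 0;

    // Called once per row; returns the next number, starting from 1.
    public long evaluate() {
        counter += 1;
        return counter;
    }

    public static void main(String[] args) {
        // Sample STR_ID values taken from the question.
        String[] strIds = {
            "3D64B18BC842", "BAECEFA8EFB6", "346B13E4E240",
            "6D8A9D0249B4", "9FD024AA52BA"
        };
        RowCounterSketch rowNum = new RowCounterSketch();
        for (String id : strIds) {
            System.out.println(id + " | " + rowNum.evaluate());
        }
    }
}
```

Because the counter lives in the instance, a UDF built this way would likely number rows per task rather than globally, so the values may repeat when a query runs across several mappers.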

Steve Severance answered Sep 25 '22 18:09