 

Generating an id/counter for foreach in Pig Latin

Tags:

apache-pig

I want a unique identifier (a line number or counter) to be generated and appended in my foreach construct as it iterates through the records. Is there a way to accomplish this without writing a UDF?

B = foreach A generate a_unique_id, field1,...etc

How do I get that 'a_unique_id' implemented?

Thanks!

asked Oct 03 '11 15:10 by pranay


2 Answers

If you are using Pig 0.11 or later, the RANK operator is exactly what you are looking for. For example:

DUMP A;
(foo,19)
(foo,19)
(foo,7)
(bar,90)
(etc.,0)

B = RANK A ;

DUMP B ;
(1,foo,19)
(2,foo,19)
(3,foo,7)
(4,bar,90)
(5,etc.,0)
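To make the semantics concrete, here is a minimal plain-Java sketch (not Pig code) of what the plain `RANK A;` form does to an in-memory list of rows: it prepends a sequential, 1-based position to each tuple, so even the two identical `(foo,19)` rows get distinct values. The class `RankSketch` and its `rank` method are hypothetical names for illustration only; Pig's real operator works over the whole (possibly distributed) relation.

```java
import java.util.ArrayList;
import java.util.List;

public class RankSketch {
    // Hypothetical helper: mimics `B = RANK A;` for in-memory rows by
    // prepending a 1-based sequential position to each row.
    static List<List<Object>> rank(List<List<Object>> rows) {
        List<List<Object>> ranked = new ArrayList<>();
        long position = 1;
        for (List<Object> row : rows) {
            List<Object> out = new ArrayList<>();
            out.add(position++); // the generated counter comes first
            out.addAll(row);     // followed by the original fields
            ranked.add(out);
        }
        return ranked;
    }

    public static void main(String[] args) {
        // Same rows as relation A in the DUMP above.
        List<List<Object>> a = List.of(
            List.<Object>of("foo", 19),
            List.<Object>of("foo", 19),
            List.<Object>of("foo", 7),
            List.<Object>of("bar", 90),
            List.<Object>of("etc.", 0));
        for (List<Object> row : rank(a)) {
            System.out.println(row);
        }
    }
}
```

Running `main` prints `[1, foo, 19]` through `[5, etc., 0]`, mirroring the `DUMP B` output. RANK also accepts a BY clause (rows with equal key values share a rank) and a DENSE modifier; this sketch covers only the plain form.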
answered Nov 13 '22 19:11 by mr2ert


There is no built-in UUID function in the main Pig distribution or piggybank. Unfortunately, I think your only option is going to be writing a UDF.

There is a standard way of building UUIDs (RFC 4122), and there is existing Java code you can build your UDF on.
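As a rough sketch of the core logic such a UDF might wrap, assuming the JDK's `java.util.UUID` is acceptable as the "existing Java code": `UUID.randomUUID()` returns a random (version 4) UUID. In Pig this would typically live in the `exec(Tuple)` method of a class extending `org.apache.pig.EvalFunc<String>`; that wiring is not shown here because it needs the Pig jar. The class `UuidSketch` and the methods `newId` and `withIds` are hypothetical names. Unlike RANK, these ids are unique only with overwhelming probability and carry no ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class UuidSketch {
    // Hypothetical helper: the value a UDF's exec() could return per record.
    // randomUUID() builds a random (version 4) UUID string of 36 characters.
    static String newId() {
        return UUID.randomUUID().toString();
    }

    // Mimics `B = foreach A generate a_unique_id, ...` on in-memory rows
    // by prepending a fresh id to each row.
    static List<List<Object>> withIds(List<List<Object>> rows) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> row : rows) {
            List<Object> tagged = new ArrayList<>();
            tagged.add(newId());
            tagged.addAll(row);
            out.add(tagged);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> a = List.of(
            List.<Object>of("foo", 19),
            List.<Object>of("bar", 90));
        withIds(a).forEach(System.out::println);
    }
}
```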

Is there a particular reason why you don't want to write a UDF?

answered Nov 13 '22 21:11 by Donald Miner