Consider these as my input files:
Input 1: (File 1)
12,23,14,15,9
1,2,3,4,5
34,17,8
.
.
Input 2: (File 2)
12 Twelve
23 TwentyThree
34 ThirtyFour
.
.
My Pig script reads each line of the "Input 1" file, and I would like to replace every number with its description from the "Input 2" file, producing the results below.
Output:
Twelve,TwentyThree,Fourteen,Fifteen,Nine
One,Two,Three,Four,Five
.
.
Is it possible to achieve this without a UDF? Please let me know your suggestions.
Thanks in advance!
This violates your criterion of 'no UDF', but the only UDFs used (TOKENIZE and BagToTuple) are built in, so I suspect it will suffice.
Query:
data1 = LOAD 'file1' AS (val:chararray);
data2 = LOAD 'file2' AS (num:chararray, desc:chararray);
A = RANK data1;  /* adds a row number (rank_data1) to each input line */
B = FOREACH A GENERATE rank_data1, FLATTEN(TOKENIZE(val, ',')) AS num;
C = RANK B;  /* adds rank_B, used to keep tuple elements sorted in the bag */
D = JOIN C BY num, data2 BY num;
E = FOREACH D GENERATE C::rank_data1 AS rank_1:long
                     , C::rank_B AS rank_2:long
                     , data2::desc AS description;
grpd = GROUP E BY rank_1;
F = FOREACH grpd {
    sorted = ORDER E BY rank_2;  /* restore the original element order */
    GENERATE sorted;
};
X = FOREACH F GENERATE FLATTEN(BagToTuple(sorted.description));
DUMP X;
Output:
(Twelve,TwentyThree,Fourteen,Fifteen,Nine)
(One,Two,Three,Four,Five)
(ThirtyFour,Seventeen,Eight)
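The question asks for plain comma-separated lines rather than parenthesized tuples, so here is a hedged variant of the same query that may get closer to that. It is only a sketch under a few assumptions that are not in the original answer: file2 is taken to be space-delimited (as it is displayed in the question), the description field is named word purely for illustration (to stay clear of the DESC keyword used by ORDER BY), and 'output' is a hypothetical output path. It swaps BagToTuple for the built-in BagToString, which joins the bag's elements into one chararray.

```pig
-- Assumption: file2 is space-delimited, as displayed in the question.
-- If it is actually tab-delimited, drop the USING clause.
data1 = LOAD 'file1' AS (val:chararray);
data2 = LOAD 'file2' USING PigStorage(' ') AS (num:chararray, word:chararray);

A = RANK data1;                        -- row number per input line (rank_data1)
B = FOREACH A GENERATE rank_data1, FLATTEN(TOKENIZE(val, ',')) AS num;
C = RANK B;                            -- position of each element (rank_B)
D = JOIN C BY num, data2 BY num;       -- inner join on the number
E = FOREACH D GENERATE C::rank_data1 AS rank_1,
                       C::rank_B     AS rank_2,
                       data2::word   AS description;
grpd = GROUP E BY rank_1;
F = FOREACH grpd {
    sorted = ORDER E BY rank_2;        -- restore the original element order
    GENERATE sorted;
};
-- BagToString joins the descriptions into a single comma-separated string.
X = FOREACH F GENERATE BagToString(sorted.description, ',');

-- 'output' is a hypothetical path; STORE writes the string without tuple parentheses.
STORE X INTO 'output';
```

Note that JOIN here is an inner join, so any number in file1 that has no entry in file2 is silently dropped from its line.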