Generating all fields from an alias after a JOIN in Pig

I would like to perform the equivalent of "keep every a in A where a.field == b.field for some b in B" (a semi-join) in Apache Pig. I am implementing it like so:

AB_joined = JOIN A by field, B by field;
A2 = FOREACH AB_joined GENERATE A::field as field, A::field2 as field2, A::field3 as field3;

Enumerating all of A's fields is quite silly, and I would rather do something like:

A2 = FOREACH AB_joined GENERATE flatten(A);

However, this doesn't seem to work. Is there some other way I can do something equivalent without enumerating A's fields?

asked May 30 '12 at 23:05 by duckworthd


2 Answers

This should work:

A2 = FOREACH AB_joined GENERATE $0..;
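`$0..` is a project-range expression: it projects every column from position 0 to the end. The output of a JOIN holds all of A's columns followed by all of B's, so `$0..` also returns B's columns (including a second copy of `field`). A minimal sketch that keeps only A's columns, assuming A has exactly the three fields from the question and that the inputs are tab-separated files named `A.txt` and `B.txt` (hypothetical paths, column `other`, and types):

```pig
-- Hypothetical inputs: file names, column types and B's column 'other' are assumptions.
A = LOAD 'A.txt' AS (field:chararray, field2:int, field3:int);
B = LOAD 'B.txt' AS (field:chararray, other:int);

AB_joined = JOIN A BY field, B BY field;

-- A's columns come first in the join output, at positions $0..$2;
-- B's columns start at $3, so the range stops at $2.
A2 = FOREACH AB_joined GENERATE $0 .. $2;
DUMP A2;
```

The upper bound still has to be edited if A gains or loses columns, and an A row that matches several B rows is emitted once per match, because that is how JOIN behaves.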
answered Oct 30 '22 at 16:10 by Sateesh


You can use COGROUP to keep the columns of A separate from the columns of B. This is especially useful when A's schema is dynamic and you don't want your code to fail when A's schema changes.

AB = COGROUP A BY field, B BY field;

-- schema of AB will be:
-- {group, A:{all fields of A}, B:{all fields of B}}

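-- keep only the groups that have at least one matching row in B
AB_matched = FILTER AB BY NOT IsEmpty(B);
A2 = FOREACH AB_matched GENERATE FLATTEN(A);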

Hope this helps.
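A self-contained sketch of the COGROUP approach that can be tried in local mode (`pig -x local`), assuming tab-separated inputs named `A.txt` and `B.txt` with the column types shown (hypothetical file names, types, and B column `other`). Because each A row lands in exactly one group, the FILTER/FLATTEN step emits it at most once, even when several B rows share its key; the JOIN in the question repeats it once per matching B row.

```pig
-- Hypothetical inputs: file names, column types and B's column 'other' are assumptions.
A = LOAD 'A.txt' AS (field:chararray, field2:int, field3:int);
B = LOAD 'B.txt' AS (field:chararray, other:int);

-- One output row per key: {group, A: bag of A rows, B: bag of B rows}
AB = COGROUP A BY field, B BY field;

-- Semi-join: keep a key only if B has at least one row for it
AB_matched = FILTER AB BY NOT IsEmpty(B);

-- Emit A's rows unchanged, without naming any of A's fields
A2 = FOREACH AB_matched GENERATE FLATTEN(A);
DESCRIBE A2;
DUMP A2;
```

Rows of A with a null `field` are dropped here as well: with multiple relations, COGROUP places null keys from A and B in separate groups, so those groups have an empty B bag.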

answered Oct 30 '22 at 14:10 by Gaurav Phapale