In Pig, when I do a left join and a row in a has no matching row in b, the values from b are NULL:
c = join a by ($0) left, b by ($0);
For example, if a=((1,10),(2,20)) and b=((1,30)), then
c=((1,10,30),(2,20,NULL))
I want to use a default value (say, -1) instead of NULL so that
c=((1,10,30),(2,20,-1))
How do I do that?
If that is impossible, how do I change the 3rd column of c to have the default value instead of NULL?
I don't know whether that can be done within the JOIN statement itself, but you can add another statement:
d = FOREACH c GENERATE $0, $1, (($2 IS NULL) ? -1 : $2);
I don't think this triggers an additional MapReduce job: Pig should be able to run the FOREACH in the reduce phase of the join's job.
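A fuller sketch of the same approach follows. The file names a.txt and b.txt and the field names id and val are hypothetical, assumed here only to make the script self-contained. One detail worth knowing: Pig's JOIN keeps every field from both inputs, including the join key from each side, so with a two-column b the joined relation actually has four columns (a::id, a::val, b::id, b::val), and the value to default is the fourth one rather than $2. Declaring a schema in LOAD lets you refer to it by name and keeps both branches of the ?: operator the same type (int).

```pig
-- Hypothetical inputs: tab-separated files with two int columns each
-- (PigStorage, the default loader, splits on tabs).
a = LOAD 'a.txt' AS (id:int, val:int);
b = LOAD 'b.txt' AS (id:int, val:int);

-- Left outer join: rows of a with no match in b get NULL for all of b's fields.
c = JOIN a BY id LEFT OUTER, b BY id;

-- c has four columns: a::id, a::val, b::id, b::val.
-- Replace a missing b::val with the default -1.
d = FOREACH c GENERATE
        a::id  AS id,
        a::val AS a_val,
        (b::val IS NULL ? -1 : b::val) AS b_val;

DUMP d;
```

With the example data from the question, d should contain the tuples (1,10,30) and (2,20,-1), although Pig does not guarantee the order in which they are printed.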