I have a use case in which I need to count the number of distinct combinations of two fields.
Sample:
x = LOAD 'testdata' using PigStorage('^A') as (a,b,c,d);
y = GROUP x BY a;
z = FOREACH y {
bc = DISTINCT x.b,x.c; -- the step in question
dd = DISTINCT x.d;
GENERATE FLATTEN(group) as (a), COUNT(bc), COUNT(dd);
};
You were quite close. The key is not to apply DISTINCT to two fields, but to apply it to a single composite field that you create:
x = LOAD 'testdata' using PigStorage('^A') as (a,b,c,d);
x2 = FOREACH x GENERATE a, TOTUPLE(b,c) AS bc, d;
y = GROUP x2 BY a;
z = FOREACH y {
bc = DISTINCT x2.bc;
dd = DISTINCT x2.d;
GENERATE FLATTEN(group) AS (a), COUNT(bc), COUNT(dd);
};
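To make the effect concrete, here is a minimal sketch of the same approach with the two counts named and the result dumped. The aliases bc_count and dd_count and the sample rows in the comments are made up for illustration; everything else assumes the same 'testdata' file and delimiter as above.

```pig
-- Hypothetical input rows, shown as (a, b, c, d) for illustration only:
--   (k1, 1, 2, p)
--   (k1, 1, 2, q)
--   (k1, 1, 3, p)
x = LOAD 'testdata' USING PigStorage('^A') AS (a, b, c, d);

-- Pack b and c into one tuple-valued field so DISTINCT sees them as a unit.
x2 = FOREACH x GENERATE a, TOTUPLE(b, c) AS bc, d;

y = GROUP x2 BY a;
z = FOREACH y {
    bc = DISTINCT x2.bc;   -- distinct (b, c) pairs within the group
    dd = DISTINCT x2.d;    -- distinct d values within the group
    GENERATE FLATTEN(group) AS (a), COUNT(bc) AS bc_count, COUNT(dd) AS dd_count;
};

DUMP z;
```

With the hypothetical rows above, group k1 contains two distinct (b, c) pairs, (1, 2) and (1, 3), and two distinct d values, p and q, so both counts for that group should be 2.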