
pig - split, lack of default or if/else

Tags:

apache-pig

Since Pig's SPLIT operation has no else or default clause, what would be the most elegant way to do the following? I'm not a big fan of having to copy and paste code.

SPLIT rawish_data
    INTO good_rawish_data IF (
    (uid > 0L) AND
    (value1 > 0) AND
    (value1 < 100) AND
    (value1 IS NOT NULL) AND
    (value2 > 0L) AND
    (value2 < 200L) AND
    (value3 >= 0) AND
    (value3 <= 300)),

    bad_rawish_data IF (NOT (
    (uid > 0L) AND
    (value1 > 0) AND
    (value1 < 100) AND
    (value1 IS NOT NULL) AND
    (value2 > 0L) AND
    (value2 < 200L) AND
    (value3 >= 0) AND
    (value3 <= 300)));

I would like to do something like this:

SPLIT data
    INTO good_data IF (
    (value > 0)),
    good_data_big_values IF (
    (value > 100)),
    bad_data DEFAULT;

Is anything like this possible in any way?

warbaque asked Sep 20 '13 09:09


2 Answers

It is. According to the docs for SPLIT, you want to use OTHERWISE. For example:

SPLIT data
    INTO good_data IF (
    (value > 0)),
    good_data_big_values IF (
    (value > 100)),
    bad_data OTHERWISE;

So you almost got it. :)
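Applied to the relation from the question, OTHERWISE removes the need to repeat the negated condition. This is only a sketch that reuses the question's relation and field names unchanged; it assumes a Pig version whose SPLIT supports OTHERWISE:

```
-- Rows that pass every check go to good_rawish_data;
-- bad_rawish_data is declared as the OTHERWISE branch instead of
-- repeating the whole condition wrapped in NOT.
SPLIT rawish_data
    INTO good_rawish_data IF (
    (uid > 0L) AND
    (value1 > 0) AND
    (value1 < 100) AND
    (value1 IS NOT NULL) AND
    (value2 > 0L) AND
    (value2 < 200L) AND
    (value3 >= 0) AND
    (value3 <= 300)),
    bad_rawish_data OTHERWISE;
```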

NOTE: SPLIT can put a single row into both good_data and good_data_big_values, for example when value is 150, because each condition is evaluated independently. I don't know if this is what you want, but you should be aware of it regardless. This also means that bad_data will only contain rows where value is 0 or less.
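If you would rather have each row land in exactly one output, one option is to make the conditions mutually exclusive. A sketch, assuming the cut-off between the two "good" outputs should sit at 100:

```
-- value in (0, 100]  -> good_data
-- value > 100        -> good_data_big_values
-- everything else    -> bad_data
SPLIT data
    INTO good_data IF ((value > 0) AND (value <= 100)),
         good_data_big_values IF (value > 100),
         bad_data OTHERWISE;
```

With these conditions a row with value 150 goes only to good_data_big_values.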

mr2ert answered Sep 28 '22 01:09


You could write an IsGood() UDF that checks all the conditions, passing it the fields to check. Then your Pig script is simply

SPLIT data
    INTO good_data IF (IsGood(value)),
         good_data_big_values IF (IsGood(value) AND (value > 100)),
         bad_data IF (NOT IsGood(value));

Another option might be to use a macro.
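A minimal sketch of what the macro approach could look like. The macro name split_good_bad and its parameter and return names are hypothetical, and it uses two FILTER statements rather than SPLIT. The condition still appears twice, but only inside the macro, so scripts that call it do not copy it:

```
-- Hypothetical macro: takes a relation and returns two relations.
DEFINE split_good_bad(data) RETURNS good, bad {
    $good = FILTER $data BY (value > 0);
    $bad  = FILTER $data BY NOT (value > 0);
};

-- Usage: both outputs are assigned in one statement.
good_data, bad_data = split_good_bad(data);
```

Macros require a Pig version that supports DEFINE for macros, so check the documentation for the release you are running.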

Metropolis answered Sep 28 '22 01:09