
Pig Filter out NOT Matches

I have a bunch of strings with various prefixes, including "unknown:". I'd like to filter out all the strings starting with "unknown:" in my Pig script, but it doesn't appear to work.

simpleFilter = FILTER records BY NOT(mystr MATCHES '^unknown');

I've tried a few other permutations of the regex, but it appears that MATCHES just doesn't work well with NOT. Am I missing something?

Using Pig 0.9.2

Newtang asked Nov 30 '22 21:11


1 Answer

It's because the MATCHES operator behaves exactly like Java's String#matches: it tries to match the entire string, not just part of it (the prefix in your case). Update your regular expression so that it matches the entire string, starting with your prefix, like so:

simpleFilter = FILTER records BY NOT(mystr MATCHES '^unknown.*');
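
To see the whole-string behaviour outside Pig, here is a minimal Java sketch of the same filter. The class name `MatchesDemo`, the helper `filterOut`, and the sample strings are made up for illustration. It only demonstrates `String#matches` and is not how Pig runs the filter internally.

```java
import java.util.List;
import java.util.stream.Collectors;

public class MatchesDemo {
    // Mirrors NOT(mystr MATCHES regex): keep only the strings that the
    // regex does NOT match in full. String#matches requires the pattern
    // to cover the entire string, not just a prefix.
    public static List<String> filterOut(List<String> values, String regex) {
        return values.stream()
                .filter(s -> !s.matches(regex))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the question.
        List<String> records = List.of("unknown:abc", "known:def", "unknown");

        // '^unknown' only matches the exact string "unknown",
        // so "unknown:abc" survives the filter.
        System.out.println(filterOut(records, "^unknown"));   // [unknown:abc, known:def]

        // '^unknown.*' covers the rest of the string as well,
        // so every string with that prefix is removed.
        System.out.println(filterOut(records, "^unknown.*")); // [known:def]
    }
}
```

Since the pattern has to cover the whole string anyway, the leading `^` is redundant; `'unknown.*'` would behave the same way.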

jkovacs answered Feb 28 '23 12:02