I have a bunch of strings with various prefixes, including "unknown:". I'd really like to filter out all the strings starting with "unknown:" in my Pig script, but my filter doesn't appear to work:
simpleFilter = FILTER records BY NOT(mystr MATCHES '^unknown');
I've tried a few other permutations of the regex, but it appears that MATCHES just doesn't work well with NOT. Am I missing something?
Using Pig 0.9.2
It's because the MATCHES operator works exactly like Java's String#matches, i.e. it tries to match the entire string and not just part of it (the prefix, in your case). Just update your regular expression to match the entire string starting with your specified prefix, like so:
simpleFilter = FILTER records BY NOT(mystr MATCHES '^unknown.*');
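To see the difference outside of Pig, here is a minimal Java sketch of the whole-string behaviour described above. The class name MatchesDemo and the helpers fullMatch and prefixMatch are made up for illustration and are not part of Pig; the sample value "unknown:foo" is also an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Pig.
class MatchesDemo {
    // String#matches succeeds only if the regex covers the ENTIRE string.
    static boolean fullMatch(String s, String regex) {
        return s.matches(regex);
    }

    // Matcher#lookingAt only requires a match at the start of the string,
    // which is the "prefix" behaviour the question expected.
    static boolean prefixMatch(String s, String regex) {
        return Pattern.compile(regex).matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        String s = "unknown:foo"; // assumed sample value
        System.out.println(fullMatch(s, "^unknown"));    // false: ":foo" is left unmatched
        System.out.println(fullMatch(s, "^unknown.*"));  // true: ".*" consumes the rest
        System.out.println(prefixMatch(s, "^unknown"));  // true: prefix-only check
    }
}
```

Since the whole string has to match anyway, the leading ^ is redundant in the Pig filter. If you only want to drop strings with the exact "unknown:" prefix, you can also include the colon in the pattern, e.g. 'unknown:.*'.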