I'm looking for a general regex construct to match everything in pattern x EXCEPT matches to pattern y. This is hard to explain both completely and concisely...see Material Nonimplication for a formal definition.
For example, match any word character (\w) EXCEPT 'p'. Note I'm subtracting a small set (the letter 'p') from a larger set (all word characters). I can't just say [^p] because that doesn't take into account the larger limiting set of only word characters. For this little example, sure, I could manually reconstruct something like [a-oq-zA-OQ-Z0-9_], which is a pain but doable. But I'm looking for a more general construct so that at least the large positive set can be a more complex expression. Like match ((?<=(so|me|^))big(com?pl{1,3}ex([pA]t{2}ern) except when it starts with "My".
Edit: I realize that was a bad example, since excluding stuff at the beginning or end is a situation where negative look-ahead and look-behind expressions work. (Bohemian, I still gave you an upvote for illustrating this.) So...what about excluding matches that contain "My" somewhere in the middle?...I'm still really looking for a general construct, like a regex equivalent of the following pseudo-sql
select [captures] from [input] where ( input MATCHES [pattern1] AND NOT capture MATCHES [pattern2] )
If the answer is "it does not exist and here is why..." I'd like to know that too.
Edit 2: If I wanted to define my own function to do this it would be something like (here's a C# LINQ version):
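using System.Linq;
using System.Text.RegularExpressions;

// Keep every match of positivePattern whose matched text does not also match negativePattern.
public static Match[] RegexMNI(string input, string positivePattern, string negativePattern)
{
    return (from Match m in Regex.Matches(input, positivePattern)
            where !Regex.IsMatch(m.Value, negativePattern)
            select m).ToArray();
}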
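For readers outside .NET, the same two-step filter can be sketched in Python. This is only a rough equivalent of the function above, not a native regex construct; the name `regex_mni` and the sample patterns are made up for illustration.

```python
import re

def regex_mni(text, positive_pattern, negative_pattern):
    """Return matches of positive_pattern whose matched text does not
    contain a match of negative_pattern (hypothetical helper)."""
    negative = re.compile(negative_pattern)
    return [
        m for m in re.finditer(positive_pattern, text)
        if not negative.search(m.group(0))
    ]

# Illustrative patterns only: a simplified stand-in for the "big...pattern" example.
matches = regex_mni("bigpattern bigMypattern", r"big\w*pattern", r"My")
kept = [m.group(0) for m in matches]
print(kept)
```

The negative pattern is tested against each matched substring rather than the whole input, which mirrors the `capture MATCHES [pattern2]` condition in the pseudo-sql.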
I'm STILL just wondering if there is a native regex construct that could do this.
?= is a positive lookahead, a type of zero-width assertion. What it's saying is that the captured match must be followed by whatever is within the parentheses but that part isn't captured. Your example means the match needs to be followed by zero or more characters and then a digit (but again that part isn't captured).
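A small Python sketch may make the zero-width behaviour concrete. The example being discussed is not shown here, so the pattern below is a hypothetical stand-in: lowercase letters that must be followed, somewhere later, by a digit.

```python
import re

# [a-z]+ is consumed; (?=.*\d) only checks what follows and consumes nothing.
followed_by_digit = re.compile(r"[a-z]+(?=.*\d)")

hit = followed_by_digit.search("abc def 7")
miss = followed_by_digit.search("abc def")

print(hit.group(0), hit.span())  # the digit is required but not part of the match
print(miss)
```

The match span stops at the end of the letters, showing that the text inspected by the lookahead is not included in the result.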
This will match any character that is a word character and is not a p:
((?=[^p])\w)
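The lookahead form above can be tried out in Python as a quick check. The two extra spellings below are alternatives I am adding for comparison, not part of the answer: a negative lookahead, and a negated class that reads as "not (non-word or p)". The sample string is arbitrary.

```python
import re

# The answer's form: assert the next character is not 'p', then consume a word character.
lookahead = re.compile(r"(?=[^p])\w")
# Alternative: negative lookahead for 'p' before the word character.
negative_lookahead = re.compile(r"(?!p)\w")
# Alternative: double negation inside a character class.
class_subtraction = re.compile(r"[^\Wp]")

sample = "apple pie_9!"
results = [p.findall(sample) for p in (lookahead, negative_lookahead, class_subtraction)]
for r in results:
    print(r)
```

All three are case-sensitive, so an uppercase 'P' would still be matched unless it is excluded as well.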
To solve your example, use a negative look-ahead for "My" anywhere in the input, i.e. (?!.*My):
^(?!.*My)((?<=(so|me|^))big(com?pl{1,3}ex([pA]t{2}ern)
Note the anchor to start of input ^, which is required to make it work.
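A minimal Python sketch of this approach follows, using a simplified stand-in (`big\w*pattern`) for the question's expression, since that one is written for a different engine. It also shows a consequence worth knowing: because the lookahead scans the whole input from `^`, a "My" outside the match vetoes it too. The second pattern is a different technique that I am adding for comparison (checking the lookahead at every position inside the match), not something the answer proposes.

```python
import re

# Answer's technique: reject the whole input if "My" appears anywhere in it.
whole_input = re.compile(r"^(?!.*My)big\w*pattern")

# Alternative sketch: forbid "My" only inside the matched text by
# testing (?!My) before each character consumed in the middle.
inside_match = re.compile(r"big(?:(?!My)\w)*pattern")

def found(regex, text):
    """Return the matched text, or None when there is no match."""
    m = regex.search(text)
    return m.group(0) if m else None

for text in ("bigcomplexpattern", "bigMypattern", "bigcomplexpattern and My text"):
    print(repr(text), found(whole_input, text), found(inside_match, text))
```

Which behaviour is wanted depends on whether the exclusion applies to the input as a whole or only to the captured text, as in the pseudo-sql from the question.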