How do I write a Pattern (Java) to match any sequence of characters except a given list of words?
I need to find out whether a given piece of code has any text surrounded by tags like <span></span> other than a given list of words. For example, I want to check whether any words other than "one" and "two" are surrounded by the <span> tag.
"This is the first tag <span>one</span> and this is the third <span>three</span>"
The pattern should match the string above, because the word "three" is surrounded by the <span> tag and is not in the list of given words ("one", "two").
To match any character except a list of excluded characters, put the excluded characters between [^ and ]. The caret ^ must immediately follow the [, otherwise it stands for just itself. The character . (period) is a metacharacter: outside brackets it matches any character except a line terminator, but inside a character class it is just a literal period.
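A minimal sketch of negated character classes in Java. The class name NegatedClassDemo and the sample strings are illustrative only; the three methods show a negated class, a literal period inside brackets, and a caret that is not in first position.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // [^aeiou] matches any character that is NOT a lowercase vowel; those are removed.
    static String keepVowels(String s) {
        return s.replaceAll("[^aeiou]", "");
    }

    // Inside brackets the period is literal, so [^.]+ matches a run of non-period characters.
    static String firstSegment(String s) {
        Matcher m = Pattern.compile("[^.]+").matcher(s);
        return m.find() ? m.group() : "";
    }

    // When ^ is not the first character inside the brackets it stands for itself:
    // [a^] matches either "a" or "^".
    static boolean isCaretOrA(String s) {
        return s.matches("[a^]");
    }

    public static void main(String[] args) {
        System.out.println(keepVowels("regular expression")); // euaeeio
        System.out.println(firstSegment("java.util.regex"));   // java
        System.out.println(isCaretOrA("^"));                    // true
    }
}
```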
The expression \w matches any word character. Word characters are the alphanumeric characters (a-z, A-Z and 0-9) and the underscore (_). \W matches any non-word character.
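A small sketch of \w and \W in Java. WordCharDemo and its method names are hypothetical. Note that by default Java's \w is ASCII-only, so letters such as é are not word characters unless the pattern is compiled with Pattern.UNICODE_CHARACTER_CLASS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCharDemo {
    // Collects every maximal run of word characters ([a-zA-Z_0-9] by default).
    static List<String> words(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+").matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Replaces each non-word character (\W) with '#'.
    static String maskNonWord(String s) {
        return s.replaceAll("\\W", "#");
    }

    public static void main(String[] args) {
        System.out.println(words("snake_case, x2 + y-3")); // [snake_case, x2, y, 3]
        System.out.println(maskNonWord("a-b c"));           // a#b#c
    }
}
```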
Pattern matching has modified two syntactic elements of the Java language: the instanceof keyword and switch statements. They were both extended with a special kind of patterns called type patterns. There is more to come in the near future.
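Type patterns are a language feature and are unrelated to java.util.regex. A minimal sketch of a type pattern in instanceof, assuming Java 16 or newer (where this form is final); the switch form is not shown because it needs a newer release. TypePatternDemo and describe are hypothetical names.

```java
public class TypePatternDemo {
    // The instanceof test and the cast happen in one step; the binding
    // variable (s or i) is in scope only where the test has succeeded.
    static String describe(Object obj) {
        if (obj instanceof String s) {
            return "String of length " + s.length();
        }
        if (obj instanceof Integer i && i > 0) {
            return "positive Integer " + i;
        }
        return "something else";
    }

    public static void main(String[] args) {
        System.out.println(describe("three")); // String of length 5
        System.out.println(describe(42));      // positive Integer 42
        System.out.println(describe(-1));      // something else
    }
}
```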
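A negative look-ahead can do this:
\b(?!(?:your|given|list|of|exclusions)\b)\w+\b
In effect, this matches any word that is not excluded. The \b inside the look-ahead makes it compare whole words, so a word such as "oneself" is not rejected just because it starts with an excluded word like "one".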
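A sketch of the look-ahead approach in Java, using the question's exclusion list ("one", "two") on plain text rather than on tag contents. LookaheadWords and wordsNotExcluded are hypothetical names. The sketch places \b inside the look-ahead so that only whole words are excluded, and keeps the leading \b so the matcher cannot restart in the middle of an excluded word (for example at "ne" in "one").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadWords {
    // The negative look-ahead rejects a word before \w+ consumes it.
    private static final Pattern NOT_EXCLUDED =
            Pattern.compile("\\b(?!(?:one|two)\\b)\\w+\\b");

    // Returns every whole word in text that is not "one" or "two".
    static List<String> wordsNotExcluded(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = NOT_EXCLUDED.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(wordsNotExcluded("one two three oneself")); // [three, oneself]
    }
}
```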
This should get you started.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.*;

public class ExcludedTagWords {
    public static void main(String[] args) {
        String subjectString = "This is the first tag <span>one</span> and this is the third <span>three</span>";

        // >(?!one<|two<)(\w+)</
        //
        // Match the character “>” literally «>»
        // Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!one<|two<)»
        //    Match either the regular expression below (attempting the next alternative only if this one fails) «one<»
        //       Match the characters “one<” literally «one<»
        //    Or match regular expression number 2 below (the entire group fails if this one fails to match) «two<»
        //       Match the characters “two<” literally «two<»
        // Match the regular expression below and capture its match into backreference number 1 «(\w+)»
        //    Match a single character that is a “word character” (letters, digits, etc.) «\w+»
        //       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
        // Match the characters “</” literally «</»
        List<String> matchList = new ArrayList<String>();
        try {
            Pattern regex = Pattern.compile(">(?!one<|two<)(\\w+)</");
            Matcher regexMatcher = regex.matcher(subjectString);
            while (regexMatcher.find()) {
                matchList.add(regexMatcher.group(1));
            }
        } catch (PatternSyntaxException ex) {
            // Syntax error in the regular expression
        }
        System.out.println(matchList); // [three]
    }
}
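If the excluded words come from a list rather than being hard-coded, the same look-ahead can be assembled at run time. This is a sketch under the assumption that the words are plain strings; TagExclusions, buildPattern and unexpectedWords are hypothetical names. Pattern.quote escapes each word, so regex metacharacters in the list are matched literally. Like the pattern above, this only handles simple markup such as <span>word</span>; it is not an HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class TagExclusions {
    // Builds >(?!(?:w1|w2|...)<)(\w+)</ from the excluded words.
    static Pattern buildPattern(List<String> excluded) {
        String alternatives = excluded.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile(">(?!(?:" + alternatives + ")<)(\\w+)</");
    }

    // Returns the tag contents that are not in the excluded list.
    static List<String> unexpectedWords(String html, List<String> excluded) {
        List<String> found = new ArrayList<>();
        Matcher m = buildPattern(excluded).matcher(html);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "This is the first tag <span>one</span> and this is the third <span>three</span>";
        List<String> extra = unexpectedWords(s, List.of("one", "two"));
        System.out.println(extra);            // [three]
        System.out.println(!extra.isEmpty()); // true, so the string should be flagged
    }
}
```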
Use this:
// Requires: import java.util.regex.Pattern;
if (!Pattern.matches(".*(word1|word2|word3).*", "word1")) {
    System.out.println("We're good.");
}
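You're checking that the pattern does not match the string, i.e. that none of the listed words occurs anywhere in it. With the input "word1" shown here the pattern matches, so nothing is printed.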
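A self-contained sketch of the same "contains none of these words" check. ContainsNone and containsNone are hypothetical names, and the sketch assumes each word starts and ends with a word character (otherwise the \b anchors behave differently). It uses Matcher.find(), which searches anywhere in the text, so the surrounding .* is not needed. That also avoids a pitfall of the .* version: without the DOTALL flag, . does not match line terminators, so Pattern.matches(".*word2.*", "first line\nword2") returns false.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ContainsNone {
    // True if none of the words occurs as a whole word anywhere in text.
    static boolean containsNone(String text, List<String> words) {
        String alternatives = words.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return !Pattern.compile("\\b(?:" + alternatives + ")\\b").matcher(text).find();
    }

    public static void main(String[] args) {
        List<String> words = List.of("word1", "word2", "word3");
        System.out.println(containsNone("word1", words));        // false
        System.out.println(containsNone("nothing here", words)); // true
    }
}
```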