Java Pattern to match any sequence of characters except a given list

Tags:

java

regex

How do I write a Pattern (Java) to match any sequence of characters except a given list of words?

I need to find out whether a given piece of code contains any text surrounded by tags like <span>, other than a given list of words. For example, I want to check whether any words besides "one" and "two" are surrounded by the <span> tag.

"This is the first tag <span>one</span> and this is the third <span>three</span>"

The pattern should match the above string because the word "three" is surrounded by the tag and is not part of the list of given words ("one", "two").

Mario asked Mar 23 '09 07:03



3 Answers

Look-ahead can do this:

\b(?!your|given|list|of|exclusions)\w+\b

Matches

  • a word boundary (start-of-word)
  • not followed by any of "your", "given", "list", "of", "exclusions"
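  • followed by one or more word characters
  • followed by a word boundary (end-of-word)

In effect, this matches any word that does not begin with one of the excluded words. Because the look-ahead is not anchored at the end of the word, longer words such as "often" (which starts with "of") are skipped as well.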
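A minimal sketch of how the look-ahead could be applied to the question's <span> example, assuming the excluded words are "one" and "two". The class name SpanWordFilter and the method unexpectedWords are hypothetical. The \b added inside the look-ahead is a variation on the pattern above: it restricts the exclusion to whole words, so a longer word such as "oneself" is still reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpanWordFilter {
    // Assumed pattern: a <span> whose content is a single word other than "one" or "two".
    // The \b inside the look-ahead limits the exclusion to whole words.
    private static final Pattern OTHER_WORD =
            Pattern.compile("<span>(?!(?:one|two)\\b)(\\w+)</span>");

    // Hypothetical helper: collects every span-wrapped word that is not on the list.
    static List<String> unexpectedWords(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = OTHER_WORD.matcher(html);
        while (m.find()) {
            found.add(m.group(1)); // group 1 is the word between the tags
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "This is the first tag <span>one</span> "
                + "and this is the third <span>three</span>";
        System.out.println(unexpectedWords(input)); // prints [three]
    }
}
```

A non-empty result means the input contains at least one span-wrapped word outside the list, which is the check the question asks for.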

Tomalak answered Oct 11 '22 00:10


This should get you started.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.*;

// >(?!one<|two<)(\w+)</
// 
// Match the character “>” literally «>»
// Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!one<|two<)»
//    Match either the regular expression below (attempting the next alternative only if this one fails) «one<»
//       Match the characters “one<” literally «one<»
//    Or match regular expression number 2 below (the entire group fails if this one fails to match) «two<»
//       Match the characters “two<” literally «two<»
// Match the regular expression below and capture its match into backreference number 1 «(\w+)»
//    Match a single character that is a “word character” (letters, digits, etc.) «\w+»
//       Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
// Match the characters “</” literally «</»

// Sample input taken from the question
String subjectString = "This is the first tag <span>one</span> and this is the third <span>three</span>";
List<String> matchList = new ArrayList<>();
try {
    Pattern regex = Pattern.compile(">(?!one<|two<)(\\w+)</");
    Matcher regexMatcher = regex.matcher(subjectString);
    while (regexMatcher.find()) {
        // Group 1 holds the word between the tags, e.g. "three"
        matchList.add(regexMatcher.group(1));
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

Lieven Keersmaekers answered Oct 11 '22 01:10


Use this:

if (!Pattern.matches(".*(word1|word2|word3).*", "word1")) {
    System.out.println("We're good.");
}

You're checking that the pattern does not match the string.
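A small sketch of the same negated check run against several inputs, to show when it passes. The class name NegatedMatch and the helper isFreeOfListedWords are hypothetical, and Pattern.DOTALL is an addition not present in the snippet above; it is assumed here so that "." can also cross line breaks in multi-line input.

```java
import java.util.regex.Pattern;

public class NegatedMatch {
    // matches() must consume the whole input, hence the surrounding ".*".
    // DOTALL (an assumption, not in the original snippet) lets "." match line breaks.
    private static final Pattern LISTED =
            Pattern.compile(".*(word1|word2|word3).*", Pattern.DOTALL);

    // Hypothetical helper: true when none of the listed words occurs in the input.
    static boolean isFreeOfListedWords(String input) {
        return !LISTED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isFreeOfListedWords("word1"));          // false
        System.out.println(isFreeOfListedWords("nothing listed")); // true
        System.out.println(isFreeOfListedWords("line\nword2"));    // false
    }
}
```

With the literal input "word1" used in the snippet above, the pattern does match, so the negated condition is false and nothing is printed. The message appears only for inputs that contain none of the listed words.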

Sarel Botha answered Oct 11 '22 00:10