
How would I do this in Java Regex?

Tags: java, regex

I'm trying to write a regex that matches every occurrence of a word (say, chicken) that is not in brackets. So

chicken

would be matched, but

[chicken]

would not. Does anyone know how to do this?

PaulBGD asked Jan 12 '23 18:01


2 Answers

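import java.util.regex.Matcher;
import java.util.regex.Pattern;

String template = "[chicken]";
// \G anchors each match to the start of the input or the end of the previous match
String pattern = "\\G(?<!\\[)(\\w+)(?!\\])";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(template);

// Prints nothing for "[chicken]", because the only word is bracketed
while (m.find()) {
    System.out.println(m.group());
}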

It combines a negative look-behind, a negative look-ahead, and a boundary matcher:

(?<!\\[) // negative look-behind: the match must not be preceded by [
(?!\\])  // negative look-ahead: the match must not be followed by ]
(\\w+)   // capture group for the word
\\G      // boundary matcher: the match must start at the end of the previous match (or at the start of the input)
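
To see why the look-arounds need an anchor, here is a minimal sketch that runs the pattern with and without \G. The class name GAnchorDemo and the helper findAll are made up for this illustration. Without the anchor, the engine is free to start one character into the word and stop one character short, so both look-arounds pass on a fragment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class GAnchorDemo {
    // Collects every match of the given regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Look-arounds only: "hicke" is not preceded by [ and not followed by ].
        System.out.println(findAll("(?<!\\[)(\\w+)(?!\\])", "[chicken]"));    // [hicke]
        // With \G the first match must start at offset 0, where "[" is not a word character.
        System.out.println(findAll("\\G(?<!\\[)(\\w+)(?!\\])", "[chicken]")); // []
        System.out.println(findAll("\\G(?<!\\[)(\\w+)(?!\\])", "chicken"));   // [chicken]
    }
}
```

The downside of \G is that matches must be contiguous from the start of the input, so a word that follows a space or a bracketed word is never reached.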

(please read the following edits for clarification)

EDIT 1:
If one needs to account for situations like:

"chicken [chicken] chicken [chicken]"

We can replace the regex with:

String pattern = "(?<!\\[)\\b(\\w+)\\b(?!\\])";
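
As a quick check of the \b version, here is a small sketch that runs it over the mixed input above and records where each match starts. The class name WordBoundaryDemo and the method findWithOffsets are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class WordBoundaryDemo {
    // Same regex as in EDIT 1: whole words not preceded by [ and not followed by ].
    static final Pattern UNBRACKETED = Pattern.compile("(?<!\\[)\\b(\\w+)\\b(?!\\])");

    // Returns each match as "word@startOffset".
    static List<String> findWithOffsets(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = UNBRACKETED.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + "@" + m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        // Only the two bare words match; the bracketed ones are skipped.
        System.out.println(findWithOffsets("chicken [chicken] chicken [chicken]"));
        // [chicken@0, chicken@18]
    }
}
```

The \b on each side forces the match to cover the whole word, which is what stops the engine from settling for a fragment inside the brackets.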

EDIT 2:
If one also needs to account for situations like:

"[chicken"
"chicken]"

That is, if you still want to match "chicken" when a bracket appears on only one side, you could use:

String pattern = "(?<!\\[)?\\b(\\w+)\\b(?!\\])|(?<!\\[)\\b(\\w+)\\b(?!\\])?";

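This accounts for the two cases where a bracket appears on only one side. The | acts as an "or" between the two alternatives, and the ? after a look-behind or look-ahead makes that assertion optional (? means zero or one of the preceding expression). An optional assertion always succeeds, so the first alternative only requires that the word is not followed by ], and the second only requires that it is not preceded by [. A word is therefore rejected only when both brackets are present.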
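
The one-sided cases can be sketched as follows. Note the assumption: this uses a simplified pattern that drops the optional look-arounds entirely, which should match the same text as the pattern above but is my rewrite, not the answer's exact regex. The class name OneSidedBracketDemo and the method find are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class OneSidedBracketDemo {
    // Assumed simplification: "not followed by ]" OR "not preceded by [".
    static final Pattern ONE_SIDED_OK =
            Pattern.compile("\\b\\w+\\b(?!\\])|(?<!\\[)\\b\\w+\\b");

    // Returns every word that is not enclosed by brackets on both sides.
    static List<String> find(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = ONE_SIDED_OK.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(find("[chicken"));  // [chicken]
        System.out.println(find("chicken]"));  // [chicken]
        System.out.println(find("[chicken]")); // []
        System.out.println(find("chicken"));   // [chicken]
    }
}
```

Using m.group() rather than a numbered group sidesteps the fact that each alternative in the original pattern has its own capture group.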

Steve P. answered Jan 18 '23 03:01


I guess you want something like:

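import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Whole words that are not preceded by [ and not followed by ]
private static final Pattern UNBRACKETED_WORD_PAT = Pattern.compile("(?<!\\[)\\b\\w+\\b(?!])");

private List<String> findAllUnbracketedWords(final String s) {
    final List<String> ret = new ArrayList<String>();
    final Matcher m = UNBRACKETED_WORD_PAT.matcher(s);
    while (m.find()) {
        ret.add(m.group());
    }
    return Collections.unmodifiableList(ret);
}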
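
For what it's worth, the same loop can be written more compactly with streams. This is only a sketch and assumes Java 9 or later, where Matcher.results() is available. The class name StreamUnbracketedWords is made up for the example.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class; assumes Java 9+ for Matcher.results().
class StreamUnbracketedWords {
    static final Pattern UNBRACKETED_WORD_PAT =
            Pattern.compile("(?<!\\[)\\b\\w+\\b(?!])");

    // Streams over the match results instead of looping on find().
    static List<String> findAllUnbracketedWords(String s) {
        return UNBRACKETED_WORD_PAT.matcher(s)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(findAllUnbracketedWords("a [b] c")); // [a, c]
    }
}
```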

ruakh answered Jan 18 '23 03:01