Regular Expression That Contains All Of The Specific Letters In Java

Question

I have a regular expression that selects all the words containing all (not any) of the specified letters. It works fine in Notepad++.

Regular Expression Pattern:

^(?=.*B)(?=.*T)(?=.*L).+$

Input Text File:

AL
BAL
BAK
LABAT
TAL
LAT
BALAT
LA
AB
LATAB
TAB

And the output of the regular expression in Notepad++:

LABAT
BALAT
LATAB

Since it works in Notepad++, I tried the same regular expression in Java, but it simply failed.

Here is my test code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import com.lev.kelimelik.resource.*;

public class Test {

    public static void main(String[] args) {
        String patternString = "^(?=.*B)(?=.*T)(?=.*L).+$";

        String dictionary = 
                "AL" + "\n"
                +"BAL" + "\n"
                +"BAK" + "\n"
                +"LABAT" + "\n"
                +"TAL" + "\n"
                +"LAT" + "\n"
                +"BALAT" + "\n"
                +"LA" + "\n"
                +"AB" + "\n"
                +"LATAB" + "\n"
                +"TAB" + "\n";

        Pattern p = Pattern.compile(patternString, Pattern.DOTALL);
        Matcher m = p.matcher(dictionary);
        while(m.find())
        {
            System.out.println("Match: " + m.group());
        }
    }

}

The output is erroneous, as shown below:

Match: AL
BAL
BAK
LABAT
TAL
LAT
BALAT
LA
AB
LATAB
TAB

My question is simply: what is the Java-compatible version of this regular expression?

Wiktor Stribiżew · Accepted Answer

Java-specific answer

In real life, we rarely need to validate every line of one big string, and I see that you are in fact just using the input as an array of test data. The most common scenario is to read the input line by line and check each line on its own. In Notepad++ the solution would be a bit different, but in Java each line should be checked separately.

That said, you should not copy the same approaches on different platforms. What is good in Notepad++ does not have to be good in Java.

I suggest this almost regex-free approach (String#split() still uses it):

String dictionary_str = 
        "AL" + "\n"
        +"BAL" + "\n"
        +"BAK" + "\n"
        +"LABAT" + "\n"
        +"TAL" + "\n"
        +"LAT" + "\n"
        +"BALAT" + "\n"
        +"LA" + "\n"
        +"AB" + "\n"
        +"LATAB" + "\n"
        +"TAB" + "\n";
String[] dictionary = dictionary_str.split("\n"); // Split into lines
for (int i=0; i<dictionary.length; i++)   // Iterate through lines
{
    if(dictionary[i].indexOf("B") > -1 && // There must be B
       dictionary[i].indexOf("T") > -1 && // There must be T
       dictionary[i].indexOf("L") > -1)   // There must be L
    {
        System.out.println("Match: " + dictionary[i]); // No need matching, print the whole line
    }
}

See IDEONE demo
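
The same check can be wrapped in a small helper so that the required letters are not hard-coded. This is only a sketch: the class name LetterFilter and the methods containsAll and filter are made up for illustration, and it assumes the words arrive as one newline-separated string, as in the question.

```java
import java.util.ArrayList;
import java.util.List;

class LetterFilter {

    // Hypothetical helper: true when 'word' contains every character of 'required'.
    static boolean containsAll(String word, String required) {
        for (char c : required.toCharArray()) {
            if (word.indexOf(c) < 0) {
                return false; // One required letter is missing
            }
        }
        return true;
    }

    // Splits the text into lines and keeps the lines that contain all required letters.
    static List<String> filter(String text, String required) {
        List<String> matches = new ArrayList<>();
        for (String line : text.split("\\R")) { // \R matches any line break (Java 8+)
            if (containsAll(line, required)) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "AL\nBAL\nBAK\nLABAT\nTAL\nLAT\nBALAT\nLA\nAB\nLATAB\nTAB\n";
        for (String word : filter(text, "BTL")) {
            System.out.println("Match: " + word);
        }
    }
}
```

With the input from the question, this should print the same three words as the Notepad++ run: LABAT, BALAT and LATAB.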

Original regex-based answer

Try not to rely on .* where you can avoid it: this construct can cause a lot of unnecessary backtracking. In this case, you can easily optimize it with a negated character class and possessive quantifiers:

^(?=[^B]*+B)(?=[^T]*+T)(?=[^L]*+L)

The regex breakdown:

^ - start of string
(?=[^B]*+B) - right at the start of the string, check for at least one B presence that may be preceded with 0 or more characters other than B
(?=[^T]*+T) - still right at the start of the string, check for at least one T presence that may be preceded with 0 or more characters other than T
(?=[^L]*+L) - still right at the start of the string, check for at least one L presence that may be preceded with 0 or more characters other than L

See Java demo:

String patternString = "^(?=[^B]*+B)(?=[^T]*+T)(?=[^L]*+L)";
String[] dictionary = {"AL", "BAL", "BAK", "LABAT", "TAL", "LAT", "BALAT", "LA", "AB", "LATAB", "TAB"};
Pattern p = Pattern.compile(patternString); // Compile once, reuse for every word
for (int i=0; i<dictionary.length; i++)
{
    Matcher m = p.matcher(dictionary[i]);
    if(m.find()) // Lookaheads only: a successful find() means all three letters are present
    {
        System.out.println("Match: " + dictionary[i]);
    }
}

Output:

Match: LABAT
Match: BALAT
Match: LATAB
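
As for why the code in the question prints the whole input as a single match: Pattern.DOTALL lets . match line breaks, and without Pattern.MULTILINE the ^ and $ anchors only match at the start and end of the entire input, so .+ consumes everything. If you do want to keep the text in one string, a possible fix is to drop DOTALL and pass MULTILINE instead. The sketch below assumes that; the class name MultilineDemo and the method findWords are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {

    // Hypothetical helper: returns every line of 'text' that contains B, T and L.
    static List<String> findWords(String text) {
        // MULTILINE: ^ and $ match at each line boundary.
        // No DOTALL: . stops at line breaks, so each match stays within one line.
        Pattern p = Pattern.compile("^(?=.*B)(?=.*T)(?=.*L).+$", Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        List<String> words = new ArrayList<>();
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "AL\nBAL\nBAK\nLABAT\nTAL\nLAT\nBALAT\nLA\nAB\nLATAB\nTAB\n";
        for (String word : findWords(text)) {
            System.out.println("Match: " + word);
        }
    }
}
```

This keeps the regex from the question unchanged and only swaps the flag, which is probably the closest equivalent to what Notepad++ does with the same pattern.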

Mena · Answer

Change your Pattern to:

String patternString = ".*(?=.*B)(?=.*L)(?=.*T).*";

Output

Match: LABAT
Match: BALAT
Match: LATAB
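
The answer does not say which flags to use with this pattern. The sketch below assumes that Pattern.DOTALL from the question is dropped, so that . cannot cross line breaks; with DOTALL kept, .* would again run over the whole input. The class name LookaheadDemo and the method findWords are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {

    // Hypothetical helper: applies the pattern from this answer to the whole text.
    static List<String> findWords(String text) {
        // Assumption: no flags, so . does not match line breaks
        // and every match is confined to a single line.
        Pattern p = Pattern.compile(".*(?=.*B)(?=.*L)(?=.*T).*");
        Matcher m = p.matcher(text);
        List<String> words = new ArrayList<>();
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "AL\nBAL\nBAK\nLABAT\nTAL\nLAT\nBALAT\nLA\nAB\nLATAB\nTAB\n";
        for (String word : findWords(text)) {
            System.out.println("Match: " + word);
        }
    }
}
```

Note that the pattern has no anchors, so the engine retries it at every position of every non-matching line, which is likely slower on large inputs than an anchored version.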

Tags: java, regex

Levent Divilioglu
