Regular Expression That Contains All Of The Specific Letters In Java

Tags: java, regex

I have a regular expression that selects all the words containing all (not just any) of a set of specific letters. It works fine in Notepad++.

Regular expression pattern:

^(?=.*B)(?=.*T)(?=.*L).+$

Input text file:

AL
BAL
BAK
LABAT
TAL
LAT
BALAT
LA
AB
LATAB
TAB

Output of the regular expression in Notepad++:

LABAT
BALAT
LATAB

Since it works in Notepad++, I tried the same regular expression in Java, but it simply failed.

Here is my test code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {

    public static void main(String[] args) {
        String patternString = "^(?=.*B)(?=.*T)(?=.*L).+$";

        String dictionary = 
                "AL" + "\n"
                +"BAL" + "\n"
                +"BAK" + "\n"
                +"LABAT" + "\n"
                +"TAL" + "\n"
                +"LAT" + "\n"
                +"BALAT" + "\n"
                +"LA" + "\n"
                +"AB" + "\n"
                +"LATAB" + "\n"
                +"TAB" + "\n";

        Pattern p = Pattern.compile(patternString, Pattern.DOTALL);
        Matcher m = p.matcher(dictionary);
        while(m.find())
        {
            System.out.println("Match: " + m.group());
        }
    }

}

The output is erroneous, as shown below:

Match: AL
BAL
BAK
LABAT
TAL
LAT
BALAT
LA
AB
LATAB
TAB

My question is simply: what is the Java-compatible version of this regular expression?

asked Dec 01 '25 03:12 by Levent Divilioglu


2 Answers

Java-specific answer

In real life, we rarely need to validate several lines held in one string, and I see that you are in fact just using the input as an array of test data. The most common scenario is to read the input line by line and check each line. I agree that the solution would be a bit different in Notepad++, but in Java each line should be checked separately.

That said, you should not copy the same approach across different platforms. What is good in Notepad++ does not have to be good in Java.

I suggest this almost regex-free approach (String#split() still takes a regex argument):

String dictionary_str = 
        "AL" + "\n"
        +"BAL" + "\n"
        +"BAK" + "\n"
        +"LABAT" + "\n"
        +"TAL" + "\n"
        +"LAT" + "\n"
        +"BALAT" + "\n"
        +"LA" + "\n"
        +"AB" + "\n"
        +"LATAB" + "\n"
        +"TAB" + "\n";
String[] dictionary = dictionary_str.split("\n"); // Split into lines
for (String word : dictionary)                    // Iterate through lines
{
    if (word.contains("B") && // There must be a B
        word.contains("T") && // There must be a T
        word.contains("L"))   // There must be an L
    {
        System.out.println("Match: " + word); // No matching needed, print the whole line
    }
}

See IDEONE demo
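The same regex-free check can be generalized to any set of required letters. The following is only a minimal sketch: the class name LetterFilter and the methods containsAllLetters and filter are hypothetical names chosen for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterFilter {

    // Returns true if word contains every character of requiredLetters at least once.
    // (Hypothetical helper, named for this sketch only.)
    static boolean containsAllLetters(String word, String requiredLetters) {
        for (char c : requiredLetters.toCharArray()) {
            if (word.indexOf(c) < 0) {
                return false; // One required letter is missing
            }
        }
        return true;
    }

    // Splits text into lines and keeps the lines that contain all required letters.
    static List<String> filter(String text, String requiredLetters) {
        List<String> result = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (containsAllLetters(line, requiredLetters)) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "AL\nBAL\nBAK\nLABAT\nTAL\nLAT\nBALAT\nLA\nAB\nLATAB\nTAB\n";
        for (String word : filter(text, "BTL")) {
            System.out.println("Match: " + word); // Prints LABAT, BALAT, LATAB
        }
    }
}
```

Passing the required letters as a string keeps the check in one place if the letter set changes.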

Original regex-based answer

Avoid relying on .* where a more specific pattern will do, since it can cause a lot of unnecessary backtracking. In this case, you can easily optimize the pattern with negated character classes and possessive quantifiers:

^(?=[^B]*+B)(?=[^T]*+T)(?=[^L]*+L)

The regex breakdown:

  • ^ - start of the string
  • (?=[^B]*+B) - right at the start of the string, require at least one B, optionally preceded by zero or more characters other than B
  • (?=[^T]*+T) - still right at the start of the string, require at least one T, optionally preceded by zero or more characters other than T
  • (?=[^L]*+L) - still right at the start of the string, require at least one L, optionally preceded by zero or more characters other than L

See Java demo:

// Requires java.util.regex.Pattern and java.util.regex.Matcher
String patternString = "^(?=[^B]*+B)(?=[^T]*+T)(?=[^L]*+L)";
String[] dictionary = {"AL", "BAL", "BAK", "LABAT", "TAL", "LAT", "BALAT", "LA", "AB", "LATAB", "TAB"};
Pattern p = Pattern.compile(patternString); // Compile once, reuse for every word
for (String word : dictionary)
{
    Matcher m = p.matcher(word);
    if (m.find()) // The pattern is anchored with ^, so find() only tests the start of the word
    {
        System.out.println("Match: " + word);
    }
}

Output:

Match: LABAT
Match: BALAT
Match: LATAB
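
If the set of required letters varies, the lookahead pattern above can be assembled programmatically. This is a sketch under assumptions: the class LookaheadBuilder and the method buildPattern are hypothetical names, and the input is restricted to ASCII letters so that nothing needs escaping inside the character class.

```java
import java.util.regex.Pattern;

public class LookaheadBuilder {

    // Builds "^(?=[^X]*+X)..." with one possessive lookahead per required letter.
    // Assumption: letters holds only ASCII letters, so no regex escaping is needed.
    static Pattern buildPattern(String letters) {
        StringBuilder sb = new StringBuilder("^");
        for (char c : letters.toCharArray()) {
            boolean isAsciiLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            if (!isAsciiLetter) {
                throw new IllegalArgumentException("Only ASCII letters are supported: " + c);
            }
            sb.append("(?=[^").append(c).append("]*+").append(c).append(')');
        }
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        Pattern p = buildPattern("BTL"); // Same pattern as the hand-written one
        String[] dictionary = {"AL", "BAL", "BAK", "LABAT", "TAL", "LAT", "BALAT", "LA", "AB", "LATAB", "TAB"};
        for (String word : dictionary) {
            if (p.matcher(word).find()) {
                System.out.println("Match: " + word);
            }
        }
    }
}
```

For the letters "BTL" this produces the same pattern string as the hand-written version, so the matches are the same three words.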
answered Dec 02 '25 17:12 by Wiktor Stribiżew


Change your pattern to the following, and compile it without Pattern.DOTALL so that . stops at line breaks:

String patternString = ".*(?=.*B)(?=.*L)(?=.*T).*";

Output

Match: LABAT
Match: BALAT
Match: LATAB
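
For context, the code in the question prints the whole input because Pattern.DOTALL lets . cross line breaks while ^ and $ still match only at the very start and end of the input. One way to keep the original Notepad++ pattern unchanged is to drop DOTALL and compile with Pattern.MULTILINE instead, so that ^ and $ match at line boundaries. The sketch below assumes that approach. The class name MultilineLookaheadDemo and the method findMatches are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineLookaheadDemo {

    // Returns every line of text that contains all of B, T and L.
    static List<String> findMatches(String text) {
        // MULTILINE makes ^ and $ match at the start and end of each line.
        // DOTALL is deliberately not set, so . never crosses a line break.
        Pattern p = Pattern.compile("^(?=.*B)(?=.*T)(?=.*L).+$", Pattern.MULTILINE);
        Matcher m = p.matcher(text);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String dictionary = "AL\nBAL\nBAK\nLABAT\nTAL\nLAT\nBALAT\nLA\nAB\nLATAB\nTAB\n";
        for (String match : findMatches(dictionary)) {
            System.out.println("Match: " + match); // Prints LABAT, BALAT, LATAB
        }
    }
}
```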
answered Dec 02 '25 16:12 by Mena


