
Java regex (?i) vs Pattern.CASE_INSENSITIVE

Tags: java, regex
I'm using "\\b(\\w+)(\\W+\\1\\b)+" along with input = input.replaceAll(regex, "$1"); to find duplicate words in a string and remove the duplicates. For example the string input = "for for for" would become "for".

However, it fails to turn "Hello hello" into "Hello", even though I have used Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

I can correct it by using "(?i)\\b(\\w+)(\\W+\\1\\b)+", but I want to know why this is necessary. Why do I have to use the (?i) flag when I have already specified Pattern.CASE_INSENSITIVE?

Here's the full code for clarity:

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateWords {

    public static void main(String[] args) {

        String regex = "\\b(\\w+)(\\W+\\1\\b)+";
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

        Scanner in = new Scanner(System.in);
        int numSentences = Integer.parseInt(in.nextLine());

        while (numSentences-- > 0) {
            String input = in.nextLine();

            Matcher m = p.matcher(input);

            // Check for subsequences of input that match the compiled pattern
            while (m.find()) {
                input = input.replaceAll(regex, "$1");
            }

            // Prints the modified sentence.
            System.out.println(input);
        }
        in.close();
    }
}
Paddy asked Jan 04 '17 18:01


1 Answer

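Your problem is that you compile the regex with the CASE_INSENSITIVE flag but never use the compiled pattern for the replacement. The call input.replaceAll(regex, "$1") is String.replaceAll, which compiles the regex string again from scratch with no flags, so the flag on Pattern p only affects m.find().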
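To make the difference concrete, here is a minimal sketch that runs the same regex through both routes. The class name ReplaceAllFlagsDemo and the helper names viaString and viaMatcher are made up for this illustration; only String.replaceAll, Pattern.compile and Matcher.replaceAll come from the JDK.

```java
import java.util.regex.Pattern;

public class ReplaceAllFlagsDemo {

    // Same regex as in the question.
    static final String REGEX = "\\b(\\w+)(\\W+\\1\\b)+";

    // String.replaceAll compiles REGEX afresh with no flags,
    // so the back-reference \1 is compared case-sensitively.
    static String viaString(String input) {
        return input.replaceAll(REGEX, "$1");
    }

    // Matcher.replaceAll uses the compiled Pattern, flags included.
    static String viaMatcher(String input) {
        Pattern p = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE);
        return p.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(viaString("Hello hello"));  // prints "Hello hello"
        System.out.println(viaMatcher("Hello hello")); // prints "Hello"
        System.out.println(viaString("for for for"));  // prints "for"
    }
}
```

The question's loop finds a match with the flagged pattern but then replaces with the unflagged one, which is why the mixed-case duplicate survives.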

You can also put the flag inside the regex itself. The scoped form (?i:...) applies case-insensitive matching only to the back-reference \1, like this:

String repl = "Hello hello".replaceAll("\\b(\\w+)(\\W+(?i:\\1)\\b)+", "$1");
//=> Hello
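As a side illustration of how the scoped form behaves, the sketch below uses a toy pattern, a(?i:b)c, which is not from the question. The class and method names are hypothetical. It shows that (?i:...) relaxes case only for the part it wraps, while a leading (?i) relaxes it for everything that follows.

```java
import java.util.regex.Pattern;

public class ScopedFlagDemo {

    // Only the 'b' is matched case-insensitively.
    static boolean matchesScoped(String s) {
        return Pattern.matches("a(?i:b)c", s);
    }

    // The whole pattern is matched case-insensitively.
    static boolean matchesGlobal(String s) {
        return Pattern.matches("(?i)abc", s);
    }

    public static void main(String[] args) {
        System.out.println(matchesScoped("aBc")); // prints "true"
        System.out.println(matchesScoped("ABc")); // prints "false"
        System.out.println(matchesGlobal("ABC")); // prints "true"
    }
}
```

For the duplicate-word regex the two forms give the same result, because the back-reference is the only part of the pattern where letter case matters.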

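If you keep the compiled Pattern, do the replacement with Matcher.replaceAll instead of String.replaceAll, so that the pattern's flags are honored.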

Working Code:

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateWords {

    public static void main(String[] args) {

        String regex = "\\b(\\w+)(\\W+(?i:\\1)\\b)+";
        Pattern p = Pattern.compile(regex);

        // OR this one also works
        // String regex = "\\b(\\w+)(\\W+\\1\\b)+";
        // Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

        Scanner in = new Scanner(System.in);
        int numSentences = Integer.parseInt(in.nextLine());

        while (numSentences-- > 0) {
            String input = in.nextLine();

            Matcher m = p.matcher(input);

            // Check for subsequences of input that match the compiled pattern
            if (m.find()) {
                input = m.replaceAll("$1");
            }

            // Prints the modified sentence.
            System.out.println(input);
        }
        in.close();
    }
}
anubhava answered Sep 20 '22 18:09