Regex in Java for finding duplicate consecutive words

Tags:

java

regex

I saw this as an answer for finding repeated words in a string. But when I use it, it thinks "This" and "is" are the same and deletes the "is".

Regex

"\\b(\\w+)\\b\\s+\\1"

Any idea why this is happening?

Here is the code that I am using for duplicate removal

public static String RemoveDuplicateWords(String input)
{
    String originalText = input;
    String output = "";
    Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
    //Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\1", Pattern.MULTILINE+Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(input);
    if (!m.find())
        output = "No duplicates found, no changes made to data";
    else
    {
        while (m.find())
        {
            if (output == "")
                output = input.replaceFirst(m.group(), m.group(1));
            else
                output = output.replaceAll(m.group(), m.group(1));
        }
        input = output;
        m = p.matcher(input);
        while (m.find())
        {
            output = "";
            if (output == "")
                output = input.replaceAll(m.group(), m.group(1));
            else
                output = output.replaceAll(m.group(), m.group(1));
        }
    }
    return output;
}

asked Feb 05 '12 06:02

user1190265

2 Answers

Try this one:

String pattern = "(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+";
Pattern r = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);

String input = "your string";
Matcher m = r.matcher(input);
while (m.find()) {
    input = input.replaceAll(m.group(), m.group(1));
}
System.out.println(input);

Java regular expressions are explained very well in the API documentation of the Pattern class. Here is the pattern again, with spaces added to separate its parts:

"(?i) \\b ([a-z]+) \\b (?: \\s+ \\1 \\b )+"

(?i)     turn on case-insensitive matching
\b       match a word boundary
([a-z]+) match a word of one or more letters;
         the parentheses capture the word as a group
\b       match a word boundary
(?:      indicates a non-capturing group (which starts here)
\s+      match one or more white space characters
\1       is a back reference to the first (captured) group;
         so the word is repeated here
\b       match a word boundary
)+       indicates the end of the non-capturing group and
         allows it to occur one or more times
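
As a minimal sketch of the same pattern in use, the loop can also be written as a single Matcher.replaceAll call with the $1 group reference. The class name DedupeDemo, the method name removeDuplicateWords and the sample sentence are made up for illustration; this is one possible way to apply the pattern, not the only one.

```java
import java.util.regex.Pattern;

public class DedupeDemo {
    // Same pattern as above; (?i) already makes it case-insensitive,
    // so no extra flag is passed to compile().
    private static final Pattern DUPLICATES =
            Pattern.compile("(?i)\\b([a-z]+)\\b(?:\\s+\\1\\b)+");

    // Hypothetical helper: collapses each run of a repeated word
    // to its first occurrence ($1 is the captured word).
    static String removeDuplicateWords(String input) {
        return DUPLICATES.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        // prints "This is a test"
        System.out.println(removeDuplicateWords("This this is is a test test test"));
    }
}
```

Using replaceAll on the matcher avoids passing the matched text back in as a new regular expression, which is what input.replaceAll(m.group(), ...) does.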

answered Sep 23 '22 01:09

Mina Wissa

You should have used \b(\w+)\b\s+\b\1\b, click here to see the result... The extra \b after \1 stops the back reference from matching only the beginning of a longer word.

Hope this is what you want...
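
To see what the extra \b changes, here is a small sketch that runs both patterns on a made-up input where a short word is followed by a longer word starting with the same letters. The class name BoundaryDemo, the helper hasMatch and the sample string are assumptions for illustration only.

```java
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Pattern from the question: no word boundary after the back reference.
    static final Pattern LOOSE = Pattern.compile("\\b(\\w+)\\b\\s+\\1");
    // Pattern suggested here: the back reference must be a whole word.
    static final Pattern STRICT = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b");

    // Returns true if the pattern matches anywhere in the string.
    static boolean hasMatch(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String sample = "this is issue one";
        // LOOSE matches "is is", taking the second "is" from "issue".
        System.out.println(hasMatch(LOOSE, sample));   // prints true
        // STRICT finds nothing, because "issue" is not the whole word "is".
        System.out.println(hasMatch(STRICT, sample));  // prints false
    }
}
```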

Update 1

Well well well, the output that you have is

the final string after removing duplicates

import java.util.regex.*;

public class MyDup {
    public static void main(String[] args) {
        String input = "This This is text text another another";
        String originalText = input;
        String output = "";
        Pattern p = Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        System.out.println(m);
        if (!m.find())
            output = "No duplicates found, no changes made to data";
        else
        {
            // First pass: replace each duplicate found from here on
            while (m.find())
            {
                if (output == "") {
                    output = input.replaceFirst(m.group(), m.group(1));
                } else {
                    output = output.replaceAll(m.group(), m.group(1));
                }
            }
            // Second pass: run the pattern again over the result of the first pass
            input = output;
            m = p.matcher(input);
            while (m.find())
            {
                output = "";
                if (output == "") {
                    output = input.replaceAll(m.group(), m.group(1));
                } else {
                    output = output.replaceAll(m.group(), m.group(1));
                }
            }
        }
        System.out.println("After removing duplicate the final string is " + output);
    }
}

Run this code and see what you get as output... Your queries will be solved...

Note

In the output you are replacing each duplicate with a single word, aren't you?

When I put System.out.println(m.group() + " : " + m.group(1)); in the first if condition, I get text text : text as output, i.e. the duplicates are being replaced by a single word.

else
    {
        while (m.find())
        {
            if (output == "") {
                System.out.println(m.group() + " : " + m.group(1));
                output = input.replaceFirst(m.group(), m.group(1));
            } else {

Hope you got now what is going on... :)
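
One more detail that may be worth checking in the code above: each call to m.find() moves the matcher forward, so the find() in the if test already consumes the first duplicate before the while loop starts. The sketch below counts matches with and without that extra call. The class name FindDemo and both helper names are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindDemo {
    static final Pattern P =
            Pattern.compile("\\b(\\w+)\\b\\s+\\b\\1\\b", Pattern.CASE_INSENSITIVE);

    // Mirrors the structure above: find() in the test, then find() again in the loop.
    static int countAfterInitialFind(String s) {
        Matcher m = P.matcher(s);
        int count = 0;
        if (m.find()) {          // consumes the first match
            while (m.find()) {   // starts at the second match
                count++;
            }
        }
        return count;
    }

    // Visits every match by letting the loop make all the find() calls.
    static int countAll(String s) {
        Matcher m = P.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "This This is text text another another";
        System.out.println(countAfterInitialFind(input)); // prints 2
        System.out.println(countAll(input));              // prints 3
    }
}
```

With the sample input used here, the second loop in the posted code re-scans the result of the first pass, so the skipped "This This" still gets replaced and the final string comes out clean.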

Good Luck!!! Cheers!!!

answered Sep 22 '22 01:09

Fahim Parkar

