
How to insert special characters taken from a string into another string?

Tags: java, string, regex

I have a string,

    string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships."

I have another string, string2, which contains only the noun phrases, each surrounded by <NOUN> and </NOUN> tags and separated by a space.

    string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>"

Note that the second string can have any number of noun-tagged phrases, depending on string1 (e.g. if string1 has 3 nouns, string2 will have the same 3 nouns surrounded by noun tags).
I want to add the tags to string1 so that it becomes:

    string1 = "<NOUN>Sri Lanka National Chess Championship</NOUN> this year and represented <NOUN>Sri Lanka</NOUN> at represented <NOUN>Sri Lanka</NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships."

I used the following code to do this:

    Pattern p = Pattern.compile("<NOUN>(.*?)</NOUN>");
    Matcher m = p.matcher(string2);
    while (m.find()) {
        string1 = string1.replaceAll(m.group(1), m.group(0));
    }

But it gives me the following output:

    <NOUN><NOUN><NOUN>Sri Lanka</NOUN></NOUN> National Chess Championship</NOUN> this year and represented <NOUN><NOUN>Sri Lanka</NOUN></NOUN> at represented <NOUN><NOUN>Sri Lanka</NOUN></NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships.

Can anyone please tell me how to do this correctly?
Or please tell me how to get the desired output from the given output?

Roshanck asked Oct 26 '25 07:10


2 Answers

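Instead of:

    string1 = string1.replaceAll(m.group(1), m.group(0));

use:

    string1 = string1.replaceAll("(?<!<NOUN>)(" + m.group(1) + ")(?!</NOUN>)", m.group(0));

The negative lookbehind (?<!<NOUN>) and negative lookahead (?!</NOUN>) skip any occurrence that already sits directly next to a tag. For more, read up on "Look Ahead and Look Behind Constructs".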
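As a hedged sketch, the lookaround replacement can be wrapped in a small helper. The class name LookaroundTagger and the method tag are hypothetical names made up for this example. It also adds Pattern.quote and Matcher.quoteReplacement, which are not in the answer above, so that a noun containing regex metacharacters (such as "." or "$") is treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
public class LookaroundTagger {

    // Tags every noun listed in `tagged` inside `text`, skipping occurrences
    // that already sit directly next to a <NOUN> or </NOUN> tag.
    static String tag(String text, String tagged) {
        Matcher m = Pattern.compile("<NOUN>(.*?)</NOUN>").matcher(tagged);
        while (m.find()) {
            // Pattern.quote makes the noun a literal; the lookarounds reject
            // matches preceded by <NOUN> or followed by </NOUN>.
            String regex = "(?<!<NOUN>)" + Pattern.quote(m.group(1)) + "(?!</NOUN>)";
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            text = text.replaceAll(regex, Matcher.quoteReplacement(m.group(0)));
        }
        return text;
    }

    public static void main(String[] args) {
        String string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
        String string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
        System.out.println(tag(string1, string2));
    }
}
```

Note the limit of this technique: the lookarounds only check the text immediately before and after the match. A shorter noun that appears in the middle of an already tagged phrase (for example "Chess" inside "Sri Lanka National Chess Championship") would still be tagged a second time.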

Grisha Weintraub answered Oct 28 '25 20:10


The problem with your example is that "Sri Lanka National Chess Championship" is a noun, and "Sri Lanka", which is part of that string, is also a noun. So replaceAll tags the same text multiple times.

You can solve this by not touching the string fragments that have already been replaced. For each match, I break the string into three parts: the text before the match, the tagged match itself, and the text after it. The parts are kept in their original order, and a Vector is a convenient data structure for this.

import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class Check {

static String print(Vector<String> parts) {
    String str = parts.elementAt(0);

    for(int i=1; i<parts.size(); i++) {
        str += parts.elementAt(i); 
        //System.out.print(i + " : " + parts.elementAt(i) + "\n");
    }

    return str;
}

public static void main(String args[]) {
    String string1;
    String string2;
    String expected;

    string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
    string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
    expected = "<NOUN>Sri Lanka National Chess Championship</NOUN> this year and represented <NOUN>Sri Lanka</NOUN> at represented <NOUN>Sri Lanka</NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships.";


    Pattern p = Pattern.compile("<NOUN>(.*?)</NOUN>");
    Matcher m = p.matcher(string2);
    Vector<String> parts = new Vector<String>();
    parts.add(string1);

    while(m.find()) {
        for(int i=0; i<parts.size(); i++) {

            //search for used part
            if(parts.elementAt(i).indexOf("<NOUN>")!=-1) {
                continue;
            }

            // search for pattern
            String cur = parts.elementAt(i);
            int disp = cur.indexOf(m.group(1));
            if(disp==-1) {
                continue;
            } else {
                parts.remove(i);
                Vector<String> newParts = new Vector<String>();

                if(disp!=0) {
                    newParts.add(cur.substring(0, disp));
                }

                newParts.add(m.group(0));

                if((disp+m.group(1).length())!=cur.length()) {
                    newParts.add(cur.substring(disp+m.group(1).length()));
                }

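                // Re-insert at the index the part was removed from, so the
                // order of the parts is preserved. A plain addAll(newParts)
                // would append at the end, which is wrong once parts holds
                // more than one element.
                parts.addAll(i, newParts);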

                //System.out.print(print(parts) + "\n");
            }           
        }
    }

    string1 = print(parts);
    if(!string1.equals(expected)) {
        System.out.println("Unexpected output !!");
    } else {
        System.out.println("Correct !!");
    }
}

}

The print method only builds and returns the joined string, so you may want to rename it to stringify.
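The same idea of never revisiting tagged text can also be sketched as a single left-to-right pass. This is an alternative technique, not the Vector approach above, and it rests on an assumption: the nouns in string2 are listed in the same order in which they occur in string1, as they are in the question's example. The class name SequentialTagger and the method tag are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; assumes the nouns in `tagged` appear in the
// same order as their occurrences in `text`.
public class SequentialTagger {

    static String tag(String text, String tagged) {
        Matcher m = Pattern.compile("<NOUN>(.*?)</NOUN>").matcher(tagged);
        StringBuilder out = new StringBuilder();
        int cursor = 0; // everything before this index is already emitted

        while (m.find()) {
            String noun = m.group(1);
            // Search only in the part of the text not consumed yet.
            int at = text.indexOf(noun, cursor);
            if (at < 0) {
                continue; // noun not found after the cursor; skip it
            }
            // Copy the untouched text before the noun, then the tagged noun.
            out.append(text, cursor, at).append(m.group(0));
            cursor = at + noun.length();
        }
        // Copy whatever follows the last tagged noun.
        out.append(text.substring(cursor));
        return out.toString();
    }

    public static void main(String[] args) {
        String string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
        String string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
        System.out.println(tag(string1, string2));
    }
}
```

Because each noun consumes exactly one occurrence and the cursor only moves forward, a phrase is never tagged twice and no regex is built from the noun text. If the order assumption does not hold, some nouns will be skipped.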

prathmesh.kallurkar answered Oct 28 '25 21:10


