
Replace a set of substrings in a string in a more efficient way?

Tags: java, string, regex

I have to replace a set of substrings in a String with other substrings, for example:

  1. "^t" with "\t"
  2. "^=" with "\u2014"
  3. "^+" with "\u2013"
  4. "^s" with "\u00A0"
  5. "^?" with "."
  6. "^#" with "\\d"
  7. "^$" with "[a-zA-Z]"

So, I've tried with:

String oppip = "pippo^t^# p^+alt^shefhjkhfjkdgfkagfafdjgbcnbch^";

Map<String,String> tokens = new HashMap<String,String>();
tokens.put("^t", "\t");
tokens.put("^=", "\u2014");
tokens.put("^+", "\u2013");
tokens.put("^s", "\u00A0");
tokens.put("^?", ".");
tokens.put("^#", "\\d");
tokens.put("^$", "[a-zA-Z]");

String regexp = "^t|^=|^+|^s|^?|^#|^$";

StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile(regexp);
Matcher m = p.matcher(oppip);
while (m.find())
    m.appendReplacement(sb, tokens.get(m.group()));
m.appendTail(sb);
System.out.println(sb.toString()); 

But it doesn't work. tokens.get(m.group()) throws an exception.

Any idea why?

Matt3o asked Dec 20 '22 11:12


2 Answers

You don't have to use a HashMap. Consider using simple arrays, and a loop:

String oppip = "pippo^t^# p^+alt^shefhjkhfjkdgfkagfafdjgbcnbch^";

String[] searchFor =
{"^t", "^=", "^+", "^s", "^?", "^#", "^$"},
         replacement =
{"\\t", "\\u2014", "\\u2013", "\\u00A0", ".", "\\d", "[a-zA-Z]"};

for (int i = 0; i < searchFor.length; i++)
    oppip = oppip.replace(searchFor[i], replacement[i]);

// Print the result.
System.out.println(oppip);

Here is an online code demo.


For completeness, you can also use a two-dimensional array for a similar approach:

String oppip = "pippo^t^# p^+alt^shefhjkhfjkdgfkagfafdjgbcnbch^";

String[][] tasks =
{
    {"^t", "\\t"},
    {"^=", "\\u2014"}, 
    {"^+", "\\u2013"}, 
    {"^s", "\\u00A0"}, 
    {"^?", "."}, 
    {"^#", "\\d"}, 
    {"^$", "[a-zA-Z]"}
};

for (String[] replacement : tasks)
    oppip = oppip.replace(replacement[0], replacement[1]);

// Print the result.
System.out.println(oppip);
Unihedron answered Jan 31 '23 08:01


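In a regex, ^ means "beginning of text" (or negation when it is the first character inside a character class). To match a literal ^ you have to place a backslash before it, which is written as two backslashes in a Java string literal. The +, ? and $ in your pattern are metacharacters too, so they also need escaping unless they sit inside a character class.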

String regexp = "\\^[t=+s?#$]";

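I have also shortened it: every token is a ^ followed by one character, so the second characters are collected into a single character class, where +, ? and $ are literal.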
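
To show how the escaped pattern fits together with the map from the question, here is a minimal sketch. The class name TokenReplacer and the method replaceTokens are made up for illustration; the token table is copied from the question. Two assumptions are worth spelling out. First, with the original unescaped pattern an alternative such as ^+ can match the empty string, so tokens.get(m.group()) most likely returned null, which would explain the exception. Second, appendReplacement treats \ and $ in the replacement text specially, so the sketch wraps each replacement in Matcher.quoteReplacement; without it, "\\d" would come out as a plain d.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for the example.
final class TokenReplacer {

    // Token table copied from the question.
    private static final Map<String, String> TOKENS = new HashMap<>();
    static {
        TOKENS.put("^t", "\t");
        TOKENS.put("^=", "\u2014");
        TOKENS.put("^+", "\u2013");
        TOKENS.put("^s", "\u00A0");
        TOKENS.put("^?", ".");
        TOKENS.put("^#", "\\d");
        TOKENS.put("^$", "[a-zA-Z]");
    }

    // A literal caret followed by exactly one of the token characters.
    private static final Pattern TOKEN = Pattern.compile("\\^[t=+s?#$]");

    static String replaceTokens(String input) {
        StringBuffer sb = new StringBuffer();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Every match is one of the seven keys, so the lookup cannot return null.
            String replacement = TOKENS.get(m.group());
            // quoteReplacement keeps '\' and '$' in the replacement literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String oppip = "pippo^t^# p^+alt^shefhjkhfjkdgfkagfafdjgbcnbch^";
        System.out.println(replaceTokens(oppip));
    }
}
```

This does the replacement in a single pass over the input, and the trailing ^ in the sample string is left untouched because it is not followed by one of the token characters.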

Joop Eggen answered Jan 31 '23 10:01