I need to replace every & in a String that isn't part of an HTML entity, so that the String "This & entities &gt; &amp; &lt;" will return "This &amp; entities &gt; &amp; &lt;".
I've come up with this regex pattern: "&[a-zA-Z0-9]{2,7};", which works fine. But I'm not very skilled with regex, and when I test the speed over 100k iterations, it takes twice as long as a previously used method that didn't use regex (but that didn't work 100% either).
Test code:
long time = System.currentTimeMillis();
String reg = "&(?!&#?[a-zA-Z0-9]{2,7};)";
String s = "a regex test 1 & 2 1&2 and &_gt; - &_lt;";
String test = null;
for (int i = 0; i < 100000; i++) {
    test = s.replaceAll(reg, "&amp;");
}
System.out.println("Finished in:" + (System.currentTimeMillis() - time) + " milliseconds");
So the question is whether there are any obvious ways to optimize this regex to make it more efficient.
Thanks.
@RandomCoder_01 Actually, no escapes are needed. & is not a special regex character, so there is no need to escape it.
In a regular expression, a backslash escapes the character that follows it, so a metacharacter preceded by a backslash is matched literally.
However, the backslash is also an escape character in Java string literals, so every backslash in a regular expression has to be doubled when the pattern is written as a string literal. The string literal "\\\\" produces the regular expression \\, which in turn matches a single \. The replacement string of replaceAll treats the backslash as an escape character as well, so a literal backslash in the replacement also needs four backslashes in the source:
s.replaceAll("\\-", "\\\\-");
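The two escaping levels can be checked with a small sketch. The class and method names below (BackslashDemo, countBackslashes, doubleBackslashes) are made up for illustration and are not part of the thread; only java.util.regex calls from the standard library are used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class BackslashDemo {
    // Java source "\\\\" is the two-character string \\ ;
    // as a regex, that means "one literal backslash".
    static final Pattern ONE_BACKSLASH = Pattern.compile("\\\\");

    // Counts the literal backslashes in the input.
    static int countBackslashes(String input) {
        Matcher m = ONE_BACKSLASH.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Replaces every backslash with two backslashes. The replacement
    // string has its own escaping, so two literal backslashes need
    // eight backslashes in the Java source.
    static String doubleBackslashes(String input) {
        return ONE_BACKSLASH.matcher(input).replaceAll("\\\\\\\\");
    }

    public static void main(String[] args) {
        String path = "C:\\temp\\file.txt"; // the text C:\temp\file.txt
        System.out.println(countBackslashes(path));
        System.out.println(doubleBackslashes(path));
    }
}
```

When the text to match or insert is fixed, Pattern.quote and Matcher.quoteReplacement (or the non-regex String.replace) avoid most of this manual escaping.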
s.replaceAll(reg, "&amp;")
compiles the regular expression on every call. Compiling the pattern once, outside the loop, gives some increase in performance (~30% in this case).
import java.util.regex.Pattern;

long time = System.currentTimeMillis();
String reg = "&(?!&#?[a-zA-Z0-9]{2,7};)";
// Compile once, outside the loop, and reuse the Pattern.
Pattern p = Pattern.compile(reg);
String s = "a regex test 1 & 2 1&2 and &_gt; - &_lt;";
for (int i = 0; i < 100000; i++) {
    String test = p.matcher(s).replaceAll("&amp;");
}
System.out.println("Finished in:" +
    (System.currentTimeMillis() - time) + " milliseconds");
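One way to make the compile-once advice hard to get wrong is to keep the compiled Pattern in a field, so callers cannot recompile it by accident. The following is only a sketch: the class name PrecompiledReplacer and its apply method are hypothetical, and the regex is passed in rather than fixed, so it works with whichever pattern you settle on.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper, for illustration only.
final class PrecompiledReplacer {
    private final Pattern pattern;     // compiled once in the constructor
    private final String replacement;

    PrecompiledReplacer(String regex, String replacement) {
        this.pattern = Pattern.compile(regex);
        this.replacement = replacement;
    }

    // Reuses the compiled Pattern; only a new Matcher is created per call.
    String apply(String input) {
        return pattern.matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Same regex and test string as in the question.
        PrecompiledReplacer replacer =
            new PrecompiledReplacer("&(?!&#?[a-zA-Z0-9]{2,7};)", "&amp;");
        String s = "a regex test 1 & 2 1&2 and &_gt; - &_lt;";
        for (int i = 0; i < 100000; i++) {
            replacer.apply(s);
        }
        System.out.println(replacer.apply(s));
    }
}
```

A Pattern instance is immutable and safe to share between threads, so a single replacer can be stored in a static final field; Matcher instances are not thread-safe, which is why apply creates a fresh one on each call.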
You have to leave the second & out of your look-ahead assertion; as written, the look-ahead only recognizes an entity when another & follows the first one. So try this regular expression:
&(?!#?[a-zA-Z0-9]{2,7};)
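Or, to be more precise (decimal or hexadecimal character references, and named entities that start with a letter):
&(?!(?:#(?:[xX][0-9a-fA-F]+|[0-9]+)|[a-zA-Z][a-zA-Z0-9]*);)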
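To see how a look-ahead of this kind behaves on typical input, here is a small sketch. It assumes a variant of the more precise pattern in which hexadecimal references may have several digits and named entities may contain digits after the first letter (as in &frac12;); the class name EntityAwareEscaper and the method escape are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class EntityAwareEscaper {
    // Assumed variant of the precise pattern: an & that does NOT start
    // a decimal reference (&#169;), a hexadecimal reference (&#x1F600;)
    // or a named entity (&gt;, &frac12;).
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
        "&(?!(?:#(?:[xX][0-9a-fA-F]+|[0-9]+)|[a-zA-Z][a-zA-Z0-9]*);)");

    static String escape(String input) {
        return BARE_AMPERSAND.matcher(input).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String[] samples = {
            "This & that &gt; &amp; &lt;",
            "AT&T &#169; &#x1F600; &frac12;",
            "a && b"
        };
        for (String sample : samples) {
            System.out.println(sample + "  ->  " + escape(sample));
        }
    }
}
```

The pattern only checks the shape of an entity, not whether the name is actually defined in HTML, so something like &foo; is left untouched as well.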