Currently I'm working on converting HTML codes with equivalent characters in java. I need to convert the below code to characters.
è - è
® - ®
& - &
ñ - ñ
& - &
I tried using the regex pattern
(&#x)([\\d|\\w]*)([\\d|\\w]*)([\\d|\\w]*)([\\d|\\w]*)(;)
When I debug, matcher.find()
gives me true
but the control skips the loop where I have written the code for conversion. Don't know what is happening there.
Also, is there any way to optimize this regex?
Any help is appreciated.
Exception
java.lang.NumberFormatException: For input string: "x26"
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at org.apache.commons.lang.Entities.unescape(Entities.java:683)
at org.apache.commons.lang.StringEscapeUtils.unescapeHtml(StringEscapeUtils.java:483)
unescapeHtml4() for this: Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes.
In Java, we can use Apache commons-text , StringEscapeUtils. escapeHtml4(str) to escape HTML characters. In the old days, we usually use the Apache commons-lang3 , StringEscapeUtils class to escape HTML, but this class is deprecated as of 3.6.
Also, is there any way to optimize this regex?
Yes, don't use regex for this task, use Apache StringEscapeUtils from Apache commons lang:
import org.apache.commons.lang.StringEscapeUtils;
...
String withCharacters = StringEscapeUtils.unescapeHtml(yourString);
JavaDoc says:
Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes. Supports HTML 4.0 entities.
For example, the string
"<Français>"
will become"<Français>"
If an entity is unrecognized, it is left alone, and inserted verbatim into the result string. e.g.
">&zzzz;x"
will become">&zzzz;x"
.
One of all the other possibilities or existing util methods could be spring-web's org.springframework.web.util.HtmlUtils.htmlUnescape
.
Example usage in a self-contained Groovy script:
@Grapes(
@Grab(group='org.springframework', module='spring-web', version='4.3.0.RELEASE')
)
import org.springframework.web.util.HtmlUtils
println HtmlUtils.htmlUnescape("La élite del tenis no teme al zika y jugará en Río")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With