<p>The <code>escapeXml</code> function converts <code>ѭ Ѯ</code> to <code>&amp;#1133; &amp;#1134;</code>, which I don't think it should. From what I have read, it supports only the five basic XML entities (<code>gt</code>, <code>lt</code>, <code>quot</code>, <code>amp</code>, <code>apos</code>).</p> <p>Is there a function that converts only these five basic XML entities?</p>


StringEscapeUtils.escapeXml is converting utf8 characters which it should not

4 Answers

public String escapeXml(String s) {
    // Escape '&' first so that the ampersands added by the later replacements are not escaped again.
    return s.replace("&", "&amp;").replace(">", "&gt;").replace("<", "&lt;").replace("\"", "&quot;").replace("'", "&apos;");
}

162

answered Oct 03 '22 02:10

Bombe

The javadoc for the 3.1 version of the library says:

Note that Unicode characters greater than 0x7f are as of 3.0, no longer escaped. If you still wish this functionality, you can achieve it via the following: StringEscapeUtils.ESCAPE_XML.with( NumericEntityEscaper.between(0x7f, Integer.MAX_VALUE) );

So you are probably using an older version of the library. Update your dependencies (or reimplement the escaping yourself: it's not rocket science).
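For the "reimplement it yourself" route, a minimal single-pass sketch could look like the following. The class name BasicXmlEscaper is made up for illustration and is not part of Commons Lang; the code assumes that only the five basic entities should be escaped and that every other character, including non-ASCII ones, should pass through unchanged.

```java
// Hypothetical helper for illustration only; not part of Commons Lang.
final class BasicXmlEscaper {
    private BasicXmlEscaper() {}

    /** Escapes only &, <, >, " and '; every other character is copied unchanged. */
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);        // includes all non-ASCII characters
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The two Cyrillic letters from the question are written as Unicode escapes here.
        System.out.println(escapeXml("<a href=\"x\">\u046D & \u046E</a>"));
    }
}
```

Because the input is read once and each character is handled exactly once, there is no risk of escaping an ampersand twice, whatever order the cases are listed in.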

answered Oct 03 '22 00:10

JB Nizet

The javadoc of StringEscapeUtils.escapeXml says that we have to use

StringEscapeUtils.ESCAPE_XML.with( new UnicodeEscaper(Range.between(0x7f, Integer.MAX_VALUE)) );

But NumericEntityEscaper has to be used instead of UnicodeEscaper. UnicodeEscaper turns every character in the range into a Java-style escape such as \u1234, whereas NumericEntityEscaper produces a numeric entity such as &#123;, which is what is expected here.

package mypackage;

import org.apache.commons.lang3.StringEscapeUtils;
import org.apache.commons.lang3.text.translate.CharSequenceTranslator;
import org.apache.commons.lang3.text.translate.NumericEntityEscaper;

public class XmlEscaper {
    public static void main(final String[] args) {
        final String xmlToEscape = "<hello>Hi</hello>" + "_ _" + "__ __"  + "___ ___" + "after &nbsp;"; // the line cont

        // no Unicode escape
        final String escapedXml = StringEscapeUtils.escapeXml(xmlToEscape);

        // escape Unicode as numeric codes. For instance, escape non-breaking space as &#160;
        final CharSequenceTranslator translator = StringEscapeUtils.ESCAPE_XML.with( NumericEntityEscaper.between(0x7f, Integer.MAX_VALUE) );
        final String escapedXmlWithUnicode = translator.translate(xmlToEscape);

        System.out.println("xmlToEscape: " + xmlToEscape);
        System.out.println("escapedXml: " + escapedXml); // does not escape Unicode characters like non-breaking space
        System.out.println("escapedXml with unicode: " + escapedXmlWithUnicode); // escapes Unicode characters
    }
}
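To make the difference between the two escape styles concrete, here is a hand-written sketch that does not use Commons Lang at all. The names EscapeStyles, javaStyle and numericEntityStyle are made up for illustration, and the code only imitates the two output shapes described above for characters from 0x7f upward; it is not how the library implements either translator.

```java
// Hypothetical illustration of the two output shapes; not the library's code.
final class EscapeStyles {
    private EscapeStyles() {}

    /** Java-style escape: a backslash, the letter u and four hex digits per UTF-16 unit. */
    static String javaStyle(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x7f) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** XML-style escape: one decimal numeric character reference per code point. */
    static String numericEntityStyle(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp >= 0x7f) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "\u046D \u046E"; // the two Cyrillic letters from the question
        System.out.println(javaStyle(input));
        System.out.println(numericEntityStyle(input));
    }
}
```

Only the second form is meaningful inside an XML document; the first is a Java source-code convention that an XML parser would treat as literal text.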

answered Oct 03 '22 01:10

Dmitriy Popov

Now that UTF-8 is the norm, it is often preferable to keep characters readable in an XML document instead of turning them into numeric entities. This should work; it escapes only the five special characters, and the String is rebuilt only once.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static final Pattern ESCAPE_XML_CHARS = Pattern.compile("[\"&'<>]");

public static String escapeXml(String s) {
    Matcher m = ESCAPE_XML_CHARS.matcher(s);
    StringBuffer buf = new StringBuffer();
    while (m.find()) {
        switch (m.group().codePointAt(0)) {
            case '"':
                m.appendReplacement(buf, "&quot;");
                break;
            case '&':
                m.appendReplacement(buf, "&amp;");
                break;
            case '\'':
                m.appendReplacement(buf, "&apos;");
                break;
            case '<':
                m.appendReplacement(buf, "&lt;");
                break;
            case '>':
                m.appendReplacement(buf, "&gt;");
                break;
        }
    }
    m.appendTail(buf);
    return buf.toString();
}
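On Java 9 or later, the same idea can probably be written more compactly with Matcher.replaceAll(Function), which runs the find/append loop internally. This is only a sketch: the class name RegexXmlEscaper and the helper entityFor are made up for illustration, and the pattern is the same one as above.

```java
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

// Hypothetical Java 9+ variant of the method above, for illustration only.
final class RegexXmlEscaper {
    private static final Pattern ESCAPE_XML_CHARS = Pattern.compile("[\"&'<>]");

    private RegexXmlEscaper() {}

    static String escapeXml(String s) {
        // replaceAll(Function) calls entityFor once per match and builds the result in one pass.
        return ESCAPE_XML_CHARS.matcher(s).replaceAll(RegexXmlEscaper::entityFor);
    }

    private static String entityFor(MatchResult match) {
        switch (match.group().charAt(0)) {
            case '"':  return "&quot;";
            case '&':  return "&amp;";
            case '\'': return "&apos;";
            case '<':  return "&lt;";
            case '>':  return "&gt;";
            default:   throw new IllegalStateException("unexpected match: " + match.group());
        }
    }
}
```

The returned entity strings contain neither a dollar sign nor a backslash, so they are safe to use as regex replacement strings without quoting.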

answered Oct 03 '22 00:10

Matthias Ronge


Tags:

java

xml

stringescapeutils

Mady
