I need to unescape an XML string containing escaped XML tags:
&lt; &gt; &amp; etc...
I did find some libraries that can perform this task, but I'd rather use a single method that can do it.
Can someone help?
cheers, Bas Hendriks
In Java, we could always write our own functions to replace XML special characters with their escaped equivalents (and back again), but we can also use the StringEscapeUtils class provided by Apache Commons. It gives us a common API that does the XML escaping and unescaping for us.
Document convertStringToDocument(String xmlStr): This method takes a String as input, converts it to a DOM Document, and returns it. It uses InputSource and StringReader for the conversion.
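The post above only gives the signature, so here is a rough sketch of what such a method could look like using the JDK's built-in parser. The class name XmlStringToDocument and the choice to wrap checked exceptions in an IllegalArgumentException are my own assumptions, and the factory is left at its defaults (no hardening against external entities). Note that this only helps when the string is a complete, well-formed document; the parser then resolves the escaped characters in text nodes for you.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; the name is illustrative only.
public class XmlStringToDocument {

    // Parses a complete XML document held in a String into a DOM Document.
    public static Document convertStringToDocument(String xmlStr) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // InputSource + StringReader let the parser read from memory
            return builder.parse(new InputSource(new StringReader(xmlStr)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Assumption: callers prefer an unchecked exception for bad input
            throw new IllegalArgumentException("Could not parse XML string", e);
        }
    }
}
```

Calling getDocumentElement().getTextContent() on the result returns the text with the predefined entities already unescaped.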
StringEscapeUtils.unescapeXml(xml)
(from Apache commons-lang)
Here's a simple method to unescape XML. It handles the predefined XML entities and decimal numerical entities (&#nnnn;). Modifying it to handle hex entities (&#xhhhh;) should be simple.
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String unescapeXML( final String xml )
{
    Pattern xmlEntityRegex = Pattern.compile( "&(#?)([^;]+);" );
    // Before Java 9, Matcher only accepts a StringBuffer, not a StringBuilder
    StringBuffer unescapedOutput = new StringBuffer( xml.length() );

    Matcher m = xmlEntityRegex.matcher( xml );
    Map<String,String> builtinEntities = null;
    String entity;
    String hashmark;
    String ent;
    int code;
    while ( m.find() ) {
        ent = m.group(2);
        hashmark = m.group(1);
        if ( (hashmark != null) && (hashmark.length() > 0) ) {
            try {
                code = Integer.parseInt( ent );
                // toChars also copes with code points above U+FFFF
                entity = new String( Character.toChars( code ) );
            } catch ( IllegalArgumentException e ) {
                // not a valid decimal reference (e.g. hex) - leave it as is
                entity = m.group();
            }
        } else {
            // must be a non-numerical entity
            if ( builtinEntities == null ) {
                builtinEntities = buildBuiltinXMLEntityMap();
            }
            entity = builtinEntities.get( ent );
            if ( entity == null ) {
                // not a known entity - ignore it
                entity = "&" + ent + ';';
            }
        }
        // quoteReplacement stops '$' and '\' from being read as group references
        m.appendReplacement( unescapedOutput, Matcher.quoteReplacement( entity ) );
    }
    m.appendTail( unescapedOutput );

    return unescapedOutput.toString();
}

private static Map<String,String> buildBuiltinXMLEntityMap()
{
    Map<String,String> entities = new HashMap<String,String>(10);
    entities.put( "lt", "<" );
    entities.put( "gt", ">" );
    entities.put( "amp", "&" );
    entities.put( "apos", "'" );
    entities.put( "quot", "\"" );
    return entities;
}
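For the hex entities mentioned above, one possible extension is sketched below. It is a separate, standalone helper rather than a change to the answer's own method, and the class name XmlNumericEntities and method name unescapeNumeric are made up for illustration. It only touches numeric references (&#nnnn; and &#xhhhh;) and leaves anything it cannot decode unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative, not from any library.
public class XmlNumericEntities {

    // Decimal digits, or a lowercase 'x' followed by hex digits
    private static final Pattern NUMERIC =
            Pattern.compile("&#([0-9]+|x[0-9a-fA-F]+);");

    public static String unescapeNumeric(String xml) {
        Matcher m = NUMERIC.matcher(xml);
        StringBuffer out = new StringBuffer(xml.length());
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            try {
                int code = body.charAt(0) == 'x'
                        ? Integer.parseInt(body.substring(1), 16)
                        : Integer.parseInt(body, 10);
                // toChars yields a surrogate pair for code points above U+FFFF
                replacement = new String(Character.toChars(code));
            } catch (IllegalArgumentException e) {
                // Overflow or invalid code point: keep the original text
                replacement = m.group();
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The same radix check could be folded into the numeric branch of a combined method; it is kept separate here so the sketch compiles on its own.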