 

JAXB: unmarshal XML that contains unescaped &, < and > characters

To parse my XML with JAXB, I have already generated the required POJOs and can parse the XML successfully. But whenever the XML contains '&', '<' or '>' characters, parsing fails. By the rules these characters must be escaped (for example '&' as '&amp;'), but the third-party product (3PP) that generates the XML does not follow them. How can I parse XML that contains these unescaped characters?
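Because the '&' rule is purely mechanical, one possible pre-processing step (an alternative approach, not something from this thread) is to escape every '&' that does not already begin a valid reference before the text reaches JAXB. The AmpersandEscaper class and its escapeBareAmpersands helper are hypothetical. Named entities other than the five predefined ones (such as &nbsp;) are treated as bare and escaped too, and a stray '<' cannot be repaired this way, because a regex cannot tell it apart from the start of a tag.

```java
import java.util.regex.Pattern;

public class AmpersandEscaper {
    // Matches an '&' that does NOT start one of the five predefined XML entity
    // references (&amp; &lt; &gt; &quot; &apos;) or a numeric character reference.
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
            "&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)");

    /** Hypothetical helper: rewrites every bare '&' as "&amp;", leaving valid references alone. */
    public static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String raw = "<info>Tom & Jerry &amp; friends &#38; co</info>";
        System.out.println(escapeBareAmpersands(raw));
        // prints: <info>Tom &amp; Jerry &amp; friends &#38; co</info>
    }
}
```

The escaped string can then be passed to the normal JAXB Unmarshaller, as long as '&' is the only problem character in the input.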

Note: I found many answers for marshalling, but they don't work for unmarshalling.

Environment: Java 8

XML example:

    <Customer Info> This is & Customer Info <Customer Info>

Any help would be appreciated.

asked Apr 15 '19 06:04 by Souvik


People also ask

What is unmarshalling XML?

Unmarshalling converts XML data into a tree of Java objects. To unmarshal a root element that is globally declared, JAXB relies on the JAXBContext instance, which maintains a mapping of globally declared XML element and type definition names to JAXB-mapped classes. The unmarshal method checks whether the JAXBContext has a mapping from the root element's XML name and/or @xsi:type to a JAXB-mapped class.
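A minimal sketch of that lookup, assuming Java 8, where javax.xml.bind ships with the JDK (it was removed in Java 11 and must then be added as a separate dependency). The CustInfo class is a hypothetical stand-in for a generated POJO; its @XmlRootElement annotation is what registers the global element name in the context.

```java
import java.io.StringReader;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlValue;

public class RootElementDemo {
    // Hypothetical mapped class: @XmlRootElement registers the global
    // element name "CustInfo" in the JAXBContext's mapping.
    @XmlRootElement(name = "CustInfo")
    public static class CustInfo {
        @XmlValue
        public String text; // the element's text content
    }

    public static Object parse(String xml) throws JAXBException {
        JAXBContext context = JAXBContext.newInstance(CustInfo.class);
        // unmarshal() looks up the root element's name ("CustInfo") in the
        // context and instantiates the class mapped to it.
        return context.createUnmarshaller().unmarshal(new StringReader(xml));
    }

    public static void main(String[] args) throws JAXBException {
        CustInfo info = (CustInfo) parse("<CustInfo>This is Customer Info</CustInfo>");
        System.out.println(info.text); // This is Customer Info
    }
}
```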

How do you unmarshal an XML string to a Java object using JAXB?

To unmarshal an XML string into a JAXB object, create an Unmarshaller from the JAXBContext, then call its unmarshal() method with a Source or Reader that wraps the string, optionally passing the expected root class.
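A minimal sketch of that recipe using the unmarshal(Source, Class<T>) overload. It binds the root element to the class you pass (here a hypothetical Note class, deliberately without @XmlRootElement) and wraps the result in a JAXBElement, so the root element's name does not need to be registered in the context.

```java
import java.io.StringReader;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlValue;
import javax.xml.transform.stream.StreamSource;

public class DeclaredTypeDemo {
    // Hypothetical class WITHOUT @XmlRootElement: the context has no mapping
    // for the root element's name.
    public static class Note {
        @XmlValue
        public String text;
    }

    public static Note parse(String xml) throws JAXBException {
        JAXBContext context = JAXBContext.newInstance(Note.class);
        Unmarshaller unmarshaller = context.createUnmarshaller();
        // The declaredType overload binds the root element to Note whatever its
        // name is, and returns the value wrapped in a JAXBElement.
        JAXBElement<Note> element =
                unmarshaller.unmarshal(new StreamSource(new StringReader(xml)), Note.class);
        return element.getValue();
    }

    public static void main(String[] args) throws JAXBException {
        System.out.println(parse("<anything>hello</anything>").text); // hello
    }
}
```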

How do you unmarshal a list of objects?

public List<T> Unmarshal(List<Entry> entries, Class clazz) { List<T> out = new ArrayList<T>(); T instance; for (Entry e : entries) { try { JAXBContext context = JAXBContext. newInstance(clazz); Unmarshaller unmarsh = context.
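The snippet above is cut off, so this is a separate sketch rather than a completion of it. It assumes each entry is a plain XML string (the original Entry type is unknown) and uses a hypothetical Item class. One design point: JAXBContext is designed to be thread-safe and is relatively costly to create, so the sketch builds it once, outside the loop, instead of once per entry.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlValue;
import javax.xml.transform.stream.StreamSource;

public class ListUnmarshal {
    // Hypothetical payload type for the demo.
    public static class Item {
        @XmlValue
        public String name;
    }

    /** Unmarshals each XML string into an instance of clazz, preserving order. */
    public static <T> List<T> unmarshalAll(List<String> xmlDocs, Class<T> clazz)
            throws JAXBException {
        // Build the context and unmarshaller once and reuse them for every entry.
        JAXBContext context = JAXBContext.newInstance(clazz);
        Unmarshaller unmarshaller = context.createUnmarshaller();
        List<T> out = new ArrayList<>();
        for (String xml : xmlDocs) {
            out.add(unmarshaller.unmarshal(new StreamSource(new StringReader(xml)), clazz).getValue());
        }
        return out;
    }

    public static void main(String[] args) throws JAXBException {
        List<Item> items = unmarshalAll(Arrays.asList("<i>a</i>", "<i>b</i>"), Item.class);
        for (Item item : items) {
            System.out.println(item.name);
        }
    }
}
```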

How does JAXB read XML?

To read XML, first get the JAXBContext. It is the entry point to the JAXB API and provides methods for the unmarshal, marshal and validate operations. Then get an Unmarshaller instance from the JAXBContext. Its unmarshal() method unmarshals XML data from the specified source and returns the resulting content tree.


1 Answer

jsoup is designed to cope with fairly rough-and-ready HTML, so it applies more forgiving parsing rules than the standard XML API (e.g. the built-in version of Xerces that ships with the JRE).
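To see the stricter behavior being contrasted here, a small JDK-only check (the isWellFormed helper is hypothetical) runs inputs through the built-in DocumentBuilder. A bare '&' or '<' in text content is a fatal well-formedness error, while a bare '>' is actually legal XML (it only has to be escaped inside the sequence "]]>").

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StrictParserDemo {
    /** Returns true if the JDK's built-in XML parser accepts the input as well-formed. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a>Tom &amp; Jerry</a>")); // true
        System.out.println(isWellFormed("<a>Tom & Jerry</a>"));     // false
        System.out.println(isWellFormed("<a>1 < 2</a>"));           // false
        System.out.println(isWellFormed("<a>2 > 1</a>"));           // true
    }
}
```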

jsoup's W3CDom helper can convert the parsed document into a W3C DOM, which JAXB can unmarshal directly:

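    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Unmarshaller;
    import org.jsoup.Jsoup;
    import org.jsoup.helper.W3CDom;
    import org.jsoup.parser.Parser;

    // Parse the raw string (unescapedXml) leniently with jsoup, then convert to a W3C DOM
    org.jsoup.nodes.Document soupDoc = Jsoup.parse(unescapedXml, "",
            Parser.xmlParser());
    org.w3c.dom.Document w3cDoc = new W3CDom().fromJsoup(soupDoc);

    JAXBContext jaxbContext = JAXBContext.newInstance(CustInfo.class);
    Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
    CustInfo custInfo = (CustInfo) jaxbUnmarshaller.unmarshal(w3cDoc);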

(Annoyingly, jsoup and the W3C DOM both name their class Document, hence the fully qualified names.)

This seems to cope well with a stray '&', '<' or '>' in an XML attribute or body text, though there are bound to be combinations where so much escaping is missing that the intended structure cannot be recovered.

answered Sep 18 '22 15:09 by df778899