JAXB mixed content list contains newline characters

Tags:

java

xml

jaxb

I was hoping that you might be able to help me with a problem that I'm facing regarding JAXB.

I have the following XML file:

<root>
    <prop>
        <field1>
            <value1>v1</value1>
            <value2>v2</value2>
        </field1>
        <field2>
            <value1>v1</value1>
            <value2>v2</value2>
        </field2>
    </prop>
    <prop>
        text
        <field1>
            <value1>v1</value1>
            <value2>v2</value2>
        </field1>
    </prop>
    <prop>
        text
    </prop>
</root>

Under prop, the XML can contain child elements (field1, field2), text, or both.

And the following classes:

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name = "root")
public class Root {

    protected List<Root.Element> prop;

    @XmlAccessorType(XmlAccessType.FIELD)
    public static class Element {
        @XmlMixed
        protected List<String> content;
        @XmlElement
        public Field1 field1;
        @XmlElement
        public Field2 field2;

        @XmlAccessorType(XmlAccessType.FIELD)
        public static class Field1 {
            @XmlElement
            protected String value1;
            @XmlElement
            protected String value2;
        }

        @XmlAccessorType(XmlAccessType.FIELD)
        public static class Field2 {
            @XmlElement
            protected String value1;
            @XmlElement
            protected String value2;

        }

    }

}

I want to unmarshal the XML into the classes above. The issue is that, besides the text, the content list also contains whitespace such as newlines and tabs. More specifically, unmarshalling the XML above gives me:

  • first prop with content like ["\n\t\t", "\n\t\t", "\n\t"]; it should be an empty list
  • second prop with content like ["\n\t\ttext\n\t\t", "\n\t"]; it should be a list with one string
  • third prop with content like ["\n\t\ttext\n\t"]; it should also be a list with one string

I have already tried creating an XmlAdapter, but it is applied to each item of the list individually. So even if I remove the \n and \t characters and return null for an empty string, I still get a list containing some strings and some null values.
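Because an XmlAdapter only ever sees one string at a time, it cannot shrink the list. One alternative (not what the answer below proposes) is to clean the whole list after unmarshalling. The sketch below is a plain helper; the class and method names are hypothetical. JAXB does support a class-defined callback `void afterUnmarshal(Unmarshaller u, Object parent)`, so an assumed wiring would be to add that method to `Root.Element` and assign `content = MixedContentCleaner.clean(content);` inside it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: post-processes an @XmlMixed list after unmarshalling.
public class MixedContentCleaner {

    /**
     * Returns a new list holding the trimmed, non-blank strings of {@code content}.
     * Whitespace-only entries such as "\n\t\t" are dropped instead of becoming nulls.
     */
    public static List<String> clean(List<String> content) {
        List<String> result = new ArrayList<>();
        if (content == null) {
            return result;
        }
        for (String s : content) {
            String trimmed = (s == null) ? "" : s.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The three lists reported in the question.
        System.out.println(clean(Arrays.asList("\n\t\t", "\n\t\t", "\n\t")));     // []
        System.out.println(clean(Arrays.asList("\n\t\ttext\n\t\t", "\n\t")));     // [text]
        System.out.println(clean(Arrays.asList("\n\t\ttext\n\t")));               // [text]
    }
}
```

Unlike the adapter, this works on the list as a whole, so blank entries are removed rather than replaced by null.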

asked Mar 09 '14 15:03 by Damian


1 Answer

Why It's Happening

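White space content in an element that has mixed content is treated as significant. The indentation between the child elements is therefore reported as text, and it ends up in the @XmlMixed list.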
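To see this at the parser level, here is a small sketch that uses only the JDK's StAX API (javax.xml.stream, no JAXB). The class and method names are illustrative. IS_COALESCING is enabled only so that each run of text arrives as a single event. The sketch counts the whitespace-only character events in the first prop from the question: the indentation is reported just like real text.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class WhitespaceEvents {

    /** Counts character events whose text consists only of white space. */
    public static int countWhitespaceOnlyEvents(String xml) throws XMLStreamException {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        // Merge adjacent character data so one run of text is one event.
        xif.setProperty(XMLInputFactory.IS_COALESCING, true);
        XMLStreamReader reader = xif.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                int type = reader.next();
                // Without a DTD the parser cannot tell that indentation is ignorable,
                // so it is reported as character data (CHARACTERS, or SPACE on some parsers).
                if ((type == XMLStreamConstants.CHARACTERS || type == XMLStreamConstants.SPACE)
                        && reader.getText().trim().isEmpty()) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        // First <prop> from the question, with its indentation kept.
        String xml = "<prop>\n\t\t<field1>\n\t\t\t<value1>v1</value1>\n\t\t</field1>\n\t</prop>";
        System.out.println(countWhitespaceOnlyEvents(xml)); // prints 4
    }
}
```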

How to Fix It

You could use JAXB with StAX to support this use case. With StAX you can create a filtered XMLStreamReader that does not report character events consisting only of white space, and then unmarshal from that reader. Below is an example of how you could implement it.

import javax.xml.bind.*;
import javax.xml.stream.*;
import javax.xml.transform.stream.StreamSource;

public class Demo {

    public static void main(String[] args) throws Exception {
        JAXBContext jc = JAXBContext.newInstance(Root.class);

        XMLInputFactory xif = XMLInputFactory.newFactory();
        XMLStreamReader xsr = xif.createXMLStreamReader(new StreamSource("src/forum22284324/input.xml"));

        // Wrap the reader so that character events made up only of
        // white space (the indentation between elements) are skipped.
        xsr = xif.createFilteredReader(xsr, new StreamFilter() {

            @Override
            public boolean accept(XMLStreamReader reader) {
                if(reader.getEventType() == XMLStreamReader.CHARACTERS) {
                    // Keep the text only if something remains after trimming.
                    return reader.getText().trim().length() > 0;
                }
                // Accept every other event (start/end elements, etc.).
                return true;
            }

        });

        // Unmarshal from the filtered reader instead of the raw file.
        Unmarshaller unmarshaller = jc.createUnmarshaller();
        Root root = (Root) unmarshaller.unmarshal(xsr);
        xsr.close();
    }

}
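On JDK 11 and later, JAXB is no longer bundled, but the StAX filter itself can be checked on its own. The sketch below (class and method names are illustrative, and treating SPACE like CHARACTERS is an added precaution) runs the same kind of filter over part of the question's XML and collects the text that still gets through.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.StreamFilter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class WhitespaceFilterCheck {

    /** Drops character events that contain only white space; accepts everything else. */
    static final StreamFilter SKIP_WHITESPACE_TEXT = reader -> {
        int type = reader.getEventType();
        if (type == XMLStreamConstants.CHARACTERS || type == XMLStreamConstants.SPACE) {
            return !reader.getText().trim().isEmpty();
        }
        return true;
    };

    /** Returns the text of every character event the filtered reader still reports. */
    public static List<String> remainingText(String xml) throws XMLStreamException {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        xif.setProperty(XMLInputFactory.IS_COALESCING, true);
        XMLStreamReader xsr = xif.createFilteredReader(
                xif.createXMLStreamReader(new StringReader(xml)), SKIP_WHITESPACE_TEXT);
        List<String> texts = new ArrayList<>();
        try {
            while (xsr.hasNext()) {
                int type = xsr.next();
                if (type == XMLStreamConstants.CHARACTERS || type == XMLStreamConstants.SPACE) {
                    texts.add(xsr.getText());
                }
            }
        } finally {
            xsr.close();
        }
        return texts;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root>\n\t<prop>\n\t\ttext\n\t\t<field1>\n\t\t\t<value1>v1</value1>\n\t\t</field1>\n\t</prop>\n</root>";
        for (String t : remainingText(xml)) {
            System.out.println("[" + t + "]");
        }
    }
}
```

Only the whitespace-only events disappear; the text event keeps its surrounding indentation ("\n\t\ttext\n\t\t"). If the list should hold exactly "text", trimming the remaining strings, for example in a getter, is presumably still needed.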
answered Sep 20 '22 13:09 by bdoughan