
XMLStreamWriter - Java 8 - writeCharacters

Tags:

java

stax

The behavior of this method seems to have changed in Java 8. I need a quick fix for my problem.

The problem is that I have some code which writes CR and LF after each XML node named <row>. Now (as we migrated to Java 8), instead of CR and LF the characters &#xd; are written out.

Again, I need a quick fix; I cannot change the StAX implementation or do anything big like that.

    while (reader.hasNext()) {
        event = reader.next();
        if (event == XMLStreamConstants.START_ELEMENT) {
            if (reader.getLocalName().equals("row")) {
                // Intended to put CR LF in front of every <row> element.
                writer.writeCharacters("\r\n"); // this is my problem now!
                writer.writeStartElement(reader.getLocalName());

                // Copy the attributes of the source <row> element.
                n = reader.getAttributeCount();
                for (int i = 0; i < n; i++) {
                    name = reader.getAttributeName(i).getLocalPart();
                    value = reader.getAttributeValue(i);
                    ...
                }
            }
        }
    }
peter.petrov asked Apr 16 '15 13:04




1 Answer

You need either to get access to the underlying writer, that is, the writer you decorated with the XMLStreamWriter (hopefully the one you passed into createXMLStreamWriter()), or to temporarily disable escaping, which is implementation dependent.

The reason you're getting the weird characters is that the XMLStreamWriter has no idea where you are writing these characters, so it defaults to XML attribute escaping, which is stricter than element (content) escaping. The escaping is also generally based on the CharacterEncoder. My guess is that older versions of Java either defaulted to XML element escaping, which does not escape whitespace such as newlines, or used a different character encoding. I can see why they fixed this, as attribute escaping is clearly the correct way to do it. I also have no idea which XMLStreamWriter or CharacterEncoder you're actually using. What more likely happened is that the XMLStreamWriter implementation or character encoding picked by default changed (you should check in the debugger which one is getting picked).
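If attaching a debugger is inconvenient, the concrete classes can also be printed at runtime. This is a minimal sketch using only the standard javax.xml.stream API; the class name StaxImplProbe and the method describe() are made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: reports which StAX output implementation is in use.
public class StaxImplProbe {

    /** Returns the concrete factory and writer class names chosen at runtime. */
    static String describe() throws XMLStreamException {
        XMLOutputFactory factory = XMLOutputFactory.newInstance();
        XMLStreamWriter writer = factory.createXMLStreamWriter(new StringWriter());
        try {
            return "factory=" + factory.getClass().getName()
                    + ", writer=" + writer.getClass().getName();
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(describe());
    }
}
```

Run it with the same classpath as the real application. Class names under com.sun.xml.internal.stream point at the implementation bundled with the JDK, while names under com.ctc.wstx point at Woodstox.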

Regardless, if you get access to the underlying writer you can just write the characters directly and they will not be escaped. However, make sure the writer you use is the one that is decorated and not one deeper (i.e. if you have a BufferedWriter decorating a FileWriter, use the BufferedWriter).
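A minimal sketch of the underlying-writer approach, assuming the XMLStreamWriter was created directly on a java.io.Writer that you still hold a reference to. The names RawNewlineSketch, writeRawNewline and render are illustrative only. The empty writeCharacters("") call is there on the assumption that the implementation keeps a start tag open until the next write; with the JDK's bundled writer it closes the pending tag, but other implementations may behave differently.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch: bypass StAX escaping by writing to the decorated Writer.
public class RawNewlineSketch {

    /** Writes CR LF straight to the decorated Writer, so StAX never sees it. */
    static void writeRawNewline(XMLStreamWriter xml, Writer underlying)
            throws XMLStreamException, IOException {
        xml.writeCharacters("");   // assumption: closes a still-open start tag
        xml.flush();               // push out anything StAX has buffered first
        underlying.write("\r\n");  // raw characters, no escaping applied
    }

    /** Builds a small document with a raw CR LF in front of each row. */
    static String render() throws XMLStreamException, IOException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartElement("rows");
        for (int id = 1; id <= 2; id++) {
            writeRawNewline(xml, out);
            xml.writeStartElement("row");
            xml.writeAttribute("id", Integer.toString(id));
            xml.writeEndElement();
        }
        xml.writeEndElement();
        xml.flush();
        xml.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Make CR and LF visible when printing the result.
        System.out.println(render().replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

Whatever is written this way is invisible to the XMLStreamWriter, so it is only safe for whitespace between elements; anything containing markup characters would have to be escaped by hand.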

For those who don't think writeCharacters does escaping, you can look at the code.

EDIT

Apparently, after looking at the code, you can just call writer.setEscapeCharacters(false) on the default Sun implementation (unfortunately you probably have to do some casting) before you call writeCharacters, which is probably better than getting the original writer. I did not know about this flag.
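A rough sketch of toggling that flag without a compile-time dependency on the internal class. It assumes the writer in use exposes a public setEscapeCharacters(boolean) method, as described above for the default Sun implementation, and looks it up reflectively instead of casting. On newer JDKs with strong encapsulation, or on another StAX implementation, the lookup or the call may fail, in which case the helper reports false and escaping stays as it was. The names EscapeToggleSketch, trySetEscaping, writeNewline and render are illustrative only.

```java
import java.io.StringWriter;
import java.lang.reflect.Method;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch: switch escaping off around one writeCharacters call.
public class EscapeToggleSketch {

    /** Tries to set the escaping flag; returns false if the writer refuses. */
    static boolean trySetEscaping(XMLStreamWriter writer, boolean escape) {
        try {
            // Assumption: the implementation has this public method.
            Method m = writer.getClass().getMethod("setEscapeCharacters", boolean.class);
            m.invoke(writer, escape);
            return true;
        } catch (ReflectiveOperationException | RuntimeException e) {
            return false; // method missing or not accessible on this runtime
        }
    }

    /** Writes CR LF with escaping off if possible, then restores escaping. */
    static void writeNewline(XMLStreamWriter writer) throws XMLStreamException {
        boolean toggled = trySetEscaping(writer, false);
        try {
            writer.writeCharacters("\r\n");
        } finally {
            if (toggled) {
                trySetEscaping(writer, true);
            }
        }
    }

    /** Builds a tiny document that uses writeNewline before a row element. */
    static String render() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartElement("rows");
        writeNewline(xml);
        xml.writeStartElement("row");
        xml.writeEndElement();
        xml.writeEndElement();
        xml.flush();
        xml.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(render().replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

Restoring the flag in a finally block matters: while escaping is off, characters such as < and & in later text would be written unescaped as well.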

EDIT 2

Another possible quick fix, if you're hopefully using the Sun StAX implementation, is to change your system-level character encoding, picking an encoding under which the CRLF does not get escaped, ideally whatever it was before the JDK upgrade. This assumes the problem is that your character encoding changed from Windows or ISO to UTF-8 on the Java upgrade, but I can't be sure since you didn't specify your operating system. If it didn't change on upgrade (i.e. hopefully you have always defaulted to UTF-8), then disregard this option.
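To check whether the encoding is a factor at all, one option is to print the platform default and compare what the writer produces under explicitly chosen encodings. This is a sketch only: EncodingProbe and writeWithEncoding are invented names, and it uses the standard createXMLStreamWriter(OutputStream, String) overload rather than changing any system-level setting.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical probe: compare writeCharacters("\r\n") under different encodings.
public class EncodingProbe {

    /** Writes <rows>CR LF</rows> using the given encoding and returns the text. */
    static String writeWithEncoding(String encoding)
            throws XMLStreamException, UnsupportedEncodingException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        XMLStreamWriter xml =
                XMLOutputFactory.newInstance().createXMLStreamWriter(bytes, encoding);
        xml.writeStartElement("rows");
        xml.writeCharacters("\r\n");
        xml.writeEndElement();
        xml.flush();
        xml.close();
        return bytes.toString(encoding);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        for (String enc : new String[] {"UTF-8", "ISO-8859-1"}) {
            // Show CR and LF as visible escapes so the outputs can be compared.
            String shown = writeWithEncoding(enc).replace("\r", "\\r").replace("\n", "\\n");
            System.out.println(enc + ": " + shown);
        }
    }
}
```

If both encodings give the same result, the encoding is probably not what changed. Passing the encoding explicitly also makes the output independent of the platform default.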

EDIT 3

After doing some testing I'm pretty positive your StAX implementation is not the default Java Sun implementation but probably Woodstox. I haven't tested Woodstox, but it appears the library cares quite a bit about whitespace for performance reasons and appears to have different rules depending on whether it's UTF-8 or ISO (again, character encoding).

Adam Gent answered Nov 08 '22 22:11
