XMLStreamWriter outputs invalid characters (does not encode formfeed)

Tags: java, xml

I'm using XMLOutputFactory with the default Java implementation. When the text being written contains a form feed, it produces an invalid XML file. Apparently the form feed character must be escaped, but the XML writer does not escape it. (There may be other characters that should be escaped but are not.)

Is this a bug? Is there a workaround, or is there a parameter I can provide to the XML writer to change the behavior?

The text I am writing may contain form feeds. I want to write it into the XML and be able to read it back later.

Here's my sample code. The \f is the form feed; both occurrences are written as the raw ASCII 12 (form feed) character, without being escaped. When I feed the output to the XML parser, it fails on the form feed with the error "An invalid XML character (Unicode: 0xc) was found".

import java.io.FileWriter;
import java.io.IOException;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class FormFeedDemo {
    public static void main(String[] args) {
        XMLOutputFactory factory = XMLOutputFactory.newInstance();

        // try-with-resources closes the file; XMLStreamWriter.close() does not close the underlying Writer
        try (FileWriter out = new FileWriter("d:/xyz/ImportXml/out1.xml")) {
            XMLStreamWriter writer = factory.createXMLStreamWriter(out);

            writer.writeStartDocument();
            writer.writeCharacters("\n");
            writer.writeStartElement("document");
            writer.writeCharacters("\n");
            // &, < and > are escaped here, but the form feed (\f) is written as-is
            writer.writeCharacters("some text character value \"of the\" field & more text \f in <brackets> here.");
            writer.writeCharacters("\n");
            writer.writeStartElement("data");
            // the same happens in attribute values: " is escaped, \f is not
            writer.writeAttribute("name", "value \"of the\" field & more text \f in <brackets> here.");
            writer.writeEndElement();
            writer.writeCharacters("\n");
            writer.writeEndElement();
            writer.writeCharacters("\n");
            writer.writeEndDocument();

            writer.flush();
            writer.close();
        } catch (XMLStreamException | IOException e) {
            e.printStackTrace();
        }
    }
}
Mary asked Mar 31 '26 22:03


1 Answer

This is not a bug; it is by design. You can validate the characters yourself before writing them, or provide your own implementation of the XMLStreamWriter interface.
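A minimal sketch of the "validate the characters yourself" option, assuming it is acceptable to replace the offending characters rather than preserve them. The class and method names (XmlTextSanitizer, isValidXml10, sanitize) are made up for illustration and are not part of any library; the ranges are those of the Char production in the XML 1.0 specification.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: filters text before it is handed to XMLStreamWriter.
class XmlTextSanitizer {

    // True if the code point matches the XML 1.0 Char production.
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces every code point that XML 1.0 forbids with the given replacement.
    static String sanitize(String text, String replacement) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (isValidXml10(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(replacement);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        StringWriter buffer = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
        writer.writeStartDocument();
        writer.writeStartElement("document");
        // The form feed is replaced by a space before the writer sees it.
        writer.writeCharacters(sanitize("more text \f in <brackets> here.", " "));
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.close();

        // Read the result back; this throws XMLStreamException if it is not well formed.
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(buffer.toString()));
        while (reader.hasNext()) {
            reader.next();
        }
        System.out.println(buffer);
    }
}
```

Note that this discards the form feed. If it has to survive a round trip, you would need some application-level encoding of your own on top of this.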

The Oracle documentation for XMLStreamWriter (http://docs.oracle.com/javase/7/docs/api/javax/xml/stream/XMLStreamWriter.html) says:

The XMLStreamWriter does not perform well formedness checking on its input. However the writeCharacters method is required to escape & , < and > For attribute values the writeAttribute method will escape the above characters plus " to ensure that all character content and attribute values are well formed.

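According to http://www.w3.org/TR/xml11/#charsets, the restricted characters in XML 1.1 are [#x1-#x8], [#xB-#xC], [#xE-#x1F], [#x7F-#x84], [#x86-#x9F]. In XML 1.1 they may appear only as character references such as &#xC;. In XML 1.0, control characters below #x20 other than tab, line feed and carriage return are not allowed at all, not even as character references.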
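The ranges above translate directly into a small check. This is only an illustrative sketch; the class and method names (Xml11RestrictedChars, isRestricted) are hypothetical.

```java
// Hypothetical helper mirroring the RestrictedChar ranges of XML 1.1.
class Xml11RestrictedChars {

    // True if the code point falls in one of the RestrictedChar ranges.
    static boolean isRestricted(int cp) {
        return (cp >= 0x1 && cp <= 0x8)
                || (cp >= 0xB && cp <= 0xC)
                || (cp >= 0xE && cp <= 0x1F)
                || (cp >= 0x7F && cp <= 0x84)
                || (cp >= 0x86 && cp <= 0x9F);
    }

    public static void main(String[] args) {
        System.out.println(isRestricted('\f')); // true: form feed is #xC
        System.out.println(isRestricted('\n')); // false: line feed is #xA
    }
}
```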

"\f" is the character with code #x0C, which falls in the [#xB-#xC] range.

Vitalii Pro answered Apr 02 '26 13:04