Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing xml with DOM, DOCTYPE gets erased

how come dom with java erases doctype when editing xml ?

got this xml file :

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE map[ <!ELEMENT map (station*) >
                <!ATTLIST station  id   ID    #REQUIRED> ]>
<favoris>
<station id="5">test1</station>
<station id="6">test1</station>
<station id="8">test1</station>
</favoris> 

my function is very basic :

public static void EditStationName(int id, InputStream is, String path, String name) throws ParserConfigurationException, SAXException, IOException, TransformerFactoryConfigurationError, TransformerException{
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    DocumentBuilder builder = factory.newDocumentBuilder();
    Document dom = builder.parse(is);

    Element e = dom. getElementById(String.valueOf(id));
    e.setTextContent(name);
    // Write the DOM document to the file
    Transformer xformer = TransformerFactory.newInstance().newTransformer();
    FileOutputStream fos = new FileOutputStream(path);
    Result result = new StreamResult(fos);  
    Source source = new DOMSource(dom);


        xformer.setOutputProperty(
                OutputKeys.STANDALONE,"yes"     
                );

    xformer.transform(source, result);
}

it's working but the doctype gets erased ! and I just got the whole document but without the doctype part, which is important for me because it allows me to retrieve by id ! how can we keep the doctype ? why does it erase it? I tried many solution with outputkeys for example or omImpl.createDocumentType but none of these worked...

thank you !

like image 421
KitAndKat Avatar asked Jul 09 '11 19:07

KitAndKat


People also ask

How DOM parses the XML file?

DOM parser parses the entire XML file and creates a DOM object in the memory. It models an XML file in a tree structure for easy traversal and manipulation. In DOM everything in an XML file is a node. The node represents a component of an XML file.

What is DOM explain parsing XML using DOM?

The XML Document Object Model (DOM) class is an in-memory representation of an XML document. The DOM allows you to programmatically read, manipulate, and modify an XML document. The XmlReader class also reads XML; however, it provides non-cached, forward-only, read-only access.

Can we create an XML Document using DOM parser?

In short the basic steps one has to take in order to create an XML File withe a DOM Parser are: Create a DocumentBuilder instance. Create a Document from the above DocumentBuilder . Create the elements you want using the Element class and its appendChild method.


1 Answers

Your input XML is not valid. That should be:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris [
    <!ELEMENT favoris (station)+>
    <!ELEMENT station (#PCDATA)>
    <!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">test1</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>

As @DevNull wrote to be fully valid you can't write <station id="5">test1</station> (however for Java it works anyway even with that issue).


DOCTYPE is erased in output XML document:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>

I didn't find solution to missing DTD yet, but as workaround you can set external DTD:

xformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, "favoris.dtd");

Result (example) document:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris SYSTEM "favoris.dtd">
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>

EDIT:

I don't think it's possible to save inline DTD using Transformer class (vide here). If you can't use external DTD reference, then you can DOM Level 3 LSSerializer class instead:

DOMImplementationLS domImplementationLS =
    (DOMImplementationLS) dom.getImplementation().getFeature("LS","3.0");
LSOutput lsOutput = domImplementationLS.createLSOutput();
FileOutputStream outputStream = new FileOutputStream("output.xml");
lsOutput.setByteStream((OutputStream) outputStream);
LSSerializer lsSerializer = domImplementationLS.createLSSerializer();
lsSerializer.write(dom, lsOutput);
outputStream.close();

Output with wanted DTD (I can't see any option to add standalone="yes" using LSSerializer...):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE favoris [<!ELEMENT favoris (station)+>
<!ELEMENT station (#PCDATA)>
<!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris> 

Another approach is to use Apache Xerces2-J XMLSerializer class:

import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
...

XMLSerializer serializer = new XMLSerializer();
serializer.setOutputCharStream(new java.io.FileWriter("output.xml"));
OutputFormat format = new OutputFormat();
format.setStandalone(true);
serializer.setOutputFormat(format);
serializer.serialize(dom);

Result:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE favoris [<!ELEMENT favoris (station)+>
<!ELEMENT station (#PCDATA)>
<!ATTLIST station id ID #REQUIRED>
]>
<favoris>
    <station id="i5">new value</station>
    <station id="i6">test1</station>
    <station id="i8">test1</station>
</favoris>
like image 121
Grzegorz Szpetkowski Avatar answered Oct 21 '22 16:10

Grzegorz Szpetkowski