
How do I turn off validation when parsing well-formed XML using DocumentBuilder.parse?

I'm using Java 6. I want to parse XHTML that I know is well-formed, so I don't want any validation against DTDs or other schemas referenced in the document. However, I'm having trouble figuring out how to turn that validation off. I have

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(false);
    final DocumentBuilder b = factory.newDocumentBuilder();
    final InputSource s = new InputSource(new StringReader(str));
    org.w3c.dom.Document result = b.parse(s);

But I still get an exception on the last line ...

java.net.SocketException: Unexpected end of file from server
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:777)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:774)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:677)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1315)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1282)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:283)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1194)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1090)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1003)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:235)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
    at com.myco.myproj.util.XmlUtilities.getStringAsDocument(XmlUtilities.java:130)
    at com.myco.myproj.util.NetUtilities.getUrlAsDocument(NetUtilities.java:30)
    at com.myco.myproj.parsers.impl.AbstractChicagoReaderParser.parsePage(AbstractChicagoReaderParser.java:144)
    at com.myco.myproj.parsers.impl.AbstractChicagoReaderParser.getEvents(AbstractChicagoReaderParser.java:112)
    at com.myco.myproj.parsers.impl.ChicagoReaderParserTest.testParser(ChicagoReaderParserTest.java:29)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

I don't want my parser going to the Internet. How do I disable that? Thanks, - Dave

Edit: Per Traroth's suggestion, I tried the code below, but I get the same exception:

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(false);
    final DocumentBuilder builder = factory.newDocumentBuilder();
    builder.setEntityResolver(new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            return null;
        }
    });
    final InputSource s = new InputSource(new StringReader(str));
    org.w3c.dom.Document result = builder.parse(s);

asked Mar 28 '12 at 15:03 by Dave



1 Answer

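Here is how you create a DocumentBuilder that ignores ALL externally referenced entities, including DTDs. The resolver must return a non-null InputSource; returning null tells the parser to fall back to its default behavior and open the system identifier itself, which is why the attempt in the question still goes to the network.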

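    // Needs: javax.xml.parsers.*, org.xml.sax.EntityResolver, org.xml.sax.InputSource, java.io.StringReader
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(false);
    final DocumentBuilder builder = factory.newDocumentBuilder();
    builder.setEntityResolver(new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // It might be a good idea to add trace logging here for the publicId/systemId being ignored.
            // Return a valid, empty dummy source so the parser never opens systemId.
            return new InputSource(new StringReader(""));
        }
    });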
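
For anyone who wants to try this in isolation, here is a minimal self-contained sketch of the same entity resolver technique. The class name `OfflineXmlParser`, the `parse` helper, and the sample XHTML string with its deliberately unreachable DTD host are all made up for illustration; they are not from the question's code base.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, named here only for the example.
class OfflineXmlParser {

    /** Parses a well-formed XML string without fetching any external entity. */
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(new EntityResolver() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Non-null, empty source: the parser reads nothing
                // instead of opening a connection to systemId.
                return new InputSource(new StringReader(""));
            }
        });
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up input; the DTD URL points at a host that cannot resolve.
        String xhtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://unreachable.invalid/xhtml1-strict.dtd\">"
            + "<html><body><p>hello</p></body></html>";
        Document doc = parse(xhtml);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

One thing to keep in mind: because the real DTD is never read, any entities it declares (for XHTML, names such as `&nbsp;`) are unknown to the parser, so it is worth testing with documents that actually use them.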

answered Oct 13 '22 at 00:10 by Morten
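
As a possible alternative to the entity resolver in the answer above: the stack trace in the question shows the JDK's bundled Xerces parser, and Xerces has a `load-external-dtd` feature that can be switched off on the factory. The sketch below assumes that parser is the one in use; a different JAXP implementation may reject the feature with a `ParserConfigurationException`. The class name `NoExternalDtdParser` and the sample input are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, named here only for the example.
class NoExternalDtdParser {

    // Xerces-specific feature; assumed to be supported by the parser in use.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses XML with validation off and external DTD loading disabled. */
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        // Ask a non-validating parser not to load the external DTD.
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up input; the DTD URL points at a host that cannot resolve.
        String xhtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://unreachable.invalid/xhtml1-strict.dtd\">"
            + "<html><body><p>hello</p></body></html>";
        Document doc = parse(xhtml);
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The resolver approach is portable across SAX-based parsers, while the feature flag is shorter but tied to Xerces, so which one fits better depends on whether the parser implementation is under your control.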