I'm using Java 6. I want to parse XHTML that I know is well-formed. As such, I don't want to do any validation against DTD's or other schemas referenced in the doc. However, I'm having trouble figuring out how to turn that validation off. I have
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
final DocumentBuilder b = factory.newDocumentBuilder();
final InputSource s = new InputSource(new StringReader(str));
org.w3c.dom.Document result = b.parse(s);
But I still get an exception on the last line ...
java.net.SocketException: Unexpected end of file from server
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:777)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:774)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:677)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1315)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1282)
at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:283)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1194)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1090)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1003)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:235)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
at com.myco.myproj.util.XmlUtilities.getStringAsDocument(XmlUtilities.java:130)
at com.myco.myproj.util.NetUtilities.getUrlAsDocument(NetUtilities.java:30)
at com.myco.myproj.parsers.impl.AbstractChicagoReaderParser.parsePage(AbstractChicagoReaderParser.java:144)
at com.myco.myproj.parsers.impl.AbstractChicagoReaderParser.getEvents(AbstractChicagoReaderParser.java:112)
at com.myco.myproj.parsers.impl.ChicagoReaderParserTest.testParser(ChicagoReaderParserTest.java:29)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
I don't want my parser going to the Internet. How do I disable that? Thanks, - Dave
Edit: Per Traroth's suggestion, I tried the below code, but get the same exception
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
final DocumentBuilder builder = factory.newDocumentBuilder();
builder.setEntityResolver(new EntityResolver() {
@Override
public InputSource resolveEntity(String publicId, String systemId) {
return null;
}
});
final InputSource s = new InputSource(new StringReader(str));
org.w3c.dom.Document result = builder.parse(s);
Validating an XML document determines whether the structure and content of the document conform to a set of rules. In Enterprise COBOL, the rules are expressed in an XML schema , which is essentially a blueprint for a class of documents.
"A standard XML parser will NEVER accept invalid XML", nor will it accept supposed XML that isn't well formed.
A nonfatal error occurs when an XML document fails a validity constraint. If the parser finds that the document is not valid, then an error event is generated.
To read and update, create and manipulate an XML document, you will need an XML parser. In PHP there are two major types of XML parsers: Tree-Based Parsers. Event-Based Parsers.
Here is how you create a DocumentBuilder that will ignore ALL external referenced entities, including DTDs:
final DocumentBuilder builder = factory.newDocumentBuilder();
builder.setEntityResolver(new EntityResolver() {
@Override
public InputSource resolveEntity(String publicId, String systemId) {
// it might be a good idea to insert a trace logging here that you are ignoring publicId/systemId
return new InputSource(new StringReader("")); // Returns a valid dummy source
}
});
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With