Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

.NET: Prevent XmlDocument.LoadXml from retrieving DTD

Tags:

.net

xml

dtd

I have following code (C#), it takes too long and it throws exception:

new XmlDocument().
LoadXml("<?xml version='1.0' ?><!DOCTYPE note SYSTEM 'http://someserver/dtd'><note></note>");

I understand why it does that. My question is how do I make it stop? I don't care about DTD validation. I suppose I could just regex-replace it, but I am looking for more elegant solution.

Background:
The actual XML is received from a web site I do not own. When site is undergoing maintenance it returns XML with DOCTYPE that points to the DTD that's not available during maintenance. So my service gets unnecessary slow because it tries to get DTD for each XML I need to parse.

Here is exception stack:

Unhandled Exception: System.Net.WebException: The remote name could not be resolved: 'someserver'
at System.Net.HttpWebRequest.GetResponse()
at System.Xml.XmlDownloadManager.GetNonFileStream(Uri uri, ICredentials credentials)
at System.Xml.XmlDownloadManager.GetStream(Uri uri, ICredentials credentials)
at System.Xml.XmlUrlResolver.GetEntity(Uri absoluteUri, String role, Type ofObjectToReturn)
at System.Xml.XmlTextReaderImpl.OpenStream(Uri uri)
at System.Xml.XmlTextReaderImpl.DtdParserProxy_PushExternalSubset(String systemId, String publicId)
at System.Xml.XmlTextReaderImpl.DtdParserProxy.System.Xml.IDtdParserAdapter.PushExternalSubset(String systemId, String publicId)
at System.Xml.DtdParser.ParseExternalSubset()
at System.Xml.DtdParser.ParseInDocumentDtd(Boolean saveInternalSubset)
at System.Xml.DtdParser.Parse(Boolean saveInternalSubset)
at System.Xml.XmlTextReaderImpl.DtdParserProxy.Parse(Boolean saveInternalSubset)
at System.Xml.XmlTextReaderImpl.ParseDoctypeDecl()
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at ConsoleApplication36.Program.Main(String[] args) in c:\Projects\temp\ConsoleApplication36\Program.cs:line 11
like image 821
THX-1138 Avatar asked Dec 14 '10 23:12

THX-1138


2 Answers

Well, in .NET 4.0 XmlTextReader has a property called DtdProcessing. When set to DtdProcessing.Ignore it should disable DTD processing.

like image 104
L.E.O Avatar answered Oct 27 '22 22:10

L.E.O


In .net 4.5.1 I had no luck setting doc.XmlResolver to null.

The easiest fix for me was to use a string replacement to change "xmlns=" to "ignore=" before calling LoadXml(), e.g.

var responseText = await response.Content.ReadAsStringAsync();
responseText = responseText.Replace("xmlns=", "ignore=");
try
{
    var doc = new XmlDocument();
    doc.LoadXml(responseText);
    ...
}
like image 1
Patrick Avatar answered Oct 27 '22 22:10

Patrick