Overriding or ignoring undeclared entities in C# using LINQ

I have a little utility that runs through XML files looking for certain things using LINQ. It processes a MASSIVE collection of them rather quickly and nicely. However, about 20% of a certain batch of files fail to load and are skipped, because the degree symbol appears in them as the entity reference &deg;. This is the same "Reference to undeclared entity 'deg'." error that a previous question was about.

The solutions offered in the previous question cannot be directly applied here. I am not at liberty to go around modifying the files, and making copies of them and replacing instances or inserting tags in the copies seems inefficient. What would be the best way to go about getting LINQ to ignore the undeclared entities, which have absolutely no bearing on what my program does anyway? Or is there perhaps a good way of getting an XDocument.Load to be fed some entity declarations beforehand?

UtopiaLtd asked Nov 17 '11 16:11


1 Answer

Unfortunately, entities form part of the well-formedness rules for XML (see section 2.1, "Well-Formed XML Documents", of the XML specification). It sounds as though you want XDocument.Load to load what is notionally an XML file but does not in fact conform to those rules, which, quite reasonably, it won't do.

If your users are passing you what are supposed to be XML files but which contain undeclared entities, then either you have to get them to provide well-formed files, or you have to manage the incorrectness yourself at load time, in the ways that have been suggested.

It seems to me, given your restrictions, that the neatest approach would be to follow the linked example and create some settings to pass into the XmlReader, along the lines of the article "Validating an XML Document in the DOM".

If there are entities which aren't defined and aren't listed in any public DTD, you'll need to supply your own declarations for all the entities you need (entities are declared in a DTD rather than in an XSD schema). So, create one generic set of settings for the XmlReader that references your own, custom declarations. Add the necessary entities as certain files fail to load, and you'll build up a list of all the entities that you need to define in order for the XML files to load.
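As a rough sketch of how that growing list might be kept, assuming you hold it as a table of entity names and code points and turn it into DTD entity declarations on demand: the names knownEntities and BuildEntityDeclarations are hypothetical, and nbsp is only an illustrative second entry, not something from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical table of entities, extended whenever a file fails with
// "Reference to undeclared entity '...'".
var knownEntities = new SortedDictionary<string, int>
{
    ["deg"] = 0x00B0,   // degree sign, the entity from the question
    ["nbsp"] = 0x00A0,  // no-break space, illustrative only
};

// Hypothetical helper: emits one <!ENTITY ...> declaration per entry,
// mapping each name to a decimal numeric character reference.
static string BuildEntityDeclarations(IEnumerable<KeyValuePair<string, int>> entities)
{
    var sb = new StringBuilder();
    foreach (var pair in entities)
    {
        sb.Append("<!ENTITY ").Append(pair.Key)
          .Append(" \"&#").Append(pair.Value).Append(";\">\n");
    }
    return sb.ToString();
}

Console.Write(BuildEntityDeclarations(knownEntities));
```

Keeping the table in one place means a newly discovered entity is a one-line addition, and none of the source files have to be touched.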

Then, for each document you try to load, create an XmlReader for the file using the settings above and pass it to the XDocument.Load(XmlReader) overload.
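Here is a minimal sketch of that loading step. It rests on an assumption of mine rather than anything in the linked article: that the entity declarations can be handed to the reader as a DTD internal subset through an XmlParserContext, so the files themselves stay untouched. LoadWithEntities and the DOCTYPE name "doc" are hypothetical placeholders.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: parses XML from `input`, treating `entityDeclarations`
// as if it were the internal DTD subset of the document.
static XDocument LoadWithEntities(TextReader input, string entityDeclarations)
{
    var settings = new XmlReaderSettings
    {
        // The default (Prohibit) makes the reader throw when DTD content is supplied.
        DtdProcessing = DtdProcessing.Parse
    };

    // "doc" is a placeholder DOCTYPE name; no validation is requested here,
    // so it is not compared with the actual root element.
    var context = new XmlParserContext(
        null, null, "doc", null, null, entityDeclarations, "", "", XmlSpace.None);

    using (XmlReader reader = XmlReader.Create(input, settings, context))
    {
        return XDocument.Load(reader);
    }
}

// Usage with an in-memory document; for real files, pass File.OpenText(path).
string declarations = "<!ENTITY deg \"&#176;\">";
XDocument sample = LoadWithEntities(
    new StringReader("<reading>20&deg;C</reading>"), declarations);
Console.WriteLine(sample.Root.Value);
```

The same settings and declarations can be reused for every file in the batch. Note that DtdProcessing.Parse also lets the reader process any DOCTYPE a file carries itself, so this is best kept to files you trust; I have not checked how the supplied declarations interact with a file that already has its own DOCTYPE.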

nicodemus13 answered Sep 20 '22 18:09