
Best way to read, modify, and write XML

Tags: c#, xml

My plan is to read an XML document into my C# program, search for particular entries which I'd like to change, and then write out the modified document. However, I've come unstuck: with XmlTextReader, which I'm using to read in the file, it's hard to tell whether a given element is starting or ending. I could do with a bit of advice to put me on the right track.

The document is an HTML document, so as you can imagine, it's quite complicated.

I'd like to search for an element id within the HTML document, so for example look for this and change the src:

<img border="0" src="bigpicture.png" width="248" height="36" alt="" id="lookforthis" />

wonea asked Sep 17 '10 15:09


2 Answers

If it's actually valid XML, and will easily fit in memory, I'd choose LINQ to XML (XDocument, XElement etc) every time. It's by far the nicest XML API I've used. It's easy to form queries, and easy to construct new elements too.

You can use XPath where that's appropriate, or the built-in axis methods (Elements(), Descendants(), Attributes() etc). If you could let us know what specific bits you're having a hard time with, I'd be happy to help work out how to express them in LINQ to XML.
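
To make the two query styles concrete, here is a minimal sketch of looking up an element by its id, once with the axis methods and once with XPath. The class and method names (ImgLookup, FindByAxis, FindByXPath) are made up for illustration, and the sketch assumes the elements are not in an XML namespace and that the id contains no single quotes.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath; // brings in the XPathSelectElement extension method

public static class ImgLookup
{
    // Axis-method version: walk all <img> descendants and filter on the id attribute.
    // Returns null when nothing matches.
    public static XElement FindByAxis(XDocument doc, string id) =>
        doc.Descendants("img")
           .FirstOrDefault(e => (string)e.Attribute("id") == id);

    // XPath version of the same query. The id is spliced into the expression,
    // so this sketch assumes it contains no single quotes.
    public static XElement FindByXPath(XDocument doc, string id) =>
        doc.XPathSelectElement($"//img[@id='{id}']");
}
```

Both methods return the same XElement for a given document, so the choice is largely one of taste; the axis version composes more naturally with further LINQ operators.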

If, on the other hand, this is HTML which isn't valid XML, you'll have a much harder time - because XML APIs generally expect to work with valid XML documents. You could use HTMLTidy first of course, but that may have undesirable effects.
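
One quick way to find out which situation you are in is to try parsing the markup and see whether the XML parser rejects it. The following is only a sketch; the WellFormedCheck class and TryParseXml method are hypothetical names, not part of any library.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class WellFormedCheck
{
    // Returns true and the parsed document when the markup is well-formed XML;
    // returns false (with doc set to null) when the XML parser rejects it.
    public static bool TryParseXml(string markup, out XDocument doc)
    {
        try
        {
            doc = XDocument.Parse(markup);
            return true;
        }
        catch (XmlException)
        {
            doc = null;
            return false;
        }
    }
}
```

Typical HTML habits such as an unclosed `<br>` or an unquoted attribute value are enough to make the parse fail, whereas the XHTML-style `<br />` is accepted.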

For your specific example:

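using System.Xml.Linq;

XDocument doc = XDocument.Load("file.xml");
// Note: this loop visits every <img>; filter on the id attribute to target just one.
foreach (var img in doc.Descendants("img"))
{
    // src will be null if the attribute is missing
    string src = (string) img.Attribute("src");
    img.SetAttributeValue("src", src + "with-changes");
}
// Write the modified document back out (this overwrites the input;
// pass a different path to keep the original).
doc.Save("file.xml");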
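
For the element in the question, a variation that targets only the img with a given id might look like the sketch below. It matches on the local name, because XHTML documents usually declare a default namespace (xmlns="http://www.w3.org/1999/xhtml"), in which case Descendants("img") with a plain string name finds nothing. ImgSrcUpdater, SetImgSrc and UpdateFile are illustrative names, and the file paths passed in are up to the caller.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ImgSrcUpdater
{
    // Sets src on the first <img> whose id matches; returns false if there is none.
    // Comparing Name.LocalName ignores the namespace, so this works both for
    // namespace-free documents and for XHTML with a default namespace.
    public static bool SetImgSrc(XDocument doc, string id, string newSrc)
    {
        XElement img = doc.Descendants()
            .FirstOrDefault(e => e.Name.LocalName == "img"
                              && (string)e.Attribute("id") == id);
        if (img == null) return false;
        img.SetAttributeValue("src", newSrc);
        return true;
    }

    // Load, modify, and write back out. The output file is only written
    // when a matching element was found and changed.
    public static bool UpdateFile(string inputPath, string outputPath, string id, string newSrc)
    {
        XDocument doc = XDocument.Load(inputPath);
        bool changed = SetImgSrc(doc, id, newSrc);
        if (changed) doc.Save(outputPath);
        return changed;
    }
}
```

A call such as ImgSrcUpdater.UpdateFile("file.xml", "file-out.xml", "lookforthis", "smallpicture.png") would then cover the read, modify, and write steps in one go (the output path and new src value here are placeholders).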

Jon Skeet answered Oct 08 '22 20:10


Are the documents you are processing relatively small? If so, you could load them into memory using an XmlDocument object, modify it, and write the changes back out.

using System.Text;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.Load("path_to_input_file");
// Make changes to the document.
using (XmlTextWriter xtw = new XmlTextWriter("path_to_output_file", Encoding.UTF8)) {
  xtw.Formatting = Formatting.Indented; // optional, if you want it to look nice
  doc.WriteContentTo(xtw);
}

Depending on the structure of the input XML, this could make your parsing code a bit simpler.
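
As a rough illustration of what the "make changes" step could look like for the img element in the question, the sketch below uses an XPath lookup on the XmlDocument. XmlDocumentImgUpdater and SetImgSrc are hypothetical names; the XPath expression assumes the elements are not in an XML namespace and that the id contains no single quotes.

```csharp
using System;
using System.Xml;

public static class XmlDocumentImgUpdater
{
    // Finds the first <img> with the given id and sets its src attribute.
    // Returns false when no such element exists.
    public static bool SetImgSrc(XmlDocument doc, string id, string newSrc)
    {
        // SelectSingleNode evaluates an XPath expression and returns null on no match.
        var img = doc.SelectSingleNode($"//img[@id='{id}']") as XmlElement;
        if (img == null) return false;
        img.SetAttribute("src", newSrc);
        return true;
    }
}
```

For XHTML with a default namespace, the same lookup would need an XmlNamespaceManager and a prefixed element name in the XPath expression.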

Pat Daburu answered Oct 08 '22 20:10