My plan is to read in an XML document using my C# program, search for particular entries which I'd like to change, and then write out the modified document. However, I've become unstuck because it's hard to differentiate between elements, whether they start or end using XmlTextReader which I'm using to read in the file. I could do with a bit of advice to put me on the right track.
The document is a HTML document, so as you can imagine, it's quite complicated.
I'd like to search for an element id within the HTML document, so for example look for this and change the src;
<img border="0" src="bigpicture.png" width="248" height="36" alt="" id="lookforthis" />
If it's actually valid XML, and will easily fit in memory, I'd choose LINQ to XML (XDocument
, XElement
etc) every time. It's by far the nicest XML API I've used. It's easy to form queries, and easy to construct new elements too.
You can use XPath where that's appropriate, or the built-in axis methods (Elements()
, Descendants()
, Attributes()
etc). If you could let us know what specific bits you're having a hard time with, I'd be happy to help work out how to express them in LINQ to XML.
If, on the other hand, this is HTML which isn't valid XML, you'll have a much harder time - because XML APIs generalyl expect to work with valid XML documents. You could use HTMLTidy first of course, but that may have undesirable effects.
For your specific example:
XDocument doc = XDocument.Load("file.xml");
foreach (var img in doc.Descendants("img"))
{
// src will be null if the attribute is missing
string src = (string) img.Attribute("src");
img.SetAttributeValue("src", src + "with-changes");
}
Are the documents you are processing relatively small? If so, you could load them into memory using an XmlDocument object, modify it, and write the changes back out.
XmlDocument doc = new XmlDocument();
doc.Load("path_to_input_file");
// Make changes to the document.
using(XmlTextWriter xtw = new XmlTextWriter("path_to_output_file", Encoding.UTF8)) {
xtw.Formatting = Formatting.Indented; // optional, if you want it to look nice
doc.WriteContentTo(xtw);
}
Depending on the structure of the input XML, this could make your parsing code a bit simpler.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With