What advantages are there to using either XSLT or LINQ to XML for HTML parsing in C#? This assumes the HTML has been cleaned so that it is valid XHTML. The extracted values will eventually go into a C# object to be validated and processed.
Please let me know if these are valid and if there are other things to consider.
XSLT Advantages:
XSLT Disadvantages:
Linq to XML Advantages:
Linq to XML Disadvantages:
Edit: I should clarify: I want these to run long term, and the website may update its layout once in a while. That was one of the bigger reasons I thought I would use something that didn't require compiling.
In my experience, XSLT is more concise and readable when you're primarily dealing with rearranging and selecting existing XML elements. XPath is short and easy to understand, and the XML syntax avoids littering your code with XElement and XAttribute statements. XSLT works fine as an XML-tree transform language.
However, its string handling is poor, looping is unintuitive, and there's no meaningful concept of subroutines: you can't transform the output of another transform.
So, if you want to actually fiddle with element and attribute content, it quickly falls short. There's no problem in using both, incidentally: XSLT to normalize the structure (say, to ensure that all table elements have tbody elements), and LINQ to XML to interpret it. The prioritized conditional matching possibilities mean XSLT is easier to use when dealing with many similar but distinct matches. XSLT is good at document simplification, but it's missing too many basic features to be sufficient on its own.
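As a rough sketch of that split, the C# below uses an XSLT 1.0 stylesheet to wrap the rows of any table that lacks a tbody, then reads the cells with LINQ to XML. The class and method names (TableNormalizer, Normalize, CellTexts) are made up for illustration, and the input is assumed to use un-namespaced element names; real XHTML in the http://www.w3.org/1999/xhtml namespace would need a namespace prefix in the match patterns.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

public static class TableNormalizer
{
    // Identity transform plus one higher-priority rule for tables without a tbody.
    // Assumption: elements are not in a namespace.
    private const string Stylesheet = @"
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='xml'/>
  <xsl:template match='@*|node()'>
    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>
  </xsl:template>
  <xsl:template match='table[not(tbody)]'>
    <xsl:copy>
      <xsl:apply-templates select='@*|node()[not(self::tr)]'/>
      <tbody><xsl:apply-templates select='tr'/></tbody>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>";

    // Step 1 (XSLT): normalize the structure so every table has a tbody.
    public static XDocument Normalize(XDocument input)
    {
        var xslt = new XslCompiledTransform();
        using (var reader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(reader);
        }

        var output = new XDocument();
        using (var writer = output.CreateWriter())
        {
            xslt.Transform(input.CreateReader(), writer);
        }
        return output;
    }

    // Step 2 (LINQ to XML): interpret the normalized tree.
    // Every row can now be reached through a single, predictable path.
    public static List<List<string>> CellTexts(XDocument normalized)
    {
        return normalized
            .Descendants("tbody")
            .Elements("tr")
            .Select(tr => tr.Elements("td").Select(td => (string)td).ToList())
            .ToList();
    }
}
```

A call such as TableNormalizer.CellTexts(TableNormalizer.Normalize(doc)) then works whether or not the source table had a tbody. If the stylesheet lives in a file instead of a string, XslCompiledTransform.Load also accepts a path, so the stylesheet can be edited when the site layout changes without recompiling the C# code.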
Having jumped whole-heartedly on the LINQ to XML bandwagon, I'd say that it has less overlap with XSLT than might seem at first glance. (And I'd positively love to see an XSLT 2.0/XQuery 1.0 implementation for .NET.)
In terms of performance, both technologies are speedy. In fact, since it's so hard to express slow operations in XSLT, you're unlikely to trigger a slow case accidentally (unless you start playing with recursion...). By contrast, the power of LINQ to XML can also make it slow: just use any heavyweight .NET object in some inner loop and you've got a budding performance problem.
Whatever you do, don't try to abuse XSLT by using it to perform anything but the simplest of logic: it's way more wordy and far less readable than the equivalent C#. If you need a bunch of logic (even simple things like date > DateTime.Now ? "will be" : "has" become huge bloated hacks in XSLT) and you don't want to use both XSLT and LINQ to XML, use LINQ.
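For comparison, here is roughly what that kind of conditional looks like on the C# side. This is only a sketch: the EventText class, the Describe method, and the input shape (an event element with a date attribute) are invented for the example, and the current time is passed in as a parameter so the result is predictable.

```csharp
using System;
using System.Xml.Linq;

public static class EventText
{
    // Hypothetical input shape: <event date="2030-01-01">Launch</event>
    public static string Describe(XElement ev, DateTime now)
    {
        // The explicit XAttribute-to-DateTime conversion parses XML date formats.
        DateTime date = (DateTime)ev.Attribute("date");

        // The one-line conditional that is awkward to express in XSLT 1.0.
        string verb = date > now ? "will be" : "has been";

        return $"{(string)ev} {verb} released";
    }
}
```

In production code the second argument would typically be DateTime.Now.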
Without knowing more about your use case, it is hard to give you general recommendations.
Anyhow, you are somewhat comparing apples and oranges. LINQ to XML (and LINQ in general) is a query language, whereas XSLT is a programming language for transforming XML tree structures. These are different concepts. You would use a query language whenever you want to extract a specific piece of information from a data source and then do whatever you need with it (for example, set fields in a C# object). A transformation, in contrast, is useful for converting one XML representation of your data into another XML representation.
So if your aim is to create C# objects from XML, you probably don't want to use XSLT but one of the other technologies offered by the .NET Framework to process XML data: the old XmlDocument, XmlReader, XPathDocument, XmlSerializer or XDocument. Each has its own advantages and disadvantages, depending on input size, input complexity, desired output etc.
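To illustrate the query-then-populate approach with XDocument, here is a minimal sketch. The Product class, the ProductScraper name, and the page layout it expects (table rows with class "product" holding a name cell and a price cell) are all assumptions made up for the example, not anything taken from the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical target object for the extracted values.
public sealed class Product
{
    public string Name { get; set; } = "";
    public decimal Price { get; set; }
}

public static class ProductScraper
{
    // Valid XHTML puts its elements in this namespace.
    private static readonly XNamespace Xhtml = "http://www.w3.org/1999/xhtml";

    // Assumed layout: <tr class="product"><td>name</td><td>price</td></tr>
    public static List<Product> Parse(string xhtml)
    {
        return (
            from tr in XDocument.Parse(xhtml).Descendants(Xhtml + "tr")
            where (string)tr.Attribute("class") == "product"
            let cells = tr.Elements(Xhtml + "td").ToList()
            where cells.Count >= 2
            select new Product
            {
                Name = ((string)cells[0]).Trim(),
                // The XElement-to-decimal conversion is culture-invariant.
                Price = (decimal)cells[1]
            }).ToList();
    }
}
```

Validation and further processing can then work on the Product objects without touching the markup again.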
Since you are dealing with HTML only, you might also want to have a look at the HTML Agility Pack on CodePlex.
Since you're going to C#, your data is going to go through LINQ (or some other XML code for .NET) at some point anyway, so you may as well keep it all there.
Unless you have some compelling reason to go with XSLT, such as you already have a lot of experience or the deployment strongly favours rolling out the text files, keep it all in one place.