I needed to transform the contents of an HTML web page using XSLT, so I used SgmlReader and wrote the snippet shown below (I thought that, in the end, it's an XmlReader too...).
XmlReader xslr = XmlReader.Create(new StringReader(
    "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
    "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">" +
    "<xsl:output method=\"xml\" encoding=\"UTF-8\" version=\"1.0\" />" +
    "<xsl:template match=\"/\">" +
    "<XXX xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\"><xsl:value-of select=\"count(//br)\" /></XXX>" +
    "</xsl:template>" +
    "</xsl:stylesheet>"));
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(xslr);
using (SgmlReader html = new SgmlReader())
{
    StringBuilder sb = new StringBuilder();
    using (TextWriter sw = new StringWriter(sb))
    using (XmlWriter xw = new XmlTextWriter(sw))
    {
        html.InputStream = new StringReader(Resources.html_orig);
        html.DocType = "HTML";
        try
        {
            xslt.Transform(html, xw);
            string output = sb.ToString();
            System.Console.WriteLine(output);
        }
        catch (Exception exc)
        {
            System.Console.WriteLine("{0} : {1}", exc.GetType().Name, exc.Message);
            System.Console.WriteLine(exc.StackTrace);
        }
    }
}
Nonetheless, I get this error message:
NullReferenceException : Object reference not set to an instance of an object.
at MS.Internal.Xml.Cache.XPathDocumentBuilder.Initialize(XPathDocument doc, IXmlLineInfo lineInfo, String baseUri, LoadFlags flags)
at MS.Internal.Xml.Cache.XPathDocumentBuilder..ctor(XPathDocument doc, IXmlLineInfo lineInfo, String baseUri, LoadFlags flags)
at System.Xml.XPath.XPathDocument.LoadFromReader(XmlReader reader, XmlSpace space)
at System.Xml.XPath.XPathDocument..ctor(XmlReader reader, XmlSpace space)
at System.Xml.Xsl.Runtime.XmlQueryContext.ConstructDocument(Object dataSource, String uriRelative, Uri uriResolved)
at System.Xml.Xsl.Runtime.XmlQueryContext..ctor(XmlQueryRuntime runtime, Object defaultDataSource, XmlResolver dataSources, XsltArgumentList argList, WhitespaceRuleLookup wsRules)
at System.Xml.Xsl.Runtime.XmlQueryRuntime..ctor(XmlQueryStaticData data, Object defaultDataSource, XmlResolver dataSources, XsltArgumentList argList, XmlSequenceWriter seqWrt)
at System.Xml.Xsl.XmlILCommand.Execute(Object defaultDocument, XmlResolver dataSources, XsltArgumentList argumentList, XmlSequenceWriter results)
at System.Xml.Xsl.XmlILCommand.Execute(Object defaultDocument, XmlResolver dataSources, XsltArgumentList argumentList, XmlWriter writer, Boolean closeWriter)
at System.Xml.Xsl.XmlILCommand.Execute(XmlReader contextDocument, XmlResolver dataSources, XsltArgumentList argumentList, XmlWriter results)
at System.Xml.Xsl.XslCompiledTransform.Transform(XmlReader input, XmlWriter results)
I found a way to work around this by converting the HTML to XML and then applying the transform, but that's an inefficient solution because:
So (since I know the StackOverflow community always provides great answers, whereas other C# forums have completely disappointed me ;o) I'm looking for feedback and suggestions on how to perform XSL transformations on HTML directly (even if SgmlReader needs to be replaced by another similar library).
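For readers following along, the workaround mentioned above (materialize the input as XML first, then transform the in-memory document) might look roughly like the sketch below. It is only an illustration under stated assumptions: the class name TransformSketch and the method CountBreaks are hypothetical, the stylesheet is a trimmed version of the one in the question, and a plain XmlReader over a well-formed XHTML string stands in for the SgmlReader so that the example depends only on System.Xml.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper, not part of the original question.
public static class TransformSketch
{
    // Trimmed version of the question's stylesheet: counts <br> elements.
    private const string Stylesheet =
        "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">" +
        "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\" />" +
        "<xsl:template match=\"/\">" +
        "<XXX><xsl:value-of select=\"count(//br)\" /></XXX>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static string CountBreaks(XmlReader source)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader xslr = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(xslr);
        }

        // Step 1: copy the whole input into an XmlDocument.
        // This intermediate copy is the extra cost of the workaround.
        XmlDocument doc = new XmlDocument();
        doc.Load(source);

        // Step 2: run the transform against the in-memory document.
        StringBuilder sb = new StringBuilder();
        using (XmlWriter xw = XmlWriter.Create(sb, xslt.OutputSettings))
        {
            xslt.Transform(doc, xw);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Assumption: well-formed XHTML stands in for the SgmlReader output.
        string xhtml = "<html><body><p>a<br/>b<br/>c</p></body></html>";
        using (XmlReader reader = XmlReader.Create(new StringReader(xhtml)))
        {
            Console.WriteLine(CountBreaks(reader));
        }
    }
}
```

In the real scenario the reader passed to CountBreaks would be the configured SgmlReader instance; whether that intermediate XmlDocument is acceptable depends on the size of the pages being processed.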
Even if the SgmlReader class is extending the XmlReader class, it doesn't mean that it also behaves like an XmlReader.
Technically it also does not make sense that SgmlReader is a subclass of XmlReader, simply because SGML is a superset of XML and not a subset.
You didn't write about the purpose of your transformation, but in general HTML Agility Pack is a good option for manipulating HTML.
Have you tried using the HTML Agility Pack instead of SgmlReader? You can load the HTML into it and run a transform against it directly. I'm not sure whether an XML document is created internally; it seems that one is not, but you would probably want to compare memory and CPU usage against the conversion method you tried and discarded.
// You already have your XSLT loaded into the variable xslt...
HtmlDocument doc = new HtmlDocument();
doc.Load( ... ); // load your HTML document, or use LoadHtml to load it from a string, etc.
xslt.Transform(doc, xw); // xw is the XmlWriter that receives the output
See also this question: How to use HTML Agility pack