
Workaround for "undeclared prefix" error on XElement.Load()




I'm pulling the source of a website. I then want to extract a specific part of it. My intention is to do this with LINQ-to-XML.

However, I get errors when I parse the source:

    XElement source = XElement.Load(reader);

The problem seems to be references to namespace prefixes that are never declared. I get the error "'addthis' is an undeclared prefix. Line 130, position 51." because of this line:

    <div class="addthis_toolbox addthis_pill_combo" addthis:url="http://www.foo.com/foo">

And if I delete that line, other errors occur.

Thing is, I only care about one piece of this XML file - I don't need to be able to parse the whole file. I just want it in an XElement so I can find that one piece of it. Is there a way for me to hack around the parsing error? And I need a generic solution - I want to parse the file regardless of ANY undeclared prefix errors.


Pieter Müller asked Sep 26 '11 15:09

2 Answers

This XML is not valid.

In order to use a namespace prefix (such as addthis:), the namespace must be declared, by writing xmlns:addthis="some URI".

In general, you shouldn't parse HTML using an XML parser, since HTML is likely to be invalid XML, for this reason and a number of other reasons (undeclared entities, unescaped JS, unclosed tags).
Instead, use HTML Agility Pack.
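
As a rough illustration of that suggestion, here is a minimal sketch that assumes the third-party HtmlAgilityPack NuGet package is installed. The helper name `FindToolbox` and the sample markup are hypothetical, made up for this example. The XPath query picks out the `div` from the question without any namespace declarations being needed.

```csharp
using System;
using HtmlAgilityPack; // assumption: the third-party "HtmlAgilityPack" NuGet package is referenced

static class HapSketch
{
    // Hypothetical helper: returns the first <div> whose class attribute
    // contains "addthis_toolbox", or null when the page has no such element.
    public static HtmlNode FindToolbox(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant HTML parser, so undeclared prefixes do not throw
        return doc.DocumentNode.SelectSingleNode("//div[contains(@class, 'addthis_toolbox')]");
    }

    static void Main()
    {
        // Sample markup invented for illustration; note the undeclared prefix and the unclosed <br>.
        string html = "<html><body><div class=\"addthis_toolbox addthis_pill_combo\" "
                    + "addthis:url=\"http://www.foo.com/foo\"><br></div></body></html>";
        HtmlNode node = FindToolbox(html);
        Console.WriteLine(node != null ? node.OuterHtml : "not found");
    }
}
```

Because the parser does not enforce XML well-formedness, this route should not depend on knowing which prefixes a page uses.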

SLaks answered Nov 12 '22 02:11


If you need to do it all in code, what you want is something like this:

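    // The reader settings and the namespace manager must share one NameTable.
    XmlReaderSettings settings = new XmlReaderSettings { NameTable = new NameTable() };
    XmlNamespaceManager xmlns = new XmlNamespaceManager(settings.NameTable);
    // Pre-declare the prefix so the parser no longer rejects it.
    xmlns.AddNamespace("addthis", "");
    XmlParserContext context = new XmlParserContext(null, xmlns, "", XmlSpace.Default);
    // text holds the page source as a string.
    XmlReader reader = XmlReader.Create(new StringReader(text), settings, context);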

And for any additional prefixes add more of these:

    xmlns.AddNamespace("prefix", "");
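
Put together, the steps above might look like the following sketch. The class name `LenientXml`, the helper `Load`, and the sample markup are hypothetical names chosen for illustration. The helper binds every prefix you pass in to an empty namespace URI, as in the snippet above, and then hands the reader to `XElement.Load`.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class LenientXml
{
    // Hypothetical helper: pre-declares each listed prefix with an empty
    // namespace URI, then loads the markup into an XElement.
    public static XElement Load(string text, params string[] prefixes)
    {
        var settings = new XmlReaderSettings { NameTable = new NameTable() };
        var xmlns = new XmlNamespaceManager(settings.NameTable);
        foreach (string prefix in prefixes)
        {
            xmlns.AddNamespace(prefix, "");
        }
        var context = new XmlParserContext(null, xmlns, "", XmlSpace.Default);
        using (XmlReader reader = XmlReader.Create(new StringReader(text), settings, context))
        {
            return XElement.Load(reader);
        }
    }

    static void Main()
    {
        // Sample markup invented for illustration.
        string text = "<div class=\"addthis_toolbox\" addthis:url=\"http://www.foo.com/foo\" />";
        XElement source = Load(text, "addthis");
        Console.WriteLine(source.Name.LocalName);
    }
}
```

This still requires knowing the prefixes in advance, so it only covers the prefixes you list; markup that is broken in other ways (unclosed tags, undeclared entities) will still fail to parse.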
Mason Cloud answered Nov 12 '22 02:11