Image tag not closing with HTMLAgilityPack

When I use the HtmlAgilityPack to write out a new image node, it seems to remove the closing slash of the image tag, e.g. the output should be <img /> but when you check the OuterHtml, it has <img>.

string strIMG = "<img src='" + imgPath + "' height='" + pubImg.Height + "px' width='" + pubImg.Width + "px' />";

HtmlNode newNode = HtmlNode.CreateNode(strIMG);

This breaks XHTML.

mickyjtwin asked Apr 17 '09 07:04



2 Answers

Telling it to output XML as Micky suggests works, but if you have other reasons not to want XML, try this:

doc.OptionWriteEmptyNodes = true;
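
For context, here is a minimal sketch of where that option sits in a full round trip. It assumes the HtmlAgilityPack NuGet package is referenced; the class name EmptyNodeDemo, the Render helper, and the sample markup are made up for illustration. The option is set on the document that parses and serializes the markup.

```csharp
using System;
using HtmlAgilityPack;

public static class EmptyNodeDemo
{
    // Hypothetical helper: parses the markup and serializes it back
    // with OptionWriteEmptyNodes enabled on the owning document.
    public static string Render(string html)
    {
        var doc = new HtmlDocument();
        doc.OptionWriteEmptyNodes = true; // ask HAP to write empty elements as <tag />
        doc.LoadHtml(html);
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        // Sample input only; the img tag is deliberately left unclosed.
        Console.WriteLine(Render("<p><img src='photo.png'></p>"));
    }
}
```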

Rahul answered Sep 30 '22 16:09



Edit 1: Here is how to fix an HTML Agility Pack document to correctly display image (img) tags:

if (HtmlNode.ElementsFlags.ContainsKey("img"))
{
    HtmlNode.ElementsFlags["img"] = HtmlElementFlag.Closed;
}
else
{
    HtmlNode.ElementsFlags.Add("img", HtmlElementFlag.Closed);
}

Replace "img" with any other tag to fix it as well (input, select, and option come up frequently). Repeat as needed. Keep in mind that this will produce <img></img> rather than <img />, because of the HAP bug preventing the "closed" and "empty" flags from being set simultaneously. Source: Mike Bridge

Original answer: Having just labored over solutions to this issue without finding a sufficient answer (setting the doctype properly, and trying the Output as XML, Check Syntax, AutoCloseOnEnd, and Write Empty Nodes options), I was able to solve it with a dirty hack. This will certainly not solve the issue outright for everyone, but for anyone returning their generated HTML/XML as a string (e.g. via a web service), the simple solution is to use fake tags that the Agility Pack doesn't know to break.

Once you have finished doing everything you need to do on your document, call the following method once for each tag giving you a headache (notable examples being option, input, and img). Immediately after, render your final string, do a simple replace for each tag prefixed with some string (in this case "fix_"), and return your string. In my opinion this is only marginally better than the regex solution proposed in another question that I cannot locate at the moment.

private void fixHAPUnclosedTags(ref HtmlDocument doc, string tagName, bool hasInnerText = false)
{
    // SelectNodes returns null (not an empty collection) when nothing matches
    var tags = doc.DocumentNode.SelectNodes("//" + tagName);
    if (tags == null)
    {
        return;
    }
    foreach (var tag in tags)
    {
        // placeholder element with a name the agility pack has no special handling for
        HtmlNode tagReplacement = HtmlNode.CreateNode("<fix_" + tagName + "></fix_" + tagName + ">");
        foreach (var attr in tag.Attributes)
        {
            tagReplacement.SetAttributeValue(attr.Name, attr.Value);
        }
        // for option tags and other non-empty nodes, the next (text) node will be its inner HTML
        if (hasInnerText && tag.NextSibling != null)
        {
            tagReplacement.InnerHtml = tag.InnerHtml + tag.NextSibling.InnerHtml;
            tag.NextSibling.Remove();
        }
        tag.ParentNode.ReplaceChild(tagReplacement, tag);
    }
}
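
The final string replace described above could look roughly like the sketch below. It uses only plain string operations; the class and method names (PlaceholderTags, Restore) are hypothetical, and it assumes the placeholder prefix is the lowercase "fix_" used in the method above.

```csharp
using System;

public static class PlaceholderTags
{
    // Hypothetical helper: turns "<fix_img ...>" and "</fix_img>" back into
    // "<img ...>" and "</img>" in the already rendered string.
    // Note: this is a plain prefix match, so a short tag name such as "i"
    // would also match "<fix_img"; restore longer tag names first if they overlap.
    public static string Restore(string html, string tagName)
    {
        return html
            .Replace("<fix_" + tagName, "<" + tagName)
            .Replace("</fix_" + tagName + ">", "</" + tagName + ">");
    }

    public static void Main()
    {
        string rendered = "<p><fix_img src=\"a.png\"></fix_img></p>";
        Console.WriteLine(Restore(rendered, "img")); // prints <p><img src="a.png"></img></p>
    }
}
```

Call it once per tag name that was passed through the method above, after the document has been rendered to a string.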

As a note, if I were a betting man I would guess that Mike Bridge's answer above inadvertently identifies the source of this bug in the pack: something is causing the closed and empty flags to be mutually exclusive.

Additionally, after a bit more digging, I don't appear to be the only one who has taken this approach; see the question "HtmlAgilityPack Drops Option End Tags".

Furthermore, in cases where you ONLY need non-empty elements, there is a very simple fix listed in that same question, as well as in the HAP CodePlex discussion. It essentially sets the empty-flag option mentioned in Mike Bridge's answer above permanently, everywhere.
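
The linked fix is not reproduced here. One commonly cited approach of this kind, which may or may not be the exact one meant, is to remove the element's entry from the static HtmlNode.ElementsFlags table so the tag is treated like an ordinary element with an end tag. The sketch below assumes the HtmlAgilityPack NuGet package is referenced; the class name OptionFlagDemo and the sample markup are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class OptionFlagDemo
{
    // Hypothetical helper showing the effect of dropping the flag for "option".
    public static string Render(string html)
    {
        // ElementsFlags is static, so this affects every document parsed
        // afterwards in the same process, not just this one.
        HtmlNode.ElementsFlags.Remove("option");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        Console.WriteLine(Render("<select><option value='1'>One</option></select>"));
    }
}
```

Because the change is global, it is probably best done once at application startup.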

MaxPRafferty answered Sep 30 '22 18:09
