
Keep HTML tags in XML using LINQ to XML

I have an XML file from which I am extracting HTML using LINQ to XML. This is a sample of the file:

<?xml version="1.0" encoding="utf-8" ?>
<tips>
    <tip id="0">
        This is the first tip.
    </tip>
    <tip id="1">
        Use <b>Windows Live Writer</b> or <b>Microsoft Word 2007</b> to create and publish content.
    </tip>
    <tip id="2">
        Enter a <b>url</b> into the box to automatically screenshot and index useful webpages.
    </tip>
    <tip id="3">
        Invite your <b>colleagues</b> to the site by entering their email addresses.  You can then share the content with them!
    </tip>
</tips>

I am using the following query to extract a 'tip' from the file:

Tip tip = (from t in tipsXml.Descendants("tip")
           where t.Attribute("id").Value == nextTipId.ToString()
           select new Tip()
           {
               TipText = t.Value,
               TipId = nextTipId
           }).First();

The problem I have is that the HTML elements are being stripped out. I was hoping for something like InnerHtml to use instead of Value, but that doesn't seem to exist.

Any ideas?

Thanks all in advance,

Dave

David Gouge asked Jan 19 '09 15:01


2 Answers

Call t.ToString() instead of Value. That will return the XML as a string. You may want to use the overload taking SaveOptions to disable formatting. I can't check right now, but I suspect it will include the outer element tag (and its attributes), so you would need to strip this off.
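As a minimal sketch of what that call returns, assuming a cut-down version of the sample file loaded with XElement.Parse (the class name ToStringDemo and the method OuterXml are illustrative, not from the original post):

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ToStringDemo
{
    // Returns the markup of the <tip> with the given id, outer tag included.
    public static string OuterXml(XElement tips, int id)
    {
        XElement t = tips.Descendants("tip")
                         .First(e => (int)e.Attribute("id") == id);
        // DisableFormatting stops LINQ to XML from re-indenting the output.
        return t.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        XElement tips = XElement.Parse(
            "<tips><tip id=\"1\">Use <b>Windows Live Writer</b> to publish.</tip></tips>");
        Console.WriteLine(OuterXml(tips, 1));
    }
}
```

The returned string keeps the inner <b> tags but also starts with the <tip id="1"> tag itself, which is why the outer part has to be stripped off or avoided.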

Note that if your HTML isn't valid XML, you will end up with an invalid overall XML file.

Is the format of the XML file completely out of your control? It would be nicer for any HTML inside to be XML-encoded.
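To illustrate the encoded alternative, here is a small sketch assuming a hypothetical variant of the file in which the markup is escaped or wrapped in CDATA. The class name EncodedTipDemo and the method TipText are made up for the example:

```csharp
using System;
using System.Xml.Linq;

public static class EncodedTipDemo
{
    // Parses a single <tip> element and returns its text content.
    public static string TipText(string tipXml)
    {
        return XElement.Parse(tipXml).Value;
    }

    public static void Main()
    {
        // Hypothetical encodings: the tags are stored as text, not as child elements.
        string escaped = "<tip id=\"1\">Use &lt;b&gt;Windows Live Writer&lt;/b&gt; to publish.</tip>";
        string cdata = "<tip id=\"1\"><![CDATA[Use <b>Windows Live Writer</b> to publish.]]></tip>";

        // In both cases Value carries the tags, because they are part of the text.
        Console.WriteLine(TipText(escaped));
        Console.WriteLine(TipText(cdata));
    }
}
```

With the HTML stored this way, the original query using t.Value would work unchanged, and the file stays valid XML even if the HTML is not well-formed.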

EDIT: One way of avoiding getting the outer part might be to do something like this (in a separate method called from your query, of course):

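// Append every child node (text and elements) of the tip element,
// which leaves out the element's own start and end tags.
StringBuilder builder = new StringBuilder();
foreach (XNode node in element.Nodes())
{
    builder.Append(node.ToString());
}
string innerXml = builder.ToString();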

That way you'll get HTML elements with their descendants and interspersed text nodes. Basically it's the equivalent of InnerXml, I strongly suspect.
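Put together with the query from the question, this could look roughly like the sketch below. The shape of the Tip class is an assumption (the question does not show it), and TipReader, InnerXml and GetTip are illustrative names:

```csharp
using System;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Assumed shape of the Tip class; the question does not show it.
public class Tip
{
    public int TipId { get; set; }
    public string TipText { get; set; }
}

public static class TipReader
{
    // Concatenates the child nodes of an element, leaving out its own tag.
    public static string InnerXml(XElement element)
    {
        StringBuilder builder = new StringBuilder();
        foreach (XNode node in element.Nodes())
        {
            builder.Append(node.ToString(SaveOptions.DisableFormatting));
        }
        return builder.ToString();
    }

    public static Tip GetTip(XContainer tipsXml, int nextTipId)
    {
        return (from t in tipsXml.Descendants("tip")
                where t.Attribute("id").Value == nextTipId.ToString()
                select new Tip
                {
                    TipText = InnerXml(t), // instead of t.Value
                    TipId = nextTipId
                }).First();
    }

    public static void Main()
    {
        XDocument tipsXml = XDocument.Parse(
            "<tips><tip id=\"2\">Enter a <b>url</b> into the box.</tip></tips>");
        Console.WriteLine(GetTip(tipsXml, 2).TipText);
    }
}
```

Keeping the loop in its own method means the query itself only changes in one place, the TipText assignment.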

Jon Skeet answered Oct 20 '22 18:10


Just use string.Concat(tip.Nodes()), where tip is the XElement for the tip, to get the content with the HTML tags intact.
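A short sketch of that one-liner, assuming .NET 4 or later (where the generic string.Concat overload taking an IEnumerable<T> is available); the class name ConcatDemo and the sample element are illustrative:

```csharp
using System;
using System.Xml.Linq;

public static class ConcatDemo
{
    public static string InnerXml(XElement tip)
    {
        // string.Concat<T>(IEnumerable<T>) calls ToString() on each node
        // and joins the results, so the outer <tip> tag is never included.
        return string.Concat(tip.Nodes());
    }

    public static void Main()
    {
        XElement tip = XElement.Parse(
            "<tip id=\"3\">Invite your <b>colleagues</b> to the site.</tip>");
        Console.WriteLine(InnerXml(tip));
    }
}
```

In the query from the question this would mean writing TipText = string.Concat(t.Nodes()) in place of TipText = t.Value.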

Vijay kumar EK answered Oct 20 '22 17:10