
Need to remove XML nodes in a string and leave the text

Tags:

c#

xml

I have a string that is part of an XML document:

a<b>b</b>c<i>d</i>e<b>f</b>g

The problem is that I want to extract the parts of the string that are not inside any tags. So I need to extract the string "aceg" from this string and leave out the characters "bdf". How can this be done?

Edit: this was part of an XML document. Let's assume it is:

<div>a<b>b</b>c<i>d</i>e<b>f</b>g</div>

Now it's valid XML :)

Karim asked Sep 29 '09 12:09


2 Answers

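The following regular expression removes all tags from the string. Note that it strips only the tags themselves and keeps the text between them, so for the sample string the result is "abcdefg", not "aceg":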

// requires: using System.Text.RegularExpressions;
string result = Regex.Replace("a<b>b</b>c<i>d</i>e<b>f</b>g", "<[^>]+>", string.Empty);
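
To get "aceg" as the question asks, each element has to be removed together with its content. A minimal sketch, assuming the fragment contains only simple, non-nested elements like the sample (the class and method names here are made up for illustration, and a regex is not a general XML parser):

```csharp
using System;
using System.Text.RegularExpressions;

static class TagStripper
{
    // Removes every element together with its content, e.g. "<b>b</b>".
    // The backreference \1 pairs each opening tag with a closing tag of the same name.
    // Assumption: elements are not nested inside an element of the same name,
    // and there are no self-closing tags.
    public static string TextOutsideTags(string fragment)
    {
        return Regex.Replace(
            fragment,
            @"<(\w+)[^>]*>.*?</\1>",
            string.Empty,
            RegexOptions.Singleline);
    }
}
```

Calling TagStripper.TextOutsideTags("a<b>b</b>c<i>d</i>e<b>f</b>g") returns "aceg". For anything beyond simple fragments like this one, parsing the XML is the safer route.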
Stoo answered Nov 09 '22 03:11


That string is not valid XML.

However, assuming you had a valid XML string, then you could do something like this:

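using System;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        string contents = string.Empty;

        // Parse the fragment; LoadXml throws XmlException if it is not well-formed.
        XmlDocument document = new XmlDocument();
        document.LoadXml("<outer>a<b>b</b>c<i>d</i>e<b>f</b>g</outer>");

        // Walk the direct children of the root element.
        foreach (XmlNode child in document.DocumentElement.ChildNodes)
        {
            // Keep only element nodes (<b>, <i>) and collect their inner text.
            if (child.NodeType == XmlNodeType.Element)
            {
                contents += child.InnerText;
            }
        }

        Console.WriteLine(contents);

        Console.ReadKey();
    }
}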

This will print out the string "bdf", that is, the text inside the child elements rather than the text outside them.
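
To collect the text outside the child elements instead (the "aceg" the question asks for), the same loop can test for text nodes. A minimal sketch, assuming the input is well-formed XML with a single root element; the class and method names are hypothetical:

```csharp
using System;
using System.Text;
using System.Xml;

static class DirectText
{
    // Returns the concatenated text that sits directly under the root element,
    // skipping anything wrapped in a child element.
    public static string Of(string xml)
    {
        var document = new XmlDocument();
        document.LoadXml(xml); // throws XmlException on malformed input

        var contents = new StringBuilder();
        foreach (XmlNode child in document.DocumentElement.ChildNodes)
        {
            // Text nodes are the runs of characters between the child elements.
            if (child.NodeType == XmlNodeType.Text)
            {
                contents.Append(child.Value);
            }
        }
        return contents.ToString();
    }
}
```

Calling DirectText.Of("<div>a<b>b</b>c<i>d</i>e<b>f</b>g</div>") returns "aceg". CDATA sections and whitespace-only nodes are not collected by this check; whether they should be depends on the data.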

Stuart Grassie answered Nov 09 '22 02:11