Extract text from <1></1> (HTML/XML-Like but with Number Tag)

Tags:

c#

I have a long string containing angle-bracket tags, and I want to extract the text between them.

string exampleString = "<1>text1</1><27>text27</27><3>text3</3>";

I want to be able to get this:

1 = "text1"
27 = "text27"
3 = "text3"

How would I obtain this easily? I haven't been able to come up with a non-hacky way to do it.

Thanks.

DiscoPogo asked Mar 13 '23 19:03


1 Answer

Using a basic XmlReader, plus a small trick that wraps and rewrites the input so it becomes well-formed XML, I would do something like this:

// Required namespaces:
using System.Collections.Generic;
using System.IO;
using System.Xml;

string xmlString = "<1>text1</1><27>text27</27><3>text3</3>";
xmlString = "<Root>" + xmlString.Replace("<", "<o").Replace("<o/", "</o") + "</Root>";
string key = "";
List<KeyValuePair<string,string>> kvpList = new List<KeyValuePair<string,string>>(); // assuming the result is wanted as key/value pairs
using (XmlReader xmlReader = XmlReader.Create(new StringReader(xmlString))) {
    bool firstElement = true;
    while (xmlReader.Read()) {
        if (firstElement) { // throw away the wrapping <Root> element
            firstElement = false;
            continue;
        }
        if (xmlReader.NodeType == XmlNodeType.Element) {
            key = xmlReader.Name.Substring(1); // cut off the leading "o"
        } else if (xmlReader.NodeType == XmlNodeType.Text) {
            kvpList.Add(new KeyValuePair<string,string>(key, xmlReader.Value)); // pair the text with the last tag seen
        }
    }
}

Edit:

The main trick is this line:

// Wrap the input so it has a single root element; the "o" prefix forces every tag name
// to start with a known letter (comment wording suggested by chwarr).
xmlString = "<Root>" + xmlString.Replace("<", "<o").Replace("<o/", "</o") + "</Root>";

First, every opening angle bracket is replaced with the bracket plus a letter, because an XML element name cannot start with a digit:

<1>text1</1> -> <o1>text1<o/1> // first replacement: tag names no longer start with a digit

The closing tags are now malformed, so the second replacement swaps each "<o/" back to "</o", putting the forward slash in front of the letter again:

<o1>text1<o/1> -> <o1>text1</o1> // second replacement: closing tags are valid again
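
To make the two replacements easier to follow, here is a minimal sketch that prints the string after each step. The class name TagRewriteDemo and the helpers PrefixTags and FixClosingTags are hypothetical names introduced only for this illustration; they are not part of the answer's code.

```csharp
using System;

public static class TagRewriteDemo
{
    // Hypothetical helper: first replacement, puts an "o" after every "<".
    public static string PrefixTags(string input) => input.Replace("<", "<o");

    // Hypothetical helper: second replacement, turns "<o/" back into "</o".
    public static string FixClosingTags(string prefixed) => prefixed.Replace("<o/", "</o");

    public static void Main()
    {
        string original = "<1>text1</1><27>text27</27>";
        string step1 = PrefixTags(original);
        string step2 = FixClosingTags(step1);
        string wrapped = "<Root>" + step2 + "</Root>";

        Console.WriteLine(step1);   // <o1>text1<o/1><o27>text27<o/27>
        Console.WriteLine(step2);   // <o1>text1</o1><o27>text27</o27>
        Console.WriteLine(wrapped); // <Root><o1>text1</o1><o27>text27</o27></Root>
    }
}
```

Note that Replace touches every "<" in the input, so this only works as long as the text between the tags contains no literal "<" of its own.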

Using a simple WinForms application with a RichTextBox to print the result:

for (int i = 0; i < kvpList.Count; ++i) {
    richTextBox1.AppendText(kvpList[i].Key + " = " + kvpList[i].Value + "\n");
}
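
If no WinForms project is at hand, the same output can go to the console. The sketch below packages the approach into one method so it can be run as is; NumberedTagParser and Parse are hypothetical names, and it checks reader.Depth to skip the root element instead of the firstElement flag, which is a small variation rather than the code above verbatim.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class NumberedTagParser
{
    // Hypothetical wrapper: returns (tag number, text) pairs in document order.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        // Same trick as above: prefix tag names with "o" and add a single root.
        string xml = "<Root>" + input.Replace("<", "<o").Replace("<o/", "</o") + "</Root>";
        var result = new List<KeyValuePair<string, string>>();
        string key = "";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // Depth 0 is the wrapping <Root> element, so only deeper elements count.
                if (reader.NodeType == XmlNodeType.Element && reader.Depth > 0)
                    key = reader.Name.Substring(1); // drop the leading "o"
                else if (reader.NodeType == XmlNodeType.Text)
                    result.Add(new KeyValuePair<string, string>(key, reader.Value));
            }
        }
        return result;
    }

    public static void Main()
    {
        foreach (var pair in Parse("<1>text1</1><27>text27</27><3>text3</3>"))
            Console.WriteLine(pair.Key + " = " + pair.Value);
    }
}
```

A list of pairs is kept here, as in the answer, so repeated tag numbers and the original order survive; a Dictionary would be an option only if every number is known to appear once.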

Here is the result I get:

1 = text1
27 = text27
3 = text3

Ian answered Mar 23 '23 02:03