How to split text with HTML tags to array

I have some very simple text containing HTML (only the <b> tag), e.g.

Lorem Ipsum is <b>simply dummy</b> text of the printing and <b>typesetting industry</b>

I would like to split the text into an array like this:

[0] - Lorem Ipsum is 
[1] - <b>simply dummy</b>
[2] - text of the printing and
[3] - <b>typesetting industry</b>

The text inside an HTML tag must be separated from the rest of the text. Is there a simple solution for this?

Thank you

Jakub Krampl asked May 26 '15 10:05


2 Answers

You can achieve this with the following code:

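    // Requires: using System.Linq; using System.Text.RegularExpressions;
    string value = @"Lorem Ipsum is <b>simply dummy</b> text of the printing and <b>typesetting industry</b>";
    // The capturing group makes Regex.Split keep each <b>...</b> match in the result.
    var parts = Regex.Split(value, @"(<b>[\s\S]+?<\/b>)").Where(l => l != string.Empty).ToArray();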
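
To see why the one-liner above works, here is a minimal, self-contained sketch of the same idea. The class name `BoldSplitDemo` and the helper `SplitOnBoldTags` are made up for illustration, and the `Trim()` call is an assumption: the question's expected output appears to have no surrounding spaces, so drop that step if the whitespace matters to you.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BoldSplitDemo
{
    // Hypothetical helper name; wraps the same Regex.Split pattern as the answer.
    public static string[] SplitOnBoldTags(string text)
    {
        // The parentheses form a capturing group, so Regex.Split returns the
        // matched <b>...</b> delimiters as items instead of discarding them.
        return Regex.Split(text, @"(<b>[\s\S]+?</b>)")
            .Select(part => part.Trim())      // assumption: surrounding spaces are unwanted
            .Where(part => part.Length > 0)   // drop empty items at the start or end
            .ToArray();
    }

    public static void Main()
    {
        var value = "Lorem Ipsum is <b>simply dummy</b> text of the printing and <b>typesetting industry</b>";
        var parts = SplitOnBoldTags(value);

        // Print each item with its index, mirroring the layout used in the question.
        for (var i = 0; i < parts.Length; i++)
        {
            Console.WriteLine("[{0}] - {1}", i, parts[i]);
        }
    }
}
```

The lazy quantifier `+?` matters here: it stops at the first `</b>`, so two bold runs in the same string are not merged into one item. This only holds for flat, un-nested `<b>` tags like the ones in the question.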
HadiRj answered Sep 17 '22 03:09


I just wrote this and tested it, and it works. It's a bit ugly, but it does the job:

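    // Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
    public string[] getHtmlSplitted(String text)
    {
        var list = new List<string>();
        // The capturing group makes Regex.Split return the tags themselves as separate items.
        var pattern = "(<b>|</b>)";
        var isInTag = false;
        var inTagValue = String.Empty;

        foreach (var subStr in Regex.Split(text, pattern))
        {
            if (subStr.Equals("<b>"))
            {
                // Opening tag: start collecting the bold text from scratch.
                isInTag = true;
                inTagValue = String.Empty;
                continue;
            }
            else if (subStr.Equals("</b>"))
            {
                // Closing tag: emit the bold text wrapped in its tags again.
                isInTag = false;
                list.Add(String.Format("<b>{0}</b>", inTagValue));
                continue;
            }

            if (isInTag)
            {
                inTagValue = subStr;
                continue;
            }

            // Plain text outside the tags. Skip the empty strings that Split
            // yields when the input starts or ends with a tag.
            if (subStr.Length > 0)
            {
                list.Add(subStr);
            }
        }
        return list.ToArray();
    }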
Sid answered Sep 17 '22 03:09