Using XRegExp.matchRecursive for nested spans

I want a way to get all the content between an opening span tag and its closing tag. The problem is that I sometimes have nested spans, and I want to be sure that my regex doesn't stop at the first closing span it sees.

To see my problem, look at this Regex101 example: nested span

I want to be sure that I get everything between the opening and the closing tag, no matter how many </span> tags appear inside.
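A minimal reproduction of the problem in plain JavaScript (no library), using the test string from further down in the question. The string is wrapped in String.raw so its literal \" sequences survive. A lazy regex for the colour span stops at the first </span> it meets, so the second inner span is cut off.

```javascript
// Test string from the question; String.raw keeps the literal \" sequences.
const input = String.raw`<p style=\"text-align:justify\">
    <span style=\"font-size:12pt\">
        <span style=\"color:Green\">
            <span style=\"font-family:Verdana\">There is some content for a mm advertisment.There is some co</span>
            <span style=\"font-family:Times New Roman\">ntent for a mm advertisment.</span>
        </span>
    </span>
</p>`;

// Naive attempt: colour span opening tag, then lazily everything up to </span>.
const naive = /<span style=\\?"color:([a-zA-Z\s]*?)\\?">([\s\S]*?)<\/span>/;
const m = input.match(naive);

console.log(m[1]); // "Green"
// m[2] ends at the FIRST </span>, i.e. inside the Verdana span,
// so the "Times New Roman" span is never reached.
console.log(m[2]);
```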

I have found a library by Steven Levithan that could do what I want. The problem is that the examples are basic, and I am not sure whether I can achieve what I want with it.

I'm using the XRegExp.matchRecursive method. In the examples they pass a start tag and an end tag. My start tag is a bit more complicated; it looks like this: <span style=\\?"color:([a-zA-Z\s]*?)\\?">. When I execute the method with this delimiter, I get this error: string contains unbalanced delimiters. The tested string is:

<p style=\"text-align:justify\">
    <span style=\"font-size:12pt\">
        <span style=\"color:Green\">
            <span style=\"font-family:Verdana\">There is some content for a mm advertisment.There is some co</span>
            <span style=\"font-family:Times New Roman\">ntent for a mm advertisment.</span>
        </span>
    </span>
</p>

I think my problem is caused by the regex I use as the start delimiter. As explained in the docs, we should add a level of backslash escaping in the regex. That's why I tried this regex as the start delimiter: <span style=\\\\?"color:([a-zA-Z\\s]*?)\\\\?">. It still doesn't work. I don't see how to tell this method to find everything between the span that has the color style attribute and its closing tag.
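A quick check of the escaping, assuming the delimiter is written as a JavaScript string literal (the delimiters are passed to matchRecursive as pattern strings, which is why the docs ask for an extra level of backslashes). The doubled-backslash string compiles to exactly the same pattern as the single-escaped regex literal, and that pattern does find the colour span in the test string. This suggests the escaping is probably not what triggers the error. The variable names are illustrative only.

```javascript
// Test string from the question; String.raw keeps the literal \" sequences.
const input = String.raw`<p style=\"text-align:justify\">
    <span style=\"font-size:12pt\">
        <span style=\"color:Green\">
            <span style=\"font-family:Verdana\">There is some content for a mm advertisment.There is some co</span>
            <span style=\"font-family:Times New Roman\">ntent for a mm advertisment.</span>
        </span>
    </span>
</p>`;

// Start delimiter as a string literal: every regex backslash is doubled.
const startPattern = '<span style=\\\\?"color:([a-zA-Z\\s]*?)\\\\?">';

// The same pattern as a regex literal, where backslashes are not doubled.
const startLiteral = /<span style=\\?"color:([a-zA-Z\s]*?)\\?">/;

const sameSource = new RegExp(startPattern).source === startLiteral.source;
const found = new RegExp(startPattern).exec(input);

console.log(sameSource); // true
console.log(found[1]);   // "Green"
```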

Maybe somebody has a solution?

Ganbin asked Oct 31 '22 00:10



2 Answers

So the blocker you're hitting is the error "string contains unbalanced delimiters".

That is because your start delimiter matches only one of the opening span tags in your test input (the one that specifies the colour), whereas your end delimiter matches all four closing span tags.
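A small count in plain JavaScript makes the imbalance visible, assuming the test string from the question with its literal \" sequences. The colour-specific start delimiter matches once, while there are four opening span tags in total and four closing tags, so the end delimiter has three more matches than the start delimiter.

```javascript
// Test string from the question; String.raw keeps the literal \" sequences.
const input = String.raw`<p style=\"text-align:justify\">
    <span style=\"font-size:12pt\">
        <span style=\"color:Green\">
            <span style=\"font-family:Verdana\">There is some content for a mm advertisment.There is some co</span>
            <span style=\"font-family:Times New Roman\">ntent for a mm advertisment.</span>
        </span>
    </span>
</p>`;

// Start delimiter from the question: only the colour span matches it.
const colorOpen = /<span style=\\?"color:([a-zA-Z\s]*?)\\?">/g;
// Any opening span tag, whatever its attributes.
const anyOpen = /<span\b[^>]*>/g;
// End delimiter: every closing span tag matches it.
const close = /<\/span>/g;

// Number of non-overlapping matches of a global regex in the input.
const count = (re) => (input.match(re) || []).length;

console.log(count(colorOpen), count(anyOpen), count(close)); // 1 4 4
```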

I think you'll have to approach this by first matching all the span tags (with the library you've found) and then re-processing the matches to find the ones you care about.
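A sketch of that two-step idea, written as a small hand-rolled tokenizer in plain JavaScript rather than with XRegExp, so it does not reflect the library's API. One global regex finds every opening and closing span tag, a stack pairs them, and only the pairs whose opening tag matches the colour pattern are kept. The function name extractColorSpans is hypothetical, and the tag regex assumes no attribute value contains a literal >.

```javascript
// Hypothetical helper: returns the inner content of every span whose
// opening tag carries a colour style, nested spans included.
function extractColorSpans(html) {
  const tagRe = /<span\b[^>]*>|<\/span>/g;                     // every span tag
  const colorRe = /^<span style=\\?"color:([a-zA-Z\s]*?)\\?">$/; // colour test on one tag
  const stack = [];   // opening tags that are not closed yet
  const results = [];
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    if (m[0] === '</span>') {
      const open = stack.pop();
      if (open === undefined) throw new Error('unbalanced </span>');
      const color = colorRe.exec(open.tag);
      if (color) {
        // Slice from the end of the opening tag to the start of its own closing tag.
        results.push({ color: color[1], content: html.slice(open.contentStart, m.index) });
      }
    } else {
      // lastIndex points just past the opening tag, where its content begins.
      stack.push({ tag: m[0], contentStart: tagRe.lastIndex });
    }
  }
  if (stack.length > 0) throw new Error('unclosed <span>');
  return results;
}

// Test string from the question; String.raw keeps the literal \" sequences.
const input = String.raw`<p style=\"text-align:justify\">
    <span style=\"font-size:12pt\">
        <span style=\"color:Green\">
            <span style=\"font-family:Verdana\">There is some content for a mm advertisment.There is some co</span>
            <span style=\"font-family:Times New Roman\">ntent for a mm advertisment.</span>
        </span>
    </span>
</p>`;

const spans = extractColorSpans(input);
console.log(spans.length, spans[0].color); // 1 Green
console.log(spans[0].content.trim());      // both inner spans, intact
```

Because each closing tag is paired with the most recent unclosed opening tag, the inner </span> tags close the inner spans, and the colour span's content runs all the way to its own closing tag.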

randomsimon answered Nov 12 '22 18:11



Is there perhaps an option to use some kind of parser that is more powerful than regular expressions? The latter are, generally speaking, not really suitable for parsing non-regular languages such as arbitrarily nested HTML, even though they might provide certain extensions compared to "pure" regular expressions in the theoretical sense.

plamut answered Nov 12 '22 20:11
