Regular expression: finding two elements not surrounding another element in text

Question

I need to find badly formatted HTML in some text. We let users add strong and em tags, but they don't always close them correctly:

This is some <b>correct</b> formatting
This is some <b>incorrect<b> formatting

I would like to catch instances where the formatting is incorrect, i.e. where an opening tag is followed by another opening tag with no closing tag in between. I started with negative lookaheads but haven't had much success so far:

<b>(?!.*?<\/b>.*?)<b>

<b> match an opening tag
(?! negative lookahead for
- .*? anything, non-greedily
- <\/b> the closing tag
- .*? anything, non-greedily
) close the lookahead
<b> match another opening tag

Any idea how I could do that?

Addendum: I know about Tony the pony (the famous warning against parsing HTML with regex), but I don't think he's coming this time. The problem could be restated as "I want to find two occurrences of the word 'zoinx' with no occurrence of the word 'palantir' in between", which is not HTML-related.

vks · Accepted Answer

<b>(?:(?!<\/b>).)*<b>

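Try this. The group (?:(?!<\/b>).)* is a tempered greedy token: it consumes one character at a time, but only at positions where </b> does not start, so a match can never run across a closing tag. See demo.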

https://regex101.com/r/nS2lT4/19
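A minimal sketch of the answer's pattern on the two sample lines, assuming Python's `re` module (the thread doesn't name a flavor; `first_match` is a hypothetical helper). The same tempered-token idea also covers the "zoinx"/"palantir" restatement from the question's addendum.

```python
import re

# The answer's pattern: each "." is consumed only if "</b>" does not start
# at that position, so the match cannot cross a closing tag.
unclosed_b = re.compile(r"<b>(?:(?!<\/b>).)*<b>")

# The same technique applied to the word version of the problem.
zoinx_pair = re.compile(r"zoinx(?:(?!palantir).)*zoinx")

def first_match(pattern, text):
    """Hypothetical helper: return the first matched substring, or None."""
    m = pattern.search(text)
    return m.group() if m else None

print(first_match(unclosed_b, "This is some <b>correct</b> formatting"))
print(first_match(unclosed_b, "This is some <b>incorrect<b> formatting"))
print(first_match(zoinx_pair, "zoinx palantir zoinx"))
print(first_match(zoinx_pair, "zoinx foo zoinx"))
```

Note that `.` does not match newlines by default, so for text spanning several lines you would compile these patterns with `re.DOTALL`.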

For a generalized version that works with any tag name, capture the name and refer back to it with \1:

<([^>]*)>(?:(?!<\/\1>).)*<\1>

See demo.

https://regex101.com/r/nS2lT4/24
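A Python sketch of the generalized pattern, again assuming Python's `re` module, with `first_match` as a hypothetical helper. One deliberate change, which is an assumption about the intent rather than part of the answer: the tag name is captured with `(\w+)` instead of `([^>]*)`. With `[^>]*`, the group can also capture `/b` from a closing tag, so text such as `<b>a</b> and <b>x</b>` is reported from `</b>` onward even though every tag is closed. Like the original, this sketch only handles tags without attributes.

```python
import re

# Variant: (\w+) captures only a tag name, so a closing tag such as </b>
# can never start a match; \1 reuses the captured name in both places.
unclosed_tag = re.compile(r"<(\w+)>(?:(?!<\/\1>).)*<\1>")

# The answer's original pattern, kept for comparison.
original = re.compile(r"<([^>]*)>(?:(?!<\/\1>).)*<\1>")

def first_match(pattern, text):
    """Hypothetical helper: return the first matched substring, or None."""
    m = pattern.search(text)
    return m.group() if m else None

samples = [
    "This is some <em>correct</em> formatting",
    "This is some <em>incorrect<em> formatting",
    "<b>a</b> and <b>x</b>",
]
for text in samples:
    print(repr(first_match(unclosed_tag, text)), repr(first_match(original, text)))
```

Both patterns agree on the first two samples; only the original flags the third.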

Tags:

regex

samy