
RegEx: Matching a specific string that is not inside an HTML tag

Tags:

html

regex

<tag value='botafogo'> botafogo is the best </tag>

I need to match only the botafogo in the text content (the one followed by "is the best"), not the 'botafogo' in the attribute value.

My program automatically "annotates" a term in plain text, changing:

botafogo is the best 

to

<team attr='best'>botafogo</team> is the best 

But when I then "replace all" occurrences of the word "best", I have a big problem, because the attribute value gets annotated too:

<team attr='<adjective>best</adjective>'>botafogo</team> is the <adjective>best</adjective>

P.S.: I am using Java.

celsowm asked Dec 30 '22 00:12


1 Answer

The best way to accomplish this is NOT to use regular expressions but a proper HTML parser. HTML is not a regular language, so doing this with regular expressions will be tedious and hard to maintain, and the result will more than likely still contain various errors.

HTML parsers, on the other hand, are well suited to the job. Many of them are mature and reliable, and they take care of every little detail for you and make your life much easier.
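As a rough illustration of the parser-based approach, here is a minimal sketch that uses the JDK's built-in XML DOM parser as a stand-in for an HTML parser. It assumes the annotated text is well-formed XML once it is wrapped in a root element; real-world HTML would need a dedicated HTML parser instead. The class name `TextNodeAnnotator`, the method `annotate`, and the `<root>` wrapper are hypothetical names chosen for this example. The idea is that only text nodes are searched, so attribute values can never be matched.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: wraps a term in a new element, but only inside text nodes.
final class TextNodeAnnotator {

    static String annotate(String markup, String term, String tagName) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("term must not be empty");
        }
        try {
            // Assumption: the fragment is well-formed XML once given a single root.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<root>" + markup + "</root>")));
            Element root = doc.getDocumentElement();

            // Collect the text nodes first so the tree is not modified while walking it.
            List<Node> textNodes = new ArrayList<>();
            collectTextNodes(root, textNodes);
            for (Node text : textNodes) {
                wrapOccurrences(doc, text, term, tagName);
            }
            if (!root.hasChildNodes()) {
                return "";
            }

            // Serialize the tree, then strip the temporary wrapper element again.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter buffer = new StringWriter();
            serializer.transform(new DOMSource(root), new StreamResult(buffer));
            String xml = buffer.toString();
            return xml.substring("<root>".length(), xml.length() - "</root>".length());
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed XML", e);
        }
    }

    // Depth-first walk that gathers text nodes; attributes are not child nodes,
    // so their values are never visited.
    private static void collectTextNodes(Node node, List<Node> out) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                out.add(child);
            } else {
                collectTextNodes(child, out);
            }
        }
    }

    // Splits one text node around each occurrence of term and inserts <tagName>term</tagName>.
    private static void wrapOccurrences(Document doc, Node text, String term, String tagName) {
        Node parent = text.getParentNode();
        String value = text.getNodeValue();
        int from = 0;
        int hit;
        while ((hit = value.indexOf(term, from)) >= 0) {
            if (hit > from) {
                parent.insertBefore(doc.createTextNode(value.substring(from, hit)), text);
            }
            Element wrapper = doc.createElement(tagName);
            wrapper.setTextContent(term);
            parent.insertBefore(wrapper, text);
            from = hit + term.length();
        }
        if (from == value.length()) {
            parent.removeChild(text);   // nothing left after the last match
        } else {
            text.setNodeValue(value.substring(from));
        }
    }
}
```

Calling `TextNodeAnnotator.annotate("<team attr='best'>botafogo</team> is the best", "best", "adjective")` should leave the `attr` value alone and wrap only the trailing "best" in `<adjective>`. Note that the serializer may rewrite the attribute's single quotes as double quotes, and that running the same annotation twice would nest the wrappers, so a real implementation would probably skip text that is already inside the target element.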

polygenelubricants answered Dec 31 '22 13:12