How to find a word that is split by HTML tags?

I'm programming a spell checker in JavaScript using an OpenOffice dictionary, and I've run into a serious problem.

I can find whole words using a regex, but if a word looks like prog<b>ram</b>ing, I can only find it after removing all HTML tags with jQuery's .text() method. How can I then replace the word and rebuild the original HTML structure?

Spellchecker.com does this very smartly: its spell check recognizes misspelled words even when they look like prog<b>ram</b>ing!

asked Dec 30 '25 00:12 by yas


1 Answer

/([\s>"'])prog(<[^>]+>)ram(<[^>]+>)ing([\s\.,:;"'<])/g 

will match your example.
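Because the interrupting tags are captured in groups 2 and 3, a replacement string can put them back exactly where they were. A minimal sketch, assuming the correction keeps the same split points (here "programing" becomes "programming" by changing only the last fragment); the sample sentence is made up for illustration:

```javascript
// The regex from above: group 1 = character before the word,
// groups 2 and 3 = the interrupting tags, group 4 = character after the word.
const re = /([\s>"'])prog(<[^>]+>)ram(<[^>]+>)ing([\s\.,:;"'<])/g;

// Hypothetical sample input containing the misspelled, tag-split word.
const html = '<p>I like prog<b>ram</b>ing a lot.</p>';

// Re-emit the captured tags around the corrected letters so the markup survives.
const fixed = html.replace(re, '$1prog$2ram$3ming$4');

console.log(fixed); // <p>I like prog<b>ram</b>ming a lot.</p>
```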

So, roughly, the following regex will find all instances of the word, including those broken up by HTML tags:

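 // Zero or more tags may sit between letters; backslashes are doubled inside string literals
 var regExp = new RegExp('([\\s>"\'])' + word.split('').join('(?:<[^>]+>)*') + '([\\s.,:;"\'<])', 'g');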
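A self-contained sketch of the same idea follows. The helper name `buildTagTolerantRegExp` is hypothetical. It escapes regex metacharacters in the word, lets zero or more tags appear between any two letters, and captures the whole tag-split word in group 2. One deliberate change from the one-liner: the trailing boundary is a lookahead, so it is not consumed and two adjacent occurrences can both match. The leading boundary is still required, so a word at the very start of the string will not match.

```javascript
// Hypothetical helper: build a regex that matches `word` even when HTML tags
// are interleaved between its letters.
function buildTagTolerantRegExp(word) {
  // Escape regex metacharacters so each letter is matched literally.
  const letters = word.split('').map(ch => ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  // Zero or more tags may appear between any two letters.
  const body = letters.join('(?:<[^>]+>)*');
  // Preceded by whitespace, '>' or a quote (group 1); followed (lookahead, not
  // consumed) by whitespace, punctuation, a quote or '<'.
  return new RegExp('([\\s>"\'])(' + body + ')(?=[\\s.,:;"\'<])', 'g');
}

const html = '<p>Bad prog<b>ram</b>ing and plain programing.</p>';
const re = buildTagTolerantRegExp('programing');
const found = [];
let m;
while ((m = re.exec(html)) !== null) {
  found.push(m[2]); // group 2 is the word including any interleaved tags
}
console.log(found.join(' | ')); // prog<b>ram</b>ing | programing
```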

God knows how that'll help you build a spellchecker, though. I suspect spellcheckers take an approach more like this: do a spellcheck assuming there is no HTML; if a word contains HTML, strip the tags out using something like the method below, then spellcheck the resulting string as normal:

// Remove every HTML tag (the g flag strips all tags, not just the first one)
String.prototype.stripHtml = function() {
  return this.replace(/<[^>]+>/g, '');
};
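Stripping tags on its own loses the information needed to put a correction back into the HTML, which is what the question asks about. One way around that, sketched here as an assumption (not the answerer's method, and not how any particular spellchecker is known to work), is to strip the tags while recording, for each character of the plain text, its index in the original HTML. A word found in the plain text can then be mapped back, and the corrected letters written into those positions while the tags between them are copied unchanged. Both function names are hypothetical, and the sketch ignores HTML entities such as `&amp;` and `>` characters inside attribute values.

```javascript
// Hypothetical helper: strip tags, remembering where each text character came from.
function stripWithMap(html) {
  let text = '';
  const map = []; // map[i] = index in `html` of text[i]
  let inTag = false;
  for (let i = 0; i < html.length; i++) {
    const ch = html[i];
    if (ch === '<') inTag = true;
    else if (ch === '>' && inTag) inTag = false;
    else if (!inTag) { text += ch; map.push(i); }
  }
  return { text, map };
}

// Hypothetical helper: replace text[start, end) with `replacement`, keeping any
// tags between those characters. Extra replacement characters go after the last
// original character; if the replacement is shorter, leftover characters are dropped.
function replaceInHtml(html, map, start, end, replacement) {
  let out = '';
  let pos = 0;
  for (let k = start; k < end; k++) {
    const idx = map[k];
    out += html.slice(pos, idx); // copy tags (and anything else) before this character
    if (k === end - 1) out += replacement.slice(k - start); // last character takes the remainder
    else if (k - start < replacement.length) out += replacement[k - start];
    pos = idx + 1;
  }
  return out + html.slice(pos);
}

const html = '<p>prog<b>ram</b>ing</p>';
const { text, map } = stripWithMap(html);
const start = text.indexOf('programing'); // in practice, reported by the spellchecker
const fixed = replaceInHtml(html, map, start, start + 'programing'.length, 'programming');
console.log(fixed); // <p>prog<b>ram</b>ming</p>
```

The design choice here is to keep the HTML as the source of truth and treat the stripped text only as an index into it, so the spellchecker never has to understand markup.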
answered Jan 01 '26 18:01 by wheresrhys