I have some HTML files over which I have no control, so I can't change their structure or markup.
For each of these HTML files, a separate algorithm produces a list of words. These words should be highlighted in the text of the HTML. For example, if the HTML markup is:
<p>
Monkeys are going to die soon, if we don't stop killing them.
So, we have to try hard to persuade hunters not to hunt monkeys.
Monkeys are very intelligent, and they should survive.
In fact, they deserve to survive.
</p>
and the list of the words is:
are, we, monkey
the result should be something like:
<p>
<span class='highlight'>Monkeys</span>
<span class='highlight'>are</span>
going to die soon, if
<span class='highlight'>we</span>
don't stop killing them.
So,
<span class='highlight'>we</span>
have to try hard to persuade hunters
not to hunt
<span class='highlight'>monkeys</span>
.
<span class='highlight'>Monkeys</span>
<span class='highlight'>are</span>
very intelligent, and they should survive.
In fact, they deserve to survive.
</p>
The highlighting algorithm should:
elements) (some of these files are HTML export of MS Word, and I think you got what I mean by dirty!!!)
What I've done till now:
["are", "we", "monkey"]
Please note that you can see it online here (username: [email protected], pass: demo). Also, the current script can be seen at the end of the page's source.
Concatenate your words with | into a single string, interpret that string as a regex, and then replace each occurrence with the full match wrapped in the highlight tags.
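A minimal sketch of that idea, run on plain text only. The function names (`escapeRegExp`, `buildHighlightRegex`, `highlightText`) are made up for illustration, and the optional trailing "s" for plurals is an assumption taken from the monkey/monkeys example in the question, not a general stemming rule:

```javascript
// Escape regex metacharacters so a word like "c++" is matched literally.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Join the words with | and build one case-insensitive, global regex.
// \b keeps "are" from matching inside "care"; "s?" allows a simple plural
// (assumption based on the example in the question).
function buildHighlightRegex(words) {
  const alternation = words.map(escapeRegExp).join("|");
  return new RegExp("\\b(?:" + alternation + ")s?\\b", "gi");
}

// Wrap every match in the highlight span; "$&" stands for the full match.
function highlightText(text, words) {
  if (words.length === 0) return text; // nothing to highlight
  return text.replace(
    buildHighlightRegex(words),
    "<span class='highlight'>$&</span>"
  );
}
```

For example, `highlightText("Monkeys are smart.", ["are", "we", "monkey"])` wraps "Monkeys" and "are" and leaves "smart" alone. Note that this treats the input as plain text, so it would also match words inside tag names or attribute values if given raw HTML.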
The following regular expression works for your example. Maybe you can pick it up from there:
"Monkeys are going to die soon, if we don't stop killing them. So, we have to try hard to persuade hunters not to hunt monkeys. Monkeys are very intelligent, and they should survive. In fact, they deserve to survive.".replace(/\b(we|are|monkeys?)\b/gi, "<span class='highlight'>$1</span>")
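Since the real input is HTML rather than a plain string, one possible way to keep a regex like this from touching tag names and attribute values is to split the markup into tags and text and only replace inside the text parts. The sketch below is a rough illustration under stated assumptions, not a full HTML parser: `highlightHtml` is a hypothetical name, tags are assumed not to contain a literal ">" inside attribute values, and the contents of `<script>` or `<style>` elements are not skipped.

```javascript
// Highlight words only in text segments, leaving tags and attributes untouched.
function highlightHtml(html, words) {
  if (words.length === 0) return html; // nothing to highlight

  // Escape regex metacharacters, then join the words with |.
  const escaped = words.map((w) => w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  // The optional "s" for plurals is an assumption based on the example.
  const wordRegex = new RegExp("\\b(?:" + escaped.join("|") + ")s?\\b", "gi");

  // Splitting on a capturing group keeps the tags in the resulting array,
  // so the pieces alternate between text and tags.
  return html
    .split(/(<[^>]*>)/)
    .map((part) =>
      part.startsWith("<")
        ? part // a tag (or a stray "<"): leave it exactly as it is
        : part.replace(wordRegex, "<span class='highlight'>$&</span>")
    )
    .join("");
}
```

With the word list `["are", "we", "monkey"]`, an input such as `<p class='we'>We are <b>monkeys</b>.</p>` keeps its `class='we'` attribute intact while "We", "are", and "monkeys" in the text get wrapped. A word that is split across several elements, which can happen in MS Word exports, would not be matched by this approach.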