Full text search in HTML ignoring tags / &

I've recently seen a lot of libraries for searching and highlighting terms within an HTML page. However, every library I've looked at has the same problem: it can't find text that is partly enclosed in an HTML tag, and/or it fails to find special characters that are written as & entities.

Example a:

<span> This is a test. This is a <b>test</b> too</span>

Searching for "a test" would find the first instance but not the second.

Example b:

<span> Pencils in spanish are called l&aacute;pices</span>

Searching for "lápices" or "lapices" would fail to produce a result.

Is there a way to circumvent these obstacles?

Thanks in advance!

1 Answer

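You can use window.find() in non-IE browsers and TextRange's findText() method in IE. Both search the text as rendered on the page rather than the markup, so a match can span element boundaries, and character entities such as &aacute; are matched as the characters they represent. Here's an example: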


Unfortunately, Opera versions before 15 (when it switched to the Blink rendering engine) support neither window.find nor TextRange. If this is a concern for you, a rather heavyweight alternative is to combine the TextRange and CSS class applier modules of my Rangy library, as in the following demo: http://rangy.googlecode.com/svn/trunk/demos/textrange.html
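
The Rangy approach could look roughly like the following. This is a minimal sketch, assuming Rangy's core, TextRange and class applier modules are already loaded and initialised. The function name highlightWithRangy and its parameters are made up for illustration, and the rangy object is passed in only to keep the function self-contained.

```javascript
// Hypothetical helper, not part of Rangy itself.
// rangy:     the global Rangy object (core + TextRange + class applier modules)
// root:      the element whose contents should be searched, e.g. document.body
// term:      the text to look for
// className: CSS class applied to each match (style it in your stylesheet)
function highlightWithRangy(rangy, root, term, className) {
  var applier = rangy.createClassApplier(className);

  // Restrict the search to the contents of root.
  var scope = rangy.createRange();
  scope.selectNodeContents(root);

  // Remove highlights left over from a previous search.
  var range = rangy.createRange();
  range.selectNodeContents(root);
  applier.undoToRange(range);

  if (!term) {
    return 0; // nothing to search for
  }

  var hits = 0;
  var options = { caseSensitive: false, withinRange: scope };
  // findText() moves the range onto the next match and returns true,
  // or returns false when there are no more matches.
  while (range.findText(term, options)) {
    applier.applyToRange(range);
    range.collapse(false); // continue immediately after this match
    hits++;
  }
  return hits;
}
```

In a page it would be called as highlightWithRangy(rangy, document.body, "lápices", "searchResult"), where searchResult is an assumed class name with a background colour defined in CSS.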

The following code improves on the approach above by removing the highlights from the previous search each time a new search is performed:

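function doSearch(text, color) {
    // Old IE (the TextRange branch) cannot parse ES6 default parameters.
    color = color || "yellow";
    if (color != "transparent") {
        // Clear the previous highlights by repainting them as transparent,
        // then remember the new term for the next search.
        doSearch(document.getElementById('hid_search').value, "transparent");
        document.getElementById('hid_search').value = text;
    }
    if (!text) return; // nothing to search for
    if (window.find && window.getSelection) {
        document.designMode = "on";
        var sel = window.getSelection();
        sel.collapse(document.body, 0);
        while (window.find(text)) {
            document.execCommand("HiliteColor", false, color);
            sel.collapseToEnd(); // continue after this match
        }
        document.designMode = "off";
    } else if (document.body.createTextRange) {
        var textRange = document.body.createTextRange();
        while (textRange.findText(text)) {
            textRange.execCommand("BackColor", false, color);
            textRange.collapse(false); // continue after this match
        }
    }
}

<input type="text" id="search">
<input type="hidden" id="hid_search">
<input type="button" id="button" onmousedown="doSearch(document.getElementById('search').value)" value="Find">

<div id="content">
    <p>Here is some searchable text with some lápices in it, and more lápices, and some <b>for<i>mat</i>t</b>ing</p>
</div>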
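
To see why searching the visible text rather than the markup handles both examples from the question, here is a minimal sketch in plain JavaScript. It is not how window.find() or findText() work internally. The helpers visibleText, foldAccents and countMatches are made up for illustration, the entity table is a tiny subset, and the regex-based tag stripping only copes with simple, well-formed markup. The accent folding is an extra step that makes "lapices" match "lápices".

```javascript
// Hypothetical helpers for illustration; not part of any browser API or library.

// Tiny illustrative subset of named entities. A real page needs the full table
// (or the browser's own parser).
const ENTITIES = new Map([
  ["amp", "&"], ["lt", "<"], ["gt", ">"], ["quot", "\""],
  ["aacute", "\u00e1"] // á
]);

// Reduce markup to the text a reader would see: drop tags, then decode entities.
// A regex is not a real HTML parser; this only handles simple input.
function visibleText(html) {
  return html
    .replace(/<[^>]*>/g, "")
    .replace(/&(\w+);/g, (whole, name) =>
      ENTITIES.has(name) ? ENTITIES.get(name) : whole);
}

// Strip combining diacritics so that accented and unaccented spellings compare equal.
function foldAccents(s) {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Count non-overlapping occurrences of term in the visible text of html.
function countMatches(html, term) {
  const haystack = foldAccents(visibleText(html));
  const needle = foldAccents(term);
  if (needle === "") return 0;
  let count = 0;
  let i = haystack.indexOf(needle);
  while (i !== -1) {
    count++;
    i = haystack.indexOf(needle, i + needle.length);
  }
  return count;
}
```

With the markup from example a, countMatches(html, "a test") returns 2, because the <b> tag no longer splits the second occurrence. With example b, both "lápices" and "lapices" are found once.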
