 

Full-text search in HTML ignoring tags and & entities

I've recently seen a lot of libraries for searching and highlighting terms within an HTML page. However, every library I've looked at has the same problem: it can't find text that is partly enclosed in an HTML tag, and/or it fails to find special characters that are written as &-entities.


Example a:

<span> This is a test. This is a <b>test</b> too</span>

Searching for "a test" would find the first instance but not the second.
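The second match is missed because a search over the raw markup sees `a <b>test</b>`, not `a test`. A minimal sketch, assuming a naive regex tag stripper (for illustration only: it is not an HTML parser and mishandles comments, scripts and `>` inside attribute values), compares searching the markup with searching the extracted text:

```javascript
// Naive tag stripper for illustration only: removes anything that looks like <...>.
// A real implementation should walk the DOM's text nodes instead of using a regex.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, "");
}

// Count non-overlapping occurrences of needle in haystack (0 for an empty needle).
function countOccurrences(haystack, needle) {
  let count = 0;
  let from = 0;
  while (needle && (from = haystack.indexOf(needle, from)) !== -1) {
    count++;
    from += needle.length;
  }
  return count;
}

const markup = "<span> This is a test. This is a <b>test</b> too</span>";
console.log(countOccurrences(markup, "a test"));            // 1: the <b> tag splits the second match
console.log(countOccurrences(stripTags(markup), "a test")); // 2: both matches are in the plain text
```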


Example b:

<span> Pencils in spanish are called l&aacute;pices</span>

Searching for "lápices" or "lapices" would fail to produce a result.
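Example b needs two extra steps: decoding entities such as `&aacute;` into the characters they stand for, and, to match "lapices", comparing strings with their accents removed. A minimal sketch, assuming only numeric entities plus a small hypothetical table of named ones (in a browser, an element's `textContent` is already decoded), folds accents using Unicode NFD normalization:

```javascript
// Tiny, deliberately incomplete table of named entities (hypothetical subset).
const NAMED_ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', nbsp: "\u00a0", aacute: "\u00e1" };

// Decode &name;, &#NNN; and &#xHH; references; unknown or invalid ones are left as-is.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, ref) => {
    if (ref[0] === "#") {
      const hex = ref[1] === "x" || ref[1] === "X";
      const code = hex ? parseInt(ref.slice(2), 16) : parseInt(ref.slice(1), 10);
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, ref) ? NAMED_ENTITIES[ref] : match;
  });
}

// Lower-case and drop combining diacritical marks, so "lápices" and "lapices" compare equal.
function fold(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

const html = "<span> Pencils in spanish are called l&aacute;pices</span>";
const text = decodeEntities(html.replace(/<[^>]*>/g, "")); // naive tag stripping, as above
console.log(text.includes("lápices"));             // true once the entity is decoded
console.log(fold(text).includes(fold("lapices"))); // true after accent folding
```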


Is there a way to circumvent these obstacles?

Thanks in advance!

asked May 04 '11 16:05 by Bruno





You can use window.find() in non-IE browsers and TextRange's findText() method in IE. Here's an example:

http://jsfiddle.net/xeSQb/6/
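Both methods match against the page's text rather than its markup, so a single match can start in one text node and end in another. A rough model of that idea (not how either browser actually implements it), assuming the text nodes are represented as a plain array of strings, concatenates the nodes, searches the joined string, and maps each hit back to `{ node, offset }` pairs, which is the information a DOM `Range` needs for `setStart`/`setEnd`:

```javascript
// Illustrative model: each string stands in for one DOM text node, in document order.
function findAcrossNodes(nodes, needle) {
  // Concatenate the nodes, remembering where each one starts in the joined text.
  const starts = [];
  let joined = "";
  for (const t of nodes) {
    starts.push(joined.length);
    joined += t;
  }

  // Absolute position of a match's first character -> { node, offset }.
  function locateStart(pos) {
    for (let i = 0; i < nodes.length; i++) {
      if (pos >= starts[i] && pos < starts[i] + nodes[i].length) {
        return { node: i, offset: pos - starts[i] };
      }
    }
    return null;
  }

  // Exclusive end position -> { node, offset } in the node the match ends in,
  // rather than offset 0 of the following node.
  function locateEnd(pos) {
    for (let i = 0; i < nodes.length; i++) {
      if (pos > starts[i] && pos <= starts[i] + nodes[i].length) {
        return { node: i, offset: pos - starts[i] };
      }
    }
    return null;
  }

  const hits = [];
  if (!needle) return hits;
  let from = 0;
  let idx;
  while ((idx = joined.indexOf(needle, from)) !== -1) {
    hits.push({ start: locateStart(idx), end: locateEnd(idx + needle.length) });
    from = idx + needle.length; // continue after this match (non-overlapping)
  }
  return hits;
}

// Text nodes of: <span> This is a test. This is a <b>test</b> too</span>
const textNodes = [" This is a test. This is a ", "test", " too"];
for (const hit of findAcrossNodes(textNodes, "a test")) {
  console.log(JSON.stringify(hit));
}
```

The second hit starts in node 0 and ends inside node 1 (the text of the `<b>` element), which is exactly the case that markup-based searches miss.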

Unfortunately, Opera before version 15 (i.e. before its switch to the Blink rendering engine) supports neither window.find nor TextRange. If this is a concern for you, a rather heavyweight alternative is to combine the TextRange and CSS class applier modules of my Rangy library, as in the following demo: http://rangy.googlecode.com/svn/trunk/demos/textrange.html

The following code improves on the fiddle above by removing the previous search's highlighting each time a new search is performed:

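function doSearch(text, color) {
    // Default highlight color; avoids ES2015 default parameters, which old IE does not support
    color = color || "yellow";
    if (color != "transparent") {
        // Un-highlight the previous search by repainting its matches transparent,
        // then remember the current term (in the hidden input) for the next search
        doSearch(document.getElementById('hid_search').value, "transparent");
        document.getElementById('hid_search').value = text;
    }
    if (!text) {
        return; // nothing to search for, e.g. no previous search to clear
    }
    if (window.find && window.getSelection) {
        // Non-IE: execCommand only modifies editable content, so enable designMode
        document.designMode = "on";
        var sel = window.getSelection();
        sel.collapse(document.body, 0); // start searching from the top of the body

        while (window.find(text)) {
            // window.find selects the match; color it, then continue after it
            document.execCommand("HiliteColor", false, color);
            sel.collapseToEnd();
        }
        document.designMode = "off";
    } else if (document.body.createTextRange) {
        // IE: search and color with a TextRange instead
        var textRange = document.body.createTextRange();
        while (textRange.findText(text)) {
            textRange.execCommand("BackColor", false, color);
            textRange.collapse(false); // collapse to the end of the match
        }
    }
}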
<input type="text" id="search">
<input type="hidden" id="hid_search">
<input type="button" id="button" onmousedown="doSearch(document.getElementById('search').value)" value="Find">

<div id="content">
    <p>Here is some searchable text with some lápices in it, and more lápices, and some <b>for<i>mat</i>t</b>ing</p>
</div> 
answered Oct 10 '22 03:10 by Tim Down