
Full-text search for static HTML files on CD-Rom via javascript

I will be delivering a set of static HTML pages on CD-Rom; these pages need to be fully viewable with no Internet access whatsoever.

I'd like to provide a full-text search (Lucene-like) for the content of those pages, which should "just work" from the CD-Rom with no software installation on the client machine.

A search engine implemented in javascript would be the perfect solution, but I am having trouble finding one that looks solid, current, or popular.

I did find jsFind and js-search, but both projects seem rather inactive.

Another solution, besides a specific search engine in javascript, would be the ability to access local Lucene indices from javascript: the indices themselves would be built with Lucene and copied to the CD-Rom along with the HTML files.

Edit: built it myself (see below).

Bambax asked Aug 31 '09 12:08



1 Answer

Well in fact I built it myself.

The existing solutions (that I could find) were unconvincing.

I wanted to be able to search a very long tree (ul/li/ul...) that is displayed as one page; it contains 5000+ items.

It sounds a little weird to display such a long tree on one page but in fact with collapse / expand it's much more intuitive than separate pages, and since we're offline, download times are not a problem (parsing times are, though, but Chrome is amazing ;-)

The "search" function provided with modern browsers (FF and Chrome anyway) has two big problems: it only searches visible items on the page, and it can't search for non-consecutive words.

I want to be able to search collapsed items (not visible on the screen); I want to find "one two three" when searching "one three" (just like with Google / Lucene); and I want to open just the branches of the tree containing found items.

So, what I did was:

  1. create an inverted index of words <-> ids of items from the list (via xslt) (approx. 4500 unique words in the document)
  2. convert this index to a bunch of javascript arrays (one word = one array, containing ids)
  3. when searching, intersect the arrays represented by the search words
  4. step 3 returns an array of ids that I can then open / highlight

It does exactly what I needed and it's really fast. Better yet, since it searches from an independent "index" (arrays of ids), it can search when the list is not even loaded in the browser!
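Opening just the branches that contain found items can be worked out from ids alone too. A minimal sketch, assuming a hypothetical `parentOf` map from each item id to its parent id (nothing like it appears in my description above; the names and sample tree are invented for illustration):

```javascript
// Hypothetical tree shape: item id -> parent id (null for the top-level item).
const parentOf = {
  root: null,
  b1: "root",
  b2: "root",
  l1: "b1",
  l2: "b1",
  l3: "b2",
};

// Collect every ancestor of the found ids; these are the branches to expand.
function branchesToOpen(parentOf, foundIds) {
  const open = new Set();
  for (const id of foundIds) {
    let parent = parentOf[id];
    // Stop early once we reach an ancestor that is already marked open.
    while (parent != null && !open.has(parent)) {
      open.add(parent);
      parent = parentOf[parent];
    }
  }
  return [...open];
}

const toOpen = branchesToOpen(parentOf, ["l1", "l3"]);
```

The returned ids can then be used to expand the matching ul/li nodes once the list is on the page.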

Bambax answered Oct 17 '22 15:10
