
What is the best Java text indexing library for Google App Engine?

So far I know that Compass can handle this, but indexing with Compass looks pretty expensive. Are there any lighter alternatives?

user242726 asked Jan 03 '10 18:01


2 Answers

To be honest, I don't know if Lucene will be lighter than Compass in terms of indexing (why would it be? Doesn't Compass use Lucene for that?).

Anyway, because you asked for alternatives, there is GAELucene. I'm quoting its announcement below:

Enlightened by the discussion "Can I run Lucene in google app engine?", I implemented a Google datastore-based Lucene component, GAELucene, which can help you to run search applications on Google App Engine.

The main classes of GAELucene include:

  • GAEDirectory - a read-only Directory based on the Google datastore.
  • GAEFile - stands for an index file; the file's byte content is split into multiple GAEFileContent segments.
  • GAEFileContent - stands for a segment of an index file.
  • GAECategory - the identifier of different indices.
  • GAEIndexInput - a memory-resident IndexInput implementation like the RAMInputStream.
  • GAEIndexReader - a wrapper for IndexReader that is cached in GAEIndexReaderPool.
  • GAEIndexReaderPool - a pool for GAEIndexReader.

The following code snippet demonstrates how to use GAELucene to do searching:

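// Build the Lucene query from the incoming request (parserQuery is the announcement's own helper).
Query queryObject = parserQuery(request);
GAEIndexReaderPool readerPool = GAEIndexReaderPool.getInstance();
// Borrow a pooled reader for the demo index category.
GAEIndexReader indexReader = readerPool.borrowReader(INDEX_CATEGORY_DEMO);
IndexSearcher searcher = new IndexSearcher(indexReader);
Hits hits = searcher.search(queryObject);
// Hand the reader back to the pool once the search is done.
readerPool.returnReader(indexReader);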

I warmly recommend reading the whole discussion on Nabble; it is very informative.
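
To make the GAEFile / GAEFileContent idea from the announcement a bit more concrete, here is a minimal sketch of splitting one index file's bytes into fixed-size segments and joining them back. This is not GAELucene's actual code: the class name IndexFileChunker, its methods, and the segment size are all hypothetical, and the sketch only assumes that the split exists because datastore entities are limited in size (roughly 1 MB each).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not part of GAELucene): splits the bytes of one
 * index file into fixed-size segments and joins them back together.
 */
final class IndexFileChunker {

    private IndexFileChunker() {}

    /** Splits content into segments of at most segmentSize bytes each. */
    static List<byte[]> split(byte[] content, int segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        List<byte[]> segments = new ArrayList<>();
        int offset = 0;
        while (offset < content.length) {
            // The last segment may be shorter than segmentSize.
            int len = Math.min(segmentSize, content.length - offset);
            segments.add(Arrays.copyOfRange(content, offset, offset + len));
            offset += len;
        }
        return segments;
    }

    /** Concatenates the segments in order to rebuild the original bytes. */
    static byte[] join(List<byte[]> segments) {
        int total = 0;
        for (byte[] segment : segments) {
            total += segment.length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] segment : segments) {
            System.arraycopy(segment, 0, out, pos, segment.length);
            pos += segment.length;
        }
        return out;
    }
}
```

For example, IndexFileChunker.split(data, 1000) on a 2500-byte array gives three segments of 1000, 1000 and 500 bytes, and join() on those segments returns the original bytes. In a real setup each segment would be stored as its own datastore entity, which this sketch leaves out.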

Just in case, regarding Compass, Shay Banon wrote a blog entry detailing how to use Compass in App Engine here: http://www.kimchy.org/searchable-google-appengine-with-compass/

Pascal Thivent answered Sep 25 '22 15:09


Apache Lucene is the de facto choice for full-text indexing in Java. It looks like Compass Core contains "An implementation of Lucene Directory to store the index within a database (using Jdbc). It is separated from Compass code base and can be used with pure Lucene applications.", plus tons of other stuff. You could try to separate out just the Lucene component, thereby stripping away several libraries and making it more lightweight. Either that, or ditch Compass altogether and use pure, unadorned Lucene.

Asaph answered Sep 21 '22 15:09