
Full text search on Google App Engine (Java)

There are a few threads floating around on the topic, but I think my use-case is somewhat different.

What I want to do:

  • Full text search component for my GAE/J app
  • The index size is small: 25-50MB or so
  • I do not need live updates to the index; periodic re-indexing is fine
  • This is for auto-complete and the like, so it needs to be extremely fast (I get the impression that implementing an inverted index in Datastore introduces considerable latency)

My strategy so far (just planning, haven't tried implementing anything yet):

  • Use Lucene with RAMDirectory
  • A periodic cron job creates the index, serializes it to the Datastore, and stores an update id (or timestamp)
  • Search servlet loads the index on startup and creates the RAMDirectory
  • On each request the servlet checks the current update id and reloads the index as necessary
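
The check-and-reload step in the plan above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested GAE implementation: `VersionedIndexHolder`, `currentUpdateId`, and `loadIndex` are hypothetical names, the update id is assumed to increase with every rebuild (a timestamp would do), and the two callbacks stand in for whatever Datastore read and Lucene deserialization the app ends up using. It also assumes Java 8 or later for the lambdas.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;
import java.util.function.Supplier;

/** Keeps one in-memory index per instance and swaps it when the stored update id advances. */
final class VersionedIndexHolder<T> {

    /** Immutable pair, so the id and the index are always published together. */
    private static final class Snapshot<T> {
        final long updateId;
        final T index;

        Snapshot(long updateId, T index) {
            this.updateId = updateId;
            this.index = index;
        }
    }

    // Hypothetical callback: returns the update id last written by the cron job.
    private final Supplier<Long> currentUpdateId;
    // Hypothetical callback: loads and deserializes the index for a given update id.
    private final Function<Long, T> loadIndex;
    private final AtomicReference<Snapshot<T>> snapshot = new AtomicReference<>();

    VersionedIndexHolder(Supplier<Long> currentUpdateId, Function<Long, T> loadIndex) {
        this.currentUpdateId = currentUpdateId;
        this.loadIndex = loadIndex;
    }

    /** Returns the index for the latest update id, reloading only if the id has advanced. */
    T get() {
        long latest = currentUpdateId.get();
        Snapshot<T> current = snapshot.get();
        if (current != null && current.updateId >= latest) {
            return current.index; // fast path: this instance is already up to date
        }
        synchronized (this) {
            // Re-check under the lock so concurrent requests trigger a single reload.
            current = snapshot.get();
            if (current == null || current.updateId < latest) {
                current = new Snapshot<>(latest, loadIndex.apply(latest));
                snapshot.set(current);
            }
            return current.index;
        }
    }
}
```

Each instance converges on its own the next time it serves a request, so no direct instance-to-instance synchronization is needed. The cost is one update-id lookup per request; if that turns out to be too slow for auto-complete, the lookup could be done only every few seconds instead.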

The main thing I'm fuzzy on is how to synchronize in-memory data between instances - will this work, or am I missing something?

Also, how far can I push it before I start having problems with memory use? I couldn't find anything on RAM quotas for GAE. (This index is small, but I can think of more stuff I'd like to add)

And, of course, any thoughts on better approaches?

Dmitri asked Dec 11 '10 02:12



2 Answers

If you're okay with periodic rebuilds, and your index is small, your current approach sounds mostly okay. Instead of building the index online and serializing it to the datastore, though, why not build it offline, and upload it with the app? Then, you can instantiate it directly from the disk store, and to push an update, you deploy a new version of your app.
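
As a rough illustration of the "build offline, load at startup" idea, here is a minimal sketch that reads a prebuilt term list into memory and answers prefix queries. It is a plain-Java stand-in, not Lucene: `PrefixIndex`, `load`, and `complete` are hypothetical names, and the one-term-per-line file format is an assumption made for the example. In a real app the `Reader` would wrap a file shipped with the deployment.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Read-only, in-memory prefix lookup built once from a prebuilt term list. */
final class PrefixIndex {
    private final TreeSet<String> terms = new TreeSet<>();

    /** Loads one term per line (assumed format); blank lines are skipped. */
    static PrefixIndex load(Reader source) {
        PrefixIndex index = new PrefixIndex();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String term = line.trim().toLowerCase(Locale.ROOT);
                if (!term.isEmpty()) {
                    index.terms.add(term);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return index;
    }

    /** Returns up to {@code limit} terms starting with {@code prefix}, in sorted order. */
    List<String> complete(String prefix, int limit) {
        String p = prefix.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        // tailSet starts at the first term >= prefix; matching terms are contiguous from there.
        for (String term : terms.tailSet(p, true)) {
            if (!term.startsWith(p) || matches.size() >= limit) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }
}
```

Because the data ships with the app version, every instance of that version loads identical content, which sidesteps the cross-instance synchronization question entirely.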

Nick Johnson answered Oct 10 '22 14:10



GAE recently added a "text search" service. Take a look at GAE Java Text Search.

Kesava Neeli answered Oct 10 '22 15:10
