
Multi-tenant Lucene indexing strategy

I'm designing a multi-tenant SaaS application where tenants will be able to store data and perform search on it. I plan to use Lucene (actually, Lucene.Net) as the search engine. As cross-tenant searches are not required, I am considering having one index (so one directory) per tenant.

I don't expect the index writes to be insanely frequent, so they will be queued to a single process that will open the index, add the doc and close the index as updates arrive.
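As a rough illustration of the queued-write idea, here is a minimal Java sketch of a per-tenant buffer that a single writer drains. It is a small variation on the plan above: pending documents are grouped by tenant, so each index is opened and closed once per batch rather than once per document. `TenantWriteQueue` and `BatchWriter` are hypothetical names, documents are plain strings for brevity, and the real open/add/close calls against Lucene are left to the `BatchWriter` implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Buffers pending documents per tenant so a single writer can apply them in batches. */
class TenantWriteQueue {
    /** Hypothetical sink: open the tenant's index, add all docs, close it again. */
    interface BatchWriter {
        void writeBatch(String tenantId, List<String> docs);
    }

    // Insertion-ordered, so tenants are flushed in the order their first update arrived.
    private final Map<String, List<String>> pending = new LinkedHashMap<>();

    /** Called by any producer thread when an update arrives. */
    synchronized void enqueue(String tenantId, String doc) {
        pending.computeIfAbsent(tenantId, k -> new ArrayList<>()).add(doc);
    }

    /** Called by the single writer; returns the number of tenant indexes touched. */
    int drain(BatchWriter writer) {
        Map<String, List<String>> batch;
        synchronized (this) {
            // Take a snapshot and release the lock before doing slow index I/O.
            batch = new LinkedHashMap<>(pending);
            pending.clear();
        }
        for (Map.Entry<String, List<String>> e : batch.entrySet()) {
            writer.writeBatch(e.getKey(), e.getValue());
        }
        return batch.size();
    }
}
```

If three updates for two tenants are queued before the writer wakes up, `drain` performs two open/close cycles instead of three.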

I would like to have something more efficient on the reads, though. The number of tenants may scale from hundreds to tens of thousands, so keeping all directories open in RAM on each search node is not sensible. I am thinking about managing a shortlist of recently used or maybe most frequently used directories, regularly closing those that fall outside of the criteria.
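The "shortlist of recently used directories" could look roughly like the sketch below: a least-recently-used cache built on `java.util.LinkedHashMap` that closes whatever it evicts. `TenantHandleCache` and the `opener` function are hypothetical; in practice the handle would be whatever wraps the tenant's open reader or searcher. This is only a sketch under those assumptions, written in Java, and the same shape should carry over to Lucene.Net.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Keeps at most maxOpen per-tenant handles open; closes the least recently used on overflow. */
class TenantHandleCache<H extends Closeable> {
    private final Function<String, H> opener; // hypothetical: opens the index handle for a tenant
    private final LinkedHashMap<String, H> open;

    TenantHandleCache(final int maxOpen, Function<String, H> opener) {
        this.opener = opener;
        // accessOrder = true keeps entries ordered from least to most recently accessed.
        this.open = new LinkedHashMap<String, H>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, H> eldest) {
                if (size() <= maxOpen) {
                    return false;
                }
                try {
                    eldest.getValue().close(); // release the evicted tenant's resources
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return true;
            }
        };
    }

    /** Returns the open handle for a tenant, opening it on first use. */
    synchronized H get(String tenantId) {
        H handle = open.get(tenantId); // also marks the entry as recently used
        if (handle == null) {
            handle = opener.apply(tenantId);
            open.put(tenantId, handle); // may evict and close the eldest entry
        }
        return handle;
    }

    synchronized int openCount() {
        return open.size();
    }
}
```

One caveat with this simple version: an evicted handle is closed immediately, even if another thread is still searching with it. A production version would need some form of reference counting (Lucene's `IndexReader` exposes `incRef`/`decRef` for this purpose) so that a handle is only released once its last search has finished.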

I'm really new to Lucene in general so would appreciate some feedback on the strategy.

Thanks

ThomasWeiss asked Dec 24 '13 09:12



1 Answer

Besides the strategy you mention, you could also consider a single index for all clients, ANDing a client clause onto every user query so that each client sees only its own data:

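import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

// Assumes every document carries an untokenized "clientid" field.
// Mutable BooleanQuery as in Lucene 3.x/4.x; newer releases use BooleanQuery.Builder.
TermQuery clientQuery = new TermQuery(new Term("clientid", clientid));
BooleanQuery query = new BooleanQuery();
query.add(userQuery, BooleanClause.Occur.MUST);   // what the user searched for
query.add(clientQuery, BooleanClause.Occur.MUST); // restricted to this client's documents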

If you have many tenants and their indexes are, on average, small or lightly used, this might work better. There is also a possible twist: if your data has a temporal axis, you can partition this big index into yearly, monthly, or daily chunks. The most recent chunks are typically queried most often, so you get better OS caching, lower memory usage, and so on.
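To make the temporal partitioning concrete, here is a small Java sketch that maps a document date to a monthly chunk and lists the chunks a date-range query would have to search. The `docs-` prefix and the `MonthlyPartitions` class are made up for illustration; how each chunk name maps to an actual index directory is left open.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

/** Maps dates to monthly index chunks (hypothetical naming scheme: docs-YYYY-MM). */
class MonthlyPartitions {
    private static final String PREFIX = "docs-"; // illustrative prefix only

    /** Chunk that a document with the given date is written to. */
    static String partitionFor(LocalDate date) {
        return PREFIX + YearMonth.from(date);
    }

    /** All chunks that a query over [from, to] has to search, oldest first. */
    static List<String> partitionsFor(LocalDate from, LocalDate to) {
        List<String> names = new ArrayList<>();
        YearMonth last = YearMonth.from(to);
        for (YearMonth m = YearMonth.from(from); !m.isAfter(last); m = m.plusMonths(1)) {
            names.add(PREFIX + m);
        }
        return names;
    }
}
```

A query limited to recent data then touches only the newest one or two chunks, which is where the caching benefit comes from.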

Persimmonium answered Oct 02 '22 23:10
