
Should I create multiple document types or multiple indexes?

We host lots of websites for businesses. Each business will have a number of document types they may want indexed and searched via Elasticsearch (ES).

Normally, each business has fewer than 20 document types, and each type has fewer than 100k documents (usually far fewer).

I'm not sure how I should set up the data for these websites. Should I put them into separate indexes, or should I put them all into the same index with different document types? Or is there some other option?

Or perhaps I should even go as far as indexing small and medium sites differently? What worst-case scenarios should I be prepared for if I plan to grow to 50K sites?

mr1031011 asked Mar 01 '16 14:03


1 Answer

If you create one index with several mapping types, you face a significant constraint: fields that share a name across mapping types must also share the same data type. For example, you can't have a field named blablaCount that is a long in one mapping type and a double in another mapping type within the same index.
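To make the constraint concrete, here is a minimal sketch that checks a set of mapping types for same-name fields with different data types. It assumes a simplified, flat `{type_name: {field_name: data_type}}` layout rather than the real Elasticsearch mapping JSON, and the type names `product` and `article` are made up for illustration.

```python
from collections import defaultdict


def find_field_conflicts(mappings):
    """Return fields declared with more than one data type.

    `mappings` is a simplified stand-in for an index's mapping types:
    {type_name: {field_name: data_type}}. This is an assumption made
    for illustration, not the actual Elasticsearch mapping format.
    """
    # field name -> data type -> list of mapping types that declare it
    seen = defaultdict(lambda: defaultdict(list))
    for type_name, fields in mappings.items():
        for field_name, data_type in fields.items():
            seen[field_name][data_type].append(type_name)

    # A field conflicts when it was declared with two or more data types.
    return {
        field_name: {dt: sorted(names) for dt, names in by_type.items()}
        for field_name, by_type in seen.items()
        if len(by_type) > 1
    }


# Hypothetical mapping types that would clash inside a single index.
mappings = {
    "product": {"title": "string", "blablaCount": "long"},
    "article": {"title": "string", "blablaCount": "double"},
}
conflicts = find_field_conflicts(mappings)
print(conflicts)
```

Running a check like this before creating the index would flag `blablaCount`, while `title` passes because both types declare it the same way.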

Your mileage may vary, but since ES 2.0 and the great mapping refactoring, it is usually recommended to go with several indices and one mapping type per index.

What I would do is create several indices with one mapping/document type per index. You'd then group all indices belonging to a given business under an alias, so that when you need to query all indices of that business, you can simply query the alias.
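As a rough sketch of the alias approach, the snippet below builds the request body you could send to the `_aliases` endpoint to put every index of one business under a single alias. The naming scheme (`business_<id>` for the alias, `business_<id>_<type>` for the indices) and the helper name are assumptions for illustration only; use whatever convention suits you.

```python
import json


def alias_actions(business_id, doc_types):
    """Build an `_aliases` request body grouping one business's indices.

    The alias and index naming scheme here is hypothetical.
    """
    alias = f"business_{business_id}"
    actions = [
        # One "add" action per index: attach the index to the business alias.
        {"add": {"index": f"{alias}_{doc_type}", "alias": alias}}
        for doc_type in doc_types
    ]
    return {"actions": actions}


body = alias_actions(42, ["products", "articles"])
print(json.dumps(body, indent=2))
```

You would POST this body to `/_aliases`, and a search against `/business_42/_search` would then cover both indices at once.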

Another option is to put all documents of all businesses in the same set of indices and simply discriminate each business using a term query on its businessId field, or even by routing on the businessId.

However, in your case, since each business doesn't have that many documents, it might be a waste of resources to create a full set of indices for each business, so I'd probably go with the second option, i.e. create a set of indices, each with its own mapping/document type, and then store all documents from all businesses in those indices.
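A minimal sketch of that second option: wrap whatever query the site sends in a `bool` query with a `term` filter on `businessId`, and pass the business id as the `routing` value. The helper name and the example field `title` are made up for illustration, and this assumes `businessId` is indexed as an exact (non-analyzed) value and that documents were indexed with the same routing value.

```python
def business_search(business_id, user_query):
    """Restrict a query to one business inside a shared index.

    Returns (body, params): the search request body and the
    query-string parameters to send with it.
    """
    body = {
        "query": {
            "bool": {
                # The caller's query still decides relevance scoring.
                "must": [user_query],
                # The term filter keeps only this business's documents.
                "filter": [{"term": {"businessId": business_id}}],
            }
        }
    }
    # Routing on the business id limits the search to the shard(s)
    # holding that business's documents (assumes documents were
    # indexed with the same routing value).
    params = {"routing": str(business_id)}
    return body, params


body, params = business_search(42, {"match": {"title": "coffee"}})
print(body)
print(params)
```

The filter is what guarantees isolation between businesses; routing is only an optimization, since several businesses can land on the same shard.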

Val answered Nov 08 '22 18:11