 

Elasticsearch: more indices vs. more types

We are using Elasticsearch for the following use case.
Elasticsearch version: 5.1.1
Note: we are using the AWS managed Elasticsearch service.

We have a multi-tenanted system in which each tenant stores data for multiple things, and the number of tenants will increase day by day.

For example, each tenant will have the following information:

1] tickets
2] sw_inventory
3] hw_inventory

The current indexing strategy is as follows:

Index name:
the tenant ID (a GUID), e.g. tenant_xx1234xx-5b6x-4982-889a-667a758499c8

types:

1] tickets
2] sw_inventory
3] hw_inventory

Issues we are facing:

1] Mapping conflicts for common fields (e.g. id, name, userId) across the types (tickets, sw_inventory, hw_inventory)
2] As the number of tenants increases, the number of indices could reach 1000 or even 2000
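For context on issue 1: in Elasticsearch 5.x, fields with the same name in different types of the same index are backed by one Lucene field, so their mappings must be identical. A conflict like the one above can be reproduced with a request such as the following (the field types here are just for illustration):

```
PUT tenant_xx1234xx-5b6x-4982-889a-667a758499c8
{
  "mappings": {
    "tickets":      { "properties": { "id": { "type": "keyword" } } },
    "sw_inventory": { "properties": { "id": { "type": "integer" } } }
  }
}
```

Elasticsearch rejects this index creation because `id` cannot be `keyword` in one type and `integer` in another within the same index.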

Would it be a good idea to reverse the indexing strategy?

For example, index names:

1] tickets
2] sw_inventory
3] hw_inventory

types:

tenant_tenant_id1
tenant_tenant_id2
tenant_tenant_id3
tenant_tenant_id4

So there would be only 3 huge indices, each with N types (one per tenant).

So the question in this case is: which solution is better?

1] Many small indices and 3 types
OR
2] 3 huge indices and many types

Regards

asked Jan 02 '18 by SSG


2 Answers

I suggest a different approach: https://www.elastic.co/guide/en/elasticsearch/guide/master/faking-it.html

That is, use custom routing where each document has a tenant_id (or a similar field whose value is unique to each tenant), and use that value both for routing and for defining a filtered alias per tenant. Then, when querying documents for a specific tenant, you query through that tenant's alias.
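A minimal sketch of this pattern, following the "faking index per user" approach from the linked guide (the index name shared_data and the alias/routing values are hypothetical, not from the question):

```
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "shared_data",
        "alias": "tenant_xx1234xx",
        "routing": "tenant_xx1234xx",
        "filter": { "term": { "tenant_id": "tenant_xx1234xx" } }
      }
    }
  ]
}
```

After this, indexing and searching through `tenant_xx1234xx` automatically applies the routing value (so each tenant's documents land on one shard) and the term filter (so searches only ever see that tenant's documents).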

You are going to use one index and one type this way. For sizing, consider the existing index size and the number of nodes, and choose a number of primary shards such that they are spread more or less evenly across all data-holding nodes and, per your tests, performance is acceptable. If, in the future, the index grows too large and the shards become too big to keep the same performance, consider creating a new index with more primary shards and reindexing everything into it. That is not an unheard-of or discouraged approach.
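The reindex step mentioned above can be done with the Reindex API, along these lines (the index names and shard count are purely illustrative):

```
PUT new_shared_data
{
  "settings": { "number_of_shards": 20 }
}

POST _reindex
{
  "source": { "index": "shared_data" },
  "dest":   { "index": "new_shared_data" }
}
```

Once the reindex completes, you would switch the per-tenant aliases from the old index to the new one in a single atomic `_aliases` call, so queries never hit a half-populated index.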

1000-2000 aliases is nothing in terms of what Elasticsearch can handle. If you have close to 10 nodes or more, I also recommend dedicated master nodes with something like 4-6 GB of heap and at least 4 CPU cores.

answered Sep 21 '22 by Andrei Stefan


Neither approach would work well. As others have mentioned, both approaches cost performance and would prevent you from upgrading (mapping types are deprecated and later removed in newer Elasticsearch versions).

Consider having one index and one type for each set of data, e.g. sw_inventory, and then adding a field to the mapping that identifies the tenant. You can then use document-level security in a security plugin such as X-Pack or Search Guard to prevent one tenant from seeing another's records (if required).
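As a sketch of the document-level security part with X-Pack on 5.x, a role could restrict reads on the shared index to one tenant's documents (the role name, index name, and tenant value here are hypothetical):

```
POST _xpack/security/role/tenant_a_read
{
  "indices": [
    {
      "names": [ "sw_inventory" ],
      "privileges": [ "read" ],
      "query": { "term": { "tenant_id": "tenant_a" } }
    }
  ]
}
```

Any user mapped to this role can search `sw_inventory` normally, but every query is transparently filtered to documents whose `tenant_id` is `tenant_a`.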

answered Sep 19 '22 by ryanlutgen