We are using Elasticsearch for the following use case.
Elasticsearch version: 5.1.1
Note: We are using the AWS managed Elasticsearch service.
We have a multi-tenant system in which each tenant stores several kinds of data, and the number of tenants grows day by day.
For example, each tenant has the following information:
1] tickets
2] sw_inventory
3] hw_inventory
The current indexing strategy is as follows:
Index name:
the tenant_id (GUID), e.g. tenant_xx1234xx-5b6x-4982-889a-667a758499c8
Types:
1] tickets
2] sw_inventory
3] hw_inventory
Issues we are facing:
1] Mapping conflicts for fields that are common across the types (tickets, sw_inventory, hw_inventory), e.g. id, name, userId (see the sketch after this list).
2] As the number of tenants grows, the number of indices could reach 1000 or even 2000.
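A minimal sketch of the mapping-conflict issue, using the Python elasticsearch-py client; the index name, field types, and endpoint here are illustrative assumptions, not our actual schema:

```python
from elasticsearch import Elasticsearch
from elasticsearch.exceptions import RequestError

es = Elasticsearch("http://localhost:9200")  # assumed endpoint

try:
    # Within one index, a field name shared by several types must have one and the
    # same mapping; declaring "id" as keyword in one type and integer in another fails.
    es.indices.create(index="tenant_demo", body={
        "mappings": {
            "tickets":      {"properties": {"id": {"type": "keyword"}}},
            "sw_inventory": {"properties": {"id": {"type": "integer"}}},
        }
    })
except RequestError as err:
    # Elasticsearch 5.x rejects the request with an illegal_argument_exception.
    print("mapping conflict:", err)
```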
Would it be a good idea to reverse the indexing strategy?
e.g. index names:
1] tickets
2] sw_inventory
3] hw_inventory
Types:
tenant_tenant_id1
tenant_tenant_id2
tenant_tenant_id3
tenant_tenant_id4
So there would be only 3 huge indices, each with N types (one per tenant).
The question in this case is: which solution is better?
1] Many small indices, each with 3 types
OR
2] 3 huge indices, each with many types
Regards
I suggest a different approach: https://www.elastic.co/guide/en/elasticsearch/guide/master/faking-it.html
That is, custom routing: each document carries a tenant_id or something similar (a value unique to each tenant), and you use that value both as the routing value and for defining a filtered alias per tenant. Then, when querying documents for a specific tenant, you query through that tenant's alias.
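A rough sketch of what that could look like with the Python elasticsearch-py client; the index name shared_data, the type name doc, the tenant_id field, and the endpoint are assumptions for illustration:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed endpoint
tenant = "tenant_xx1234xx-5b6x-4982-889a-667a758499c8"

# One shared index for all tenants (shard count is a placeholder).
es.indices.create(index="shared_data", body={"settings": {"number_of_shards": 6}})

# One filtered alias per tenant; the tenant id doubles as the routing value,
# so a tenant's documents land on (and are searched from) a single shard.
es.indices.update_aliases(body={
    "actions": [{
        "add": {
            "index": "shared_data",
            "alias": tenant,
            "routing": tenant,
            "filter": {"term": {"tenant_id": tenant}},
        }
    }]
})

# Index through the alias: the alias supplies the routing, but the tenant_id
# field must still be present in the document so the alias filter matches it.
es.index(index=tenant, doc_type="doc", id="ticket-1",
         body={"tenant_id": tenant, "kind": "ticket", "status": "open"})

# Queries through the alias only see (and only touch the shard of) this tenant's documents.
res = es.search(index=tenant, body={"query": {"match": {"status": "open"}}})
print(res["hits"]["total"])
```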
You would use one index and one type this way. For the number of primary shards, consider the existing data size and the number of nodes, and pick a shard count such that the shards are spread more or less evenly across all data-holding nodes and, based on your own tests, performance is acceptable. If, in the future, the index grows too large and the shards become too big to keep the same performance, create a new index with more primary shards and reindex everything into it. That is not an unheard-of or unrecommended approach.
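If that reindexing step is ever needed, it could look roughly like this, again with elasticsearch-py and the Reindex API that ships with Elasticsearch 5.x; the new index name and shard counts are only placeholders:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed endpoint

# New index with a higher primary shard count than the original shared_data index.
es.indices.create(index="shared_data_v2", body={
    "settings": {"number_of_shards": 12, "number_of_replicas": 1}
})

# Copy every document from the old index into the new one.
es.reindex(body={
    "source": {"index": "shared_data"},
    "dest":   {"index": "shared_data_v2"},
}, wait_for_completion=True)

# Finally, switch the per-tenant aliases over to shared_data_v2 so clients keep using the same names.
```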
1000-2000 aliases is nothing in terms of what the cluster can handle. If you have close to 10 nodes or more, I also recommend dedicated master nodes with something like 4-6 GB of heap and at least 4 CPU cores.
Neither approach would work. As others have mentioned, both approaches cost performance and would prevent you from upgrading, since multiple mapping types per index are deprecated in Elasticsearch 6.x and removed in later versions.
Consider having one index and one type for each set of data, e.g. sw_inventory, with a field in the mapping that identifies the tenant. You can then use document-level security in a security plugin such as X-Pack or Search Guard to prevent one tenant from seeing another's records (if required).
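A minimal sketch of that layout, assuming an sw_inventory index with a tenant_id keyword field (index, field names, and values are illustrative); the tenant filter shown in the query is what document-level security in X-Pack or Search Guard would enforce per role, otherwise the application has to apply it itself:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed endpoint
tenant = "tenant_xx1234xx-5b6x-4982-889a-667a758499c8"

# One index per data set; every document carries the tenant it belongs to.
es.indices.create(index="sw_inventory", body={
    "mappings": {
        "doc": {
            "properties": {
                "tenant_id": {"type": "keyword"},  # field that separates tenants
                "name":      {"type": "text"},
                "version":   {"type": "keyword"},
            }
        }
    }
})

es.index(index="sw_inventory", doc_type="doc",
         body={"tenant_id": tenant, "name": "nginx", "version": "1.11.8"})

# Every query is restricted to one tenant via a term filter on tenant_id.
res = es.search(index="sw_inventory", body={
    "query": {"bool": {"filter": [{"term": {"tenant_id": tenant}}]}}
})
print(res["hits"]["total"])
```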