I'm using AWS-provided Elasticsearch.
I have a signup page on my website, and on each signup a new index is created for the new user (to be used later by their work-group). This means the number of indexes is continuously growing; it has now reached around 4-5k.

My question is: is there a performance limit on the number of indexes? Is it safe (performance-wise) to keep creating new indexes dynamically with each new user?
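For context, the signup flow looks roughly like this (a minimal sketch using the official elasticsearch-py client, 8.x; the index naming scheme, settings, and helper are placeholders, not my exact code):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def create_user_index(user_id: str) -> None:
    """Create one index per new user at signup, shared later by their work-group."""
    index_name = f"user-{user_id}"  # placeholder naming scheme
    if not es.indices.exists(index=index_name):
        es.indices.create(
            index=index_name,
            # Each index costs at least one shard (plus replicas), which is
            # where the scaling question comes from.
            settings={"number_of_shards": 1, "number_of_replicas": 1},
        )
```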
The atomic scaling unit for an Elasticsearch index is the shard. A shard is actually a complete Lucene index. If you are unfamiliar with how Elasticsearch interacts with Lucene on the shard level, Elasticsearch from the Bottom Up is worth a read. Since the nomenclature can be a bit ambiguous, we'll make it clear whether we are discussing a Lucene or an Elasticsearch index.
An Elasticsearch index with two shards is conceptually exactly the same as two Elasticsearch indexes with one shard each. The difference is largely the convenience Elasticsearch provides via its routing feature, illustrated below. This insight is important for several reasons: since the overhead is per shard, thousands of one-shard indexes cost the cluster roughly as much as a few indexes holding the same number of shards.
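To make that concrete, here is a sketch (elasticsearch-py 8.x; the index and field names are hypothetical) of the routing convenience mentioned above: rather than one index per work-group, all of a work-group's documents can be routed to the same shard of a shared index:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# One index with two shards: conceptually the same as two one-shard indexes.
es.indices.create(index="workgroups", settings={"number_of_shards": 2})

# Route all of a work-group's documents to the same shard.
es.index(
    index="workgroups",
    id="doc-1",
    routing="workgroup-42",  # documents sharing a routing value share a shard
    document={"owner": "workgroup-42", "text": "hello"},
)

# Searches with the same routing value only hit that one shard.
es.search(index="workgroups", routing="workgroup-42",
          query={"match": {"text": "hello"}})
```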
Note: I haven't used AWS Elasticsearch, so parts of this answer may not apply, because they have started using Open Distro for Elasticsearch, a fork of the main branch. But many of the principles should be the same. Also, this question doesn't have a definitive answer; it depends on various factors, but I hope this answer helps the thought process.
One of the factors is the number of shards and replicas per index, as these contribute to the total number of shards per node. Each shard consumes some heap memory, so you will have to keep the number of shards per node limited, given the recommended maximum of 30 GB heap space per node. As per this comment, 600 to 1000 shards per node should be reasonable, and you can scale your cluster accordingly.
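As a rough sanity check of that budget, you can compare the cluster's actual shard count to its data-node count (a sketch with elasticsearch-py 8.x; the 600-1000 figure is the rule of thumb from the comment above, not a hard limit):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

stats = es.cluster.stats()
total_shards = stats["indices"]["shards"]["total"]
data_nodes = stats["nodes"]["count"]["data"]

per_node = total_shards / max(data_nodes, 1)
print(f"{total_shards} shards on {data_nodes} data nodes "
      f"(~{per_node:.0f} per node; rule of thumb: keep under ~600-1000)")
```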
Also, you have to monitor the number of open file descriptors and make sure they don't become a bottleneck for the nodes.
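The node stats API exposes those numbers; something like this (elasticsearch-py 8.x) shows per-node file-descriptor usage:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# The "process" metric includes open vs. maximum file descriptors per node.
for node in es.nodes.stats(metric="process")["nodes"].values():
    proc = node["process"]
    print(node["name"], "-", proc["open_file_descriptors"], "of",
          proc["max_file_descriptors"], "file descriptors in use")
```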
HTH!
If I'm not mistaken, the only hard limit is the disk space of your servers, but if your indexes are growing too fast, you should think about adding more replica nodes. I recommend reading this page: Indexing Performance Tips
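To see how close you are to that disk limit, the _cat/allocation API gives per-node disk use and shard counts (a sketch with elasticsearch-py 8.x):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# One row per node: shard count plus used/total disk.
for row in es.cat.allocation(format="json"):
    if row["node"] != "UNASSIGNED":  # skip the unassigned-shards row, if any
        print(row["node"], "-", row["shards"], "shards,",
              row["disk.used"], "used of", row["disk.total"])
```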
Indexes themselves have no limit, but shards do: the recommended number is 20 shards per GB of JVM heap (you can check your heap on Kibana's Stack Monitoring tab). This means that if you have 5 GB of JVM heap, the recommended maximum is 100 shards.

Remember that one index can take from 1 to x shards (1 primary and x replicas); normally people have 1 primary and 1 replica. If that is your case, then you would be able to create 50 indexes with those 5 GB of heap.
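That calculation as a tiny helper, in case it's useful (plain Python; the 20-shards-per-GB figure is the rule of thumb from above, not a hard limit):

```python
def max_recommended_indexes(heap_gb: float, primaries: int = 1,
                            replicas: int = 1) -> int:
    """Indexes that fit the ~20-shards-per-GB-of-heap rule of thumb."""
    shard_budget = heap_gb * 20                  # ~20 shards per GB of JVM heap
    shards_per_index = primaries * (1 + replicas)  # each primary is replicated
    return int(shard_budget // shards_per_index)

print(max_recommended_indexes(5))  # 5 GB heap, 1 primary + 1 replica -> 50
```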