Let's say I have a thousand keys and want to store the associated values. The intuitive approach seems to be something like
{
"key1":"someval",
"key2":"someotherval",
...
}
Is it a bad design pattern for an Elasticsearch index to have thousands of keys? Would each key introduced this way create overhead for every document in the index?
By default, Elasticsearch limits an index to 1,000 fields (the index.mapping.total_fields.limit setting) to prevent a mapping explosion from millions of fields, but in our case we need more than the allotted one thousand.
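If you control the index, that limit can be raised via the update-settings API. A minimal sketch, assuming a hypothetical index named my-index (the setting name is real; the index name and the value 2000 are placeholders):

PUT /my-index/_settings
{
  "index.mapping.total_fields.limit": 2000
}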
Though there is technically no limit to how much data you can store on a single shard, Elasticsearch recommends a soft upper limit of 50 GB per shard, which you can use as a general guideline for when it's time to start a new index.
The http.max_content_length setting defaults to 100 MB; Elasticsearch will refuse any HTTP request body (and therefore any single document) larger than that.
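It is a node-level HTTP setting, so if you genuinely need larger requests it has to be raised in elasticsearch.yml on each node. A sketch, with an illustrative value only:

http.max_content_length: 200mb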
If you know there is an upper limit to the number of keys you'll have, a few thousand fields is not a problem.
The problem is when you have an unbounded set of keys, e.g. when the key is derived from a value, as you'll have a continuously growing mapping and thus also cluster state. It can also lead to quirky searches.
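One common way to keep the mapping under control with dynamic keys, sketched here as an alternative rather than something the answer prescribes, is to model them as a nested array of fixed key/value fields, so the mapping stays constant no matter how many distinct keys the data contains:

PUT /my-index
{
  "mappings": {
    "properties": {
      "attributes": {
        "type": "nested",
        "properties": {
          "key":   { "type": "keyword" },
          "value": { "type": "keyword" }
        }
      }
    }
  }
}

A document then stores {"attributes": [{"key": "key1", "value": "someval"}, ...]}, and you search with a nested query that matches attributes.key and attributes.value together.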
This is a common enough question/issue that I dedicated a section to it in my article on Troubleshooting Elasticsearch searches, for Beginners.
In short, thousands of fields is no problem - not having control of the mapping is.