We are using custom routing and ended up with a single shard that is over 100 GB. I'd like to know which routing value is causing this imbalance.
I can see the shards like this:
GET /_cat/shards
my-index 2 p STARTED 10108264 131.5gb
my-index 3 p STARTED 270403 1.7gb
my-index 1 p STARTED 187303 1.5gb
my-index 0 p STARTED 321519 2.5gb
I can see the shard details like this:
GET /my-index/_search_shards
I can even see shard info for random documents like this:
GET /my-index/_search
{
  "explain": true
}
But how can I search for documents in a specific shard (shard 2 in my case)?
You can use the search API to search and aggregate data stored in Elasticsearch data streams or indices. The API’s query request body parameter accepts queries written in Query DSL. The following request searches my-index-000001 using a match query. This query matches documents with a user.id value of kimchy.
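The request that paragraph refers to is not reproduced here. As a rough sketch, the body of such a match query could be built like this in Python; the helper name match_query_body is hypothetical and only illustrates the shape of the Query DSL body, which would be sent to /my-index-000001/_search.

```python
import json


def match_query_body(field: str, value: str) -> dict:
    """Build a Query DSL body with a single match query (hypothetical helper)."""
    return {"query": {"match": {field: value}}}


if __name__ == "__main__":
    # Body for matching documents whose user.id is kimchy.
    print(json.dumps(match_query_body("user.id", "kimchy"), indent=2))
```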
The routing value (by default, the document's _id) is passed through a hashing function, which generates a number. The remainder of dividing that number by the number of primary shards in the index gives the shard number. This is how Elasticsearch determines the location of a specific document.
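The hash-then-remainder step can be sketched in Python as follows. This is only an illustration of the idea: zlib.crc32 is a stand-in hash chosen for this sketch, not the hash Elasticsearch uses internally, so the shard numbers it produces will not match a real cluster. The function names are hypothetical.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Map a routing value to a shard number: hash(routing) % number of primary shards."""
    # Stand-in hash for illustration only; Elasticsearch uses a different hash function.
    hashed = zlib.crc32(routing.encode("utf-8"))
    return hashed % num_primary_shards


def group_by_shard(routing_values, num_primary_shards):
    """Group routing values by the shard this sketch assigns them to."""
    groups = {}
    for value in routing_values:
        groups.setdefault(shard_for(value, num_primary_shards), []).append(value)
    return groups
```

Because the mapping depends only on the routing value and the shard count, every document indexed with the same routing value lands on the same shard, which is how one very common routing value can produce a single oversized shard.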
Elasticsearch automatically manages and balances how shards are distributed across the nodes. By default, Elasticsearch also creates one primary shard and one replica for every index. We recommend that our clients have one replica in every production cluster as a backup.
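You should be able to do that using the preference option of the search API, which accepts _shards:<shard number> to restrict the search to specific shards: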
Example:
GET my-index/_search?preference=_shards:2
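To get from that request to the routing value behind the imbalance, one option is to tally the routing values of the hits it returns. The Python sketch below assumes that each hit carries a "_routing" key when the document was indexed with custom routing; both function names are hypothetical, and no HTTP client is shown. A single page of hits is only a sample of the shard, so treat the counts as a hint rather than an exact breakdown.

```python
from collections import Counter
from urllib.parse import quote


def shard_search_path(index: str, shard: int) -> str:
    """Build the search path restricted to one shard via preference=_shards:<n>."""
    return f"/{quote(index, safe='')}/_search?preference=_shards:{shard}"


def count_routing_values(hits):
    """Count how often each routing value appears in a list of search hits."""
    # Assumption: hits are dicts taken from response["hits"]["hits"].
    return Counter(hit.get("_routing", "(no routing)") for hit in hits)


if __name__ == "__main__":
    print(shard_search_path("my-index", 2))
    # Made-up hits for illustration, not real data from the cluster.
    sample_hits = [{"_routing": "a"}, {"_routing": "a"}, {"_routing": "b"}]
    print(count_routing_values(sample_hits).most_common(1))
```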