The Elasticsearch documentation for Index Aliases says:
The index aliases API allow to alias an index with a name, with all APIs automatically converting the alias name to the actual index name. An alias can also be mapped to more than one index, and when specifying it, the alias will automatically expand to the aliases indices.
And the documentation for Multiple Indices says:
Most APIs that refer to an index parameter support execution across multiple indices, using simple test1,test2,test3 notation (or _all for all indices). It also support wildcards, for example: test*, and the ability to "add" (+) and "remove" (-), for example: +test*,-test3.
Scenario #1
You have 12 monthly indices from the year 2014, each named with a date pattern, e.g. someprefix_2014-07.
You map all of these indices to an alias named 2014.
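For illustration, here is a minimal sketch of how the alias in this scenario might be set up. It only builds the JSON body for the _aliases endpoint, with one "add" action per monthly index. The helper name build_alias_actions is hypothetical, and the index prefix and alias name are taken from the scenario above.

```python
import json


def build_alias_actions(prefix, year, alias):
    """Build an _aliases request body that adds each monthly index to `alias`."""
    actions = []
    for month in range(1, 13):
        # Index names follow the pattern from the scenario, e.g. someprefix_2014-07.
        index_name = f"{prefix}_{year}-{month:02d}"
        actions.append({"add": {"index": index_name, "alias": alias}})
    return {"actions": actions}


body = build_alias_actions("someprefix", 2014, "2014")
print(json.dumps(body, indent=2))
```

The printed body could then be sent with a POST to http://localhost:9200/_aliases, assuming the same local cluster as the curl examples in this question.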
Both of these requests would return the same result:
$ curl -XGET 'http://localhost:9200/someprefix_2014-*/_stats'
$ curl -XGET 'http://localhost:9200/2014/_stats'
Scenario #2
You have a total of 24 monthly indices in your cluster and you decide you want to target all of them.
All of these requests would return the same result:
$ curl -XGET 'http://localhost:9200/_stats'
$ curl -XGET 'http://localhost:9200/_all/_stats'
$ curl -XGET 'http://localhost:9200/*/_stats'
$ curl -XGET 'http://localhost:9200/someprefix_*/_stats'
My Question
Are all of these methods doing the same thing "under the hood", or can one of them be expected to perform better than the others?
I ask because I've read about Wildcard Queries being a common performance bottleneck, but I've never seen any similar warning about using aliases or wildcards in index endpoints, or anything distinguishing default aliases (like _all) from custom ones.
They aren't exactly the same, from a code execution perspective. But they are functionally identical and will have identical performance profiles.
Aliases are really just "tags" that are attached to existing indices. So when you search against the 2014 alias, Elasticsearch just scans through the list of indices in the cluster state and finds all indices that are tagged with that alias.
When you search against a wildcard index pattern, it scans through the same list of indices to see which names match the pattern.
So performance will basically be the same, because the actual search is entirely unaffected: the shards associated with those searches will be queried no matter what, and all the index-to-shard lookups will happen on the coordinating node very quickly, no matter the method used.
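To make that concrete, here is a toy sketch of the two lookups. It is not Elasticsearch's actual code: CLUSTER_STATE is a made-up stand-in for the real cluster state, the shard numbers are invented, and Python's fnmatch is only an approximation of how index name patterns are matched. The point is just that both paths end with the same list of indices, and therefore the same shards.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for the cluster state: index name -> aliases and shard ids.
CLUSTER_STATE = {
    "someprefix_2013-12": {"aliases": {"2013"}, "shards": [0, 1]},
    "someprefix_2014-01": {"aliases": {"2014"}, "shards": [0, 1]},
    "someprefix_2014-02": {"aliases": {"2014"}, "shards": [0, 1]},
}


def resolve_alias(alias):
    """Find every index that is tagged with the given alias."""
    return sorted(
        name for name, meta in CLUSTER_STATE.items() if alias in meta["aliases"]
    )


def resolve_wildcard(pattern):
    """Find every index whose name matches the wildcard pattern."""
    return sorted(name for name in CLUSTER_STATE if fnmatchcase(name, pattern))


def target_shards(index_names):
    """Expand concrete index names into the (index, shard) pairs to query."""
    return [
        (name, shard)
        for name in index_names
        for shard in CLUSTER_STATE[name]["shards"]
    ]


by_alias = resolve_alias("2014")
by_pattern = resolve_wildcard("someprefix_2014-*")

# Both lookups are a single pass over a small in-memory map,
# and both hand the same shards to the search phase.
print(by_alias == by_pattern)
print(target_shards(by_alias) == target_shards(by_pattern))
```

Either way the expensive part, querying the shards, is identical; only the cheap name lookup differs.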
So don't worry, you can choose whichever makes more sense for you :)
PS. Wildcard queries are discouraged because they do have performance implications. They have to generate and check a large number of potential tokens, which can have a non-negligible impact on latency. But they are very different from index wildcards, or many other wildcards around ES. Most things that support pattern matching / wildcards in ES are just simple in-memory string matching in Java, whereas the wildcard query is fancy automaton magic inside of Lucene against inverted indices... much different :)
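As a rough illustration of that difference, the sketch below builds the two kinds of request side by side. The helper names and the title field are hypothetical; the request shapes follow the usual _search conventions. In the first request the wildcard lives in the URL path and is matched against index names; in the second it lives in the query body and is matched against the terms of a field.

```python
import json


def index_wildcard_request(index_pattern):
    """Wildcard in the URL path: resolved against index names before searching."""
    return {
        "method": "GET",
        "path": f"/{index_pattern}/_search",
        "body": {"query": {"match_all": {}}},
    }


def wildcard_query_request(index_name, field, value_pattern):
    """Wildcard in the query body: matched against indexed terms by Lucene."""
    return {
        "method": "GET",
        "path": f"/{index_name}/_search",
        "body": {"query": {"wildcard": {field: {"value": value_pattern}}}},
    }


cheap = index_wildcard_request("someprefix_2014-*")
costly = wildcard_query_request("someprefix_2014-07", "title", "elast*")

print(json.dumps(cheap, indent=2))
print(json.dumps(costly, indent=2))
```

Only the second form is the kind of "wildcard" that the performance warnings are about.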