
Are index aliases and wildcard index endpoints in Elasticsearch exactly the same thing?

The Elasticsearch documentation for Index Aliases says:

The index aliases API allow to alias an index with a name, with all APIs automatically converting the alias name to the actual index name. An alias can also be mapped to more than one index, and when specifying it, the alias will automatically expand to the aliases indices.

And the documentation for Multiple Indices says:

Most APIs that refer to an index parameter support execution across multiple indices, using simple test1,test2,test3 notation (or _all for all indices). It also support wildcards, for example: test*, and the ability to "add" (+) and "remove" (-), for example: +test*,-test3.

Scenario #1

  1. You have 12 monthly indices from the year 2014 each named with a date pattern, e.g. someprefix_2014-07

  2. You map all of these indices to an alias named 2014.

  3. Both of these requests would return the same result:

    • $ curl -XGET http://localhost:9200/someprefix_2014-*/_stats

    • $ curl -XGET http://localhost:9200/2014/_stats
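Step 2 of this scenario can be sketched as follows. This is a minimal illustration, assuming the alias is created through the `_aliases` endpoint with one `add` action per index. The helper name `build_alias_actions` is hypothetical; it only builds the request body, which would still need to be sent as a POST to http://localhost:9200/_aliases.

```python
import json


def build_alias_actions(indices, alias):
    """Build a request body for POST /_aliases that adds `alias` to each index.

    Hypothetical helper for illustration; it only builds the payload and
    does not contact a cluster.
    """
    return {"actions": [{"add": {"index": name, "alias": alias}} for name in indices]}


# The 12 monthly indices from Scenario #1: someprefix_2014-01 ... someprefix_2014-12
monthly_indices = [f"someprefix_2014-{month:02d}" for month in range(1, 13)]

body = build_alias_actions(monthly_indices, "2014")
print(json.dumps(body, indent=2))
```

Once the alias exists, the two requests in step 3 should target the same 12 indices.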

Scenario #2

  1. You have a total of 24 monthly indices in your cluster, all named with the someprefix_ prefix, and you decide you want to target all of them.

  2. All of these requests would return the same result:

    • $ curl -XGET http://localhost:9200/_stats

    • $ curl -XGET http://localhost:9200/_all/_stats

    • $ curl -XGET http://localhost:9200/*/_stats

    • $ curl -XGET http://localhost:9200/someprefix_*/_stats

My Question

Are all of these methods doing the same thing "under the hood", or can one of them be expected to perform better than the others?

I ask because I've read about Wildcard Queries being a common performance bottleneck, but I've never seen any similar warning for using aliases or wildcards in index endpoints - or distinguishing default aliases (like _all) from custom ones.

Frankie Jarrett asked Apr 01 '15 21:04




1 Answer

They aren't exactly the same, from a code execution perspective. But they are functionally identical and will have identical performance profiles.

Aliases are really just "tags" that are attached to existing indices. So when you search against the 2014 alias, Elasticsearch just scans through the list of indices in the cluster state and finds all indices that are tagged with that alias.

When you search against a wildcard index pattern, Elasticsearch scans through the same list of indices to see which names match the pattern.

So performance will basically be the same, because the actual search is entirely unaffected: the shards associated with those searches will be queried no matter what, and all the index-to-shard lookups will happen on the coordinating node very quickly, no matter the method used.
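The two resolution paths described above can be modelled with a toy example. This is only a sketch of the idea, not Elasticsearch's actual code: the `cluster_state` dictionary, the function names, and the extra 2015 index are assumptions made for illustration, and Python's `fnmatchcase` stands in for the index-name matching.

```python
from fnmatch import fnmatchcase

# Toy stand-in for the cluster state: index name -> set of alias "tags".
# Index names follow the question; the 2015 index is an assumed extra.
cluster_state = {
    "someprefix_2014-06": {"2014"},
    "someprefix_2014-07": {"2014"},
    "someprefix_2015-01": set(),
}


def resolve_alias(alias):
    """Return every index tagged with the given alias."""
    return sorted(name for name, aliases in cluster_state.items() if alias in aliases)


def resolve_wildcard(pattern):
    """Return every index whose name matches the wildcard pattern."""
    return sorted(name for name in cluster_state if fnmatchcase(name, pattern))


by_alias = resolve_alias("2014")
by_wildcard = resolve_wildcard("someprefix_2014-*")

# Both paths end with the same list of indices, so the same shards are searched.
print(by_alias == by_wildcard)
print(by_alias)
```

In both cases the work is a single pass over a short list of index names, which is tiny compared with the cost of running the search on the shards themselves.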

So don't worry, you can choose whichever makes more sense for you :)

PS. Wildcard queries are discouraged because they do have performance implications. They have to generate and check a large number of potential tokens, which can have a non-negligible impact on latency. But they are very different from index wildcards, or many other wildcards around ES. Most things that support pattern matching / wildcards in ES are just simple pattern matching done in Java against a small list of names, whereas the wildcard query is fancy automaton magic inside of Lucene against inverted indices... much different :)

Zach answered Nov 05 '22 11:11