
Are IDs guaranteed to be unique across indices in Elasticsearch 6+?

With mapping types being removed starting with Elasticsearch 6.0, are document IDs guaranteed to be unique across indices?

Say I have three indices, all with a "parent" field that contains an ID. Do I need to include which index the ID belongs to or can I just search through all three indices when looking for a document with the given ID?

Oskar Persson asked Feb 07 '18 12:02


2 Answers

IDs are not unique across indices. If you want to refer to a document you need to know both the index name and the ID.

Tim answered Nov 15 '22 09:11


Explicit IDs

If you explicitly set the document ID when indexing, nothing prevents you from using the same ID twice for documents going in different indices.
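
To make this concrete, here is a small sketch that does not talk to Elasticsearch at all. It models three indices as plain Python dictionaries; the index names, the documents, and the `DocRef`, `find_by_id` and `resolve` helpers are all hypothetical and exist only for illustration. It shows that a lookup by ID alone can match documents in more than one index, while an (index, ID) pair identifies exactly one document.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-in for three indices:
# index name -> {document ID -> document source}.
indices = {
    "articles": {"1": {"title": "An article"}},
    "comments": {"1": {"text": "A comment",
                       "parent": {"index": "articles", "id": "1"}}},
    "users": {"2": {"name": "Someone"}},
}


@dataclass(frozen=True)
class DocRef:
    """A reference that carries the index name together with the ID."""
    index: str
    doc_id: str


def find_by_id(doc_id):
    """Return every (index name, source) pair with this ID, across all indices."""
    return [(name, docs[doc_id]) for name, docs in indices.items() if doc_id in docs]


def resolve(ref):
    """Return the single document that an (index, ID) reference points to."""
    return indices[ref.index][ref.doc_id]


# The same explicit ID "1" was used in two different indices.
hits = find_by_id("1")

# The "parent" field of the comment stores both parts of the reference.
stored = indices["comments"]["1"]["parent"]
parent = resolve(DocRef(stored["index"], stored["id"]))
```

Here `hits` contains one entry from `articles` and one from `comments`, so the ID on its own is ambiguous. Storing the index name next to the ID in the "parent" field removes that ambiguity.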

Autogenerated IDs

If you don't set the ID when indexing, Elasticsearch (ES) will generate one before storing the document.
According to the code, the ID is built from a securely generated random number, the host's MAC address, and the current timestamp in milliseconds. Additional work is done to ensure that the timestamp (and thus the ID sequence) increases monotonically.

To generate the same ID twice, a specific random number has to be picked when the JVM starts, and the document ID must be generated at a specific moment with sub-millisecond precision. So while the chance exists, it is so small that I wouldn't worry about it (just as I wouldn't worry about collisions when using a hash function to check file integrity).
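
The general idea can be sketched in Python as follows. This is not the Elasticsearch implementation: the class name `TimeBasedIdSketch`, the 24-bit sequence width, the byte layout and the Base64 encoding are all assumptions chosen for illustration. It only shows how a random starting sequence, a MAC-derived node identifier and a timestamp that never goes backwards can be combined into one ID.

```python
import base64
import secrets
import threading
import time
import uuid


class TimeBasedIdSketch:
    """Illustrative only: timestamp + node identifier + sequence number."""

    def __init__(self):
        # Random starting point for the sequence, picked once at start-up.
        self._sequence = secrets.randbits(24)
        # 48-bit node identifier. uuid.getnode() returns the MAC address when
        # it can find one and a random 48-bit number otherwise.
        self._node = uuid.getnode().to_bytes(6, "big")
        self._last_ms = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            self._sequence = (self._sequence + 1) & 0xFFFFFF
            now_ms = int(time.time() * 1000)
            # Never let the timestamp go backwards, even if the wall clock does.
            self._last_ms = max(self._last_ms, now_ms)
            # If the sequence wrapped around, advance the timestamp so that the
            # (timestamp, sequence) pair is not reused.
            if self._sequence == 0:
                self._last_ms += 1
            timestamp, sequence = self._last_ms, self._sequence
        raw = timestamp.to_bytes(6, "big") + self._node + sequence.to_bytes(3, "big")
        return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


generator = TimeBasedIdSketch()
ids = [generator.next_id() for _ in range(10000)]
```

Within one generator the IDs cannot repeat, because the sequence number changes on every call and the timestamp moves forward whenever the sequence wraps. A collision between two machines would require the same node identifier, the same timestamp and the same sequence value at once, which is the unlikely coincidence described above.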

Final note: as a code comment notes, the implementation is opaque and could change at any time, so what I wrote might not hold true in future versions.

sox with Monica answered Nov 15 '22 10:11