I'm trying to understand why the auto-increment pattern is a bad idea when scaling.
I've also read this article. It contains the following passage:
I'm trying to find out the exact scenario in which _id values end up duplicated across shards.
One more question: what about auto-increment for a non-primary key? Is it safe?
Thank you very much!
To guarantee that an auto-increment value is unique, ID creation must occur on a single thread on a single host (even if multiple threads are used, the point of ID creation must block the other threads). So, in a cluster of 100 servers, IDs must be created on 1 thread on 1 of the 100 servers. This is not just a performance bottleneck: without that serialization, two concurrent ID creations can interfere with each other (for example, both reading the same counter value and producing the same ID), which is the race condition noted in the quotation you've cited.
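To make the "single point of ID creation" concrete, here is a minimal in-process sketch in Python. It is only an illustration, not how any particular database implements it: the `AutoIncrement` class and the `allocate_ids` helper are hypothetical names, and a lock inside one process stands in for the one thread on the one server that hands out IDs in a cluster.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AutoIncrement:
    """Hypothetical ID generator; stands in for the single point of ID creation."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def next_id(self):
        # The lock is the serialization point: only one caller at a time may
        # read and bump the counter, so no two callers can get the same value.
        with self._lock:
            self._value += 1
            return self._value


def allocate_ids(generator, workers=8, per_worker=1000):
    """Request IDs from many threads at once and return every ID handed out."""

    def work(_):
        return [generator.next_id() for _ in range(per_worker)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = list(pool.map(work, range(workers)))
    return [i for batch in batches for i in batch]


ids = allocate_ids(AutoIncrement())
print(len(ids), len(set(ids)))  # both counts match: every ID is unique
```

The IDs stay unique only because every caller queues on the same lock. Spread across 100 servers, that queue becomes a network round trip to one machine for every insert, which is the bottleneck described above.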
Note that transactional RDBMSs such as Oracle and SQL Server have solved the race condition problem, but there is no solution to the performance bottleneck.
So: no, don't use auto-increment in non-primary keys if you anticipate the need to scale your system.