
The differences between INT and UUID in MySQL

If I set the primary key to be INT type (AUTO_INCREMENT) or set it in UUID, what is the difference between these two in the database performance (SELECT, INSERT etc) and why?

孙为强 asked May 26 '15 14:05




1 Answer

UUID() returns a universally unique identifier (hopefully still unique if the rows are imported into another DB).

To quote from the MySQL documentation (emphasis mine):

A UUID is designed as a number that is globally unique in space and time. Two calls to UUID() are expected to generate two different values, even if these calls are performed on two separate computers that are not connected to each other.

On the other hand, a simple INT primary key (e.g. AUTO_INCREMENT) returns an integer that is unique only within the specific DB and table. It is not universally unique, so if the rows are imported into another DB, chances are there will be primary key conflicts.
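The difference can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two "databases" are plain lists, the helper names are hypothetical, and `uuid.uuid4()` (random UUIDs) stands in for MySQL's `UUID()`, which produces version 1 UUIDs.

```python
import uuid

def auto_increment_ids(count):
    """Mimic an AUTO_INCREMENT column: every database counts 1, 2, 3, ..."""
    return list(range(1, count + 1))

def uuid_ids(count):
    """Generate random (version 4) UUIDs without coordinating with any database."""
    return [uuid.uuid4() for _ in range(count)]

# Two hypothetical, independent databases, each holding three rows.
db_a_int, db_b_int = auto_increment_ids(3), auto_increment_ids(3)
db_a_uuid, db_b_uuid = uuid_ids(3), uuid_ids(3)

# Keys present in both databases would conflict when the rows are merged.
int_conflicts = set(db_a_int) & set(db_b_int)
uuid_conflicts = set(db_a_uuid) & set(db_b_uuid)

print("INT key conflicts on merge:", sorted(int_conflicts))
print("UUID key conflicts on merge:", len(uuid_conflicts))
```

Every integer key collides because both counters start at 1, while the UUID sets are expected to be disjoint.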

In terms of performance, there shouldn't be any noticeable difference between auto-increment and UUID keys. Most posts (including some by the authors of this site) say as much. Of course a UUID may take a little more time (and space), but this is not a performance bottleneck in most (if not all) cases. Having the column as the primary key should make both choices roughly equal with respect to performance. See the references below:

  1. To UUID or not to UUID?
  2. Myths, GUID vs Autoincrement
  3. Performance: UUID vs auto-increment in cakephp-mysql
  4. UUID performance in MySQL?
  5. Primary Keys: IDs versus GUIDs (coding horror)

[Figure: UUID vs auto-increment performance results, adapted from Myths, GUID vs Autoincrement]

UUID pros / cons (adapted from Primary Keys: IDs versus GUIDs)

GUID Pros

  • Unique across every table, every database, every server
  • Allows easy merging of records from different databases
  • Allows easy distribution of databases across multiple servers
  • You can generate IDs anywhere, instead of having to roundtrip to the database
  • Most replication scenarios require GUID columns anyway

GUID Cons

  • It is a whopping 4 times larger than the traditional 4-byte index value; this can have serious performance and storage implications if you're not careful
  • Cumbersome to debug (where userid='{BAE7DF4-DDF-3RG-5TY3E3RF456AS10}')
  • The generated GUIDs should be partially sequential for best performance (eg, newsequentialid() on SQL 2005) and to enable use of clustered indexes.
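The size and ordering points can be made concrete with a small Python sketch. It is not taken from the references: `make_v1` and `swap_time_fields` are hypothetical helpers, the timestamps and node value are made up, and the byte swap only imitates what MySQL 8.0's `UUID_TO_BIN(uuid, 1)` does (moving the high timestamp bits to the front so that values sort roughly by creation time).

```python
import uuid

def make_v1(timestamp: int) -> uuid.UUID:
    """Build a version-1-style UUID from a 60-bit timestamp (illustrative values only)."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version nibble = 1
    # Clock sequence and node are arbitrary constants for this sketch.
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             0x80, 0x00, 0x123456789ABC))

def swap_time_fields(u: uuid.UUID) -> bytes:
    """Reorder the 16 raw bytes as time_hi | time_mid | time_low | rest."""
    b = u.bytes
    return b[6:8] + b[4:6] + b[0:4] + b[8:]

# Three UUIDs in chronological order; the first sits just below a time_low rollover.
ids = [make_v1(t) for t in (0xFFFFFFFF, 0x100000000, 0x100000001)]

text_len = len(str(ids[0]))    # characters when stored as CHAR(36)
binary_len = len(ids[0].bytes)  # bytes when stored as BINARY(16)

raw_sorted = sorted(ids, key=lambda u: u.bytes)
swapped_sorted = sorted(ids, key=swap_time_fields)

print("text length:", text_len, "binary length:", binary_len)
print("raw byte order is chronological:", raw_sorted == ids)
print("swapped byte order is chronological:", swapped_sorted == ids)
```

Stored as raw bytes, a UUID is 16 bytes (4 times a 4-byte INT), and stored as text it is 36 characters. In the raw layout the fastest-changing timestamp bits come first, so consecutive values scatter across an index; the swapped layout keeps them in creation order.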

Note

I would read the references above carefully and decide whether to use UUIDs based on the use case. That said, in many cases UUIDs are indeed preferable. For example, one can generate UUIDs without accessing the database at all, or even use UUIDs that have been pre-computed and/or stored somewhere else. Plus, you can generalise or update your database schema and/or clustering scheme without having to worry about IDs breaking and causing conflicts.

In terms of possible collisions, for example with v4 (random) UUIDs, the probability of finding a duplicate within 103 trillion version-4 UUIDs is one in a billion.
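That figure can be checked with the usual birthday approximation, p ≈ 1 - exp(-n² / (2 · 2^bits)). The sketch below assumes 122 random bits per version-4 UUID (128 bits minus the 6 fixed version and variant bits); the function name is hypothetical.

```python
import math

RANDOM_BITS = 122  # 128 bits minus 6 fixed version/variant bits

def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Birthday approximation of at least one duplicate among n random values."""
    exponent = -(n * n) / (2 * 2**bits)
    # expm1 keeps precision when the exponent is very close to zero.
    return -math.expm1(exponent)

p = collision_probability(103 * 10**12)  # 103 trillion UUIDs
print(f"collision probability: {p:.3e}")
```

The result comes out at roughly 1e-9, i.e. about one in a billion.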

Nikos M. answered Oct 04 '22 14:10