
Sequential UID set generation for MySQL Char() or other Field

Tried Googling, but no luck.

Question: What is the best way to externally generate sequential UID values for a MySQL field that must be representable as a string?

Reason:
I want generic, sequential, UUID-ish values so that inserts land in on-disk/page-append order (for write performance), with a date prefix for fast index searches from char[0] forward (for read speed). The column will be indexed, but I am looking for the best data layout to improve index reads and table writes, rather than a plain old UUID.

My initial thought is a date at some granularity (possibly a padded Unix epoch) appended to, or replacing part of, a generated UUIDv4 string, i.e. [Unix epoch][remaining UUIDv4], in a fixed-width char field. But I am unsure whether this would give the desired in-page/on-disk ordering and index-search behaviour. An example would be:

12904645950049bceba1cc24e80806dd
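A quick sketch of that construction in Python (the function name and the 10+22 split are my assumptions, chosen so the result has the same 32-character width as a dashless UUID, matching the example above):

```python
import time
import uuid

def sequential_uid() -> str:
    """Hypothetical sequential-ish UID: zero-padded Unix epoch + UUIDv4 tail.

    A 10-digit epoch prefix plus 22 hex chars of a UUIDv4 gives a
    fixed-width 32-char value, so it fits the same CHAR(32) column as a
    dashless UUID while sorting roughly by insert time.
    """
    epoch = str(int(time.time())).zfill(10)  # sortable, second-granularity prefix
    tail = uuid.uuid4().hex[:22]             # random suffix for uniqueness
    return epoch + tail
```

Note that values generated within the same second are not ordered among themselves, since the tail is random; finer ordering would need a higher-resolution timestamp or a sequence counter.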

The values must be independent of MySQL itself, hence using UUIDs and timestamps rather than some variation of auto-incrementing.

Does anyone who knows the internals of MySQL indexes have any suggestions (for InnoDB tables)?

Aiden

Asked Nov 16 '10 by Aiden Bell

2 Answers

Might be a bit offtopic, but have a look at Twitter's snowflake. They say it's:

  • (Roughly) Time Ordered (helps a lot to avoid expensive random primary key BTREE updates)
  • Directly Sortable
  • Compact

Not to mention other features (HA, etc.). You can either nick their algorithm or just use it as it stands.

The whole UID only uses up to 64 bits of space, so I would guess it would be quite efficient to index; see http://www.mysqlperformanceblog.com/2006/10/03/long-primary-key-for-innodb-tables/ (a counter-example with long keys).
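For illustration, a minimal single-process sketch of a Snowflake-style 64-bit ID (assuming the commonly described layout of a 41-bit millisecond timestamp, 10-bit worker ID, and 12-bit per-millisecond sequence; the real service adds worker coordination and HA):

```python
import threading
import time

class SnowflakeSketch:
    """Snowflake-style layout: 41-bit ms timestamp | 10-bit worker | 12-bit seq."""

    EPOCH_MS = 1288834974657  # custom epoch (Nov 2010), keeps the timestamp small

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024  # must fit in 10 bits
        self.worker_id = worker_id
        self.seq = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF  # 12-bit sequence
                if self.seq == 0:                  # sequence exhausted: wait for next ms
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self.worker_id << 12) | self.seq
```

Because the timestamp occupies the high bits, the IDs sort by generation time, which is exactly the "roughly time ordered, directly sortable, compact" property listed above.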

Answered Sep 21 '22 by mindas


I think you may need to be more specific with what you are trying to solve (what's the actual problem - why not auto_increment?, what is your proposed schema?, etc.). To answer your internals question:

  • InnoDB stores data in an index (the clustered index), in 16K pages.

The risks of not inserting sequentially are at least two fold:

  1. If your working set does not fit in memory, you may need to do random IO to load a page from disk before inserting the value into that page.

  2. There might not be space remaining in the page (InnoDB fills pages to about 93% and leaves a small gap for updates), which could result in the page needing to be split. More split pages = fragmentation and less optimal use of resources such as memory.

So, I think as long as you are at least approximately sequential, (1) isn't a concern for the primary key index (it could still be true for any unique indexes). You just need to worry about (2).


Why I said that understanding the problem is important: there are many ways to do this besides long GUIDs. For one, a BIGINT in MySQL is smaller than any data type you will probably be using, yet has a range of 18 quintillion. You could allocate "chunks" of key space, N thousand at a time, to worker nodes and guarantee no duplicates. If a worker node crashes and doesn't use all of the chunk it was allocated, so what? It doesn't matter.
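A toy sketch of the chunk-allocation idea (the class name and sizes are illustrative; in practice the coordinator's counter would live in durable shared storage, e.g. a single row in MySQL updated atomically):

```python
class ChunkAllocator:
    """Hands out disjoint ranges of BIGINT key space to worker nodes.

    Each worker draws a chunk of `chunk_size` sequential ids. Ids from a
    chunk a crashed worker never finished are simply abandoned, which is
    affordable given BIGINT's ~18 quintillion range.
    """

    def __init__(self, chunk_size: int = 1000):
        self.chunk_size = chunk_size
        self.next_start = 1  # in practice: persisted and incremented atomically

    def allocate(self) -> range:
        start = self.next_start
        self.next_start += self.chunk_size
        return range(start, start + self.chunk_size)
```

Each worker iterates over its own range, so ids within a node are sequential and no two nodes can collide.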

Answered Sep 21 '22 by Morgan Tocker