Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sequential Guid and fragmentation

I'm trying to understand how sequential guid performs better than a regular guid.

Is it because with regular guid, the index use the last byte of the guid to sort? Since it's random it will cause alot of fragmentation and page splits since it will often move data to another page to insert new data?

Sequential guid sine it is sequential it will cause alot less page splits and fragmentation?

Is my understanding correct?

If anyone can shed more lights on the subject, I'll appreciated very much.

Thank you

EDIT:

Sequential guid = NEWSEQUENTIALID(),

Regular guid = NEWID()

like image 768
pdiddy Avatar asked Aug 10 '10 14:08

pdiddy


People also ask

Is guid sequential?

Sequential GUIDs are not actually sequential. In normal circumstances, GUIDs being generated by the same computer will have gradually increasing Timestamp fields (with the other fields remaining constant).

Should you use GUID as primary key?

GUIDs can be considered as global primary keys. Local primary keys are used to uniquely identify records within a table. On the other hand, GUIDs can be used to uniquely identify records across tables, databases, and servers.

What is comb GUID?

The COMB GUID (with embed date and time) becomes a sequential GUID, with each GUID being sequentially after the previous GUID. This works great for indexing and sorting. But you cant just replace or insert this time stamp anywhere in the GUID, any part of the GUID with the date/time, it depends on the Database Server.


2 Answers

You've pretty much said it all in your question.

With a sequential GUID / primary key new rows will be added together at the end of the table, which makes things nice an easy for SQL server. In comparison a random primary key means that new records could be inserted anywhere in the table - the chance of the last page for the table being in the cache is fairly likely (if that's where all of the reads are going), however the chance of a random page in the middle of the table being in the cache is fairly low, meaning additional IO is required.

On top of that, when inserting rows into the middle of the table there is the chance that there isn't enough room to insert the extra row. If this is the case then SQL server needs to perform additional expensive IO operations in order to create room for the record - the only way to avoid this is to have gaps scattered amongst the data to allow for extra records to be inserted (known as a Fill factor), which in itself causes performance issues because the data is spread over more pages and so more IO is required to access the entire table.

like image 94
Justin Avatar answered Sep 29 '22 21:09

Justin


I defer to Kimberly L. Tripp's wisdom on this topic:

But, a GUID that is not sequential - like one that has it's values generated in the client (using .NET) OR generated by the newid() function (in SQL Server) can be a horribly bad choice - primarily because of the fragmentation that it creates in the base table but also because of its size. It's unnecessarily wide (it's 4 times wider than an int-based identity - which can give you 2 billion (really, 4 billion) unique rows). And, if you need more than 2 billion you can always go with a bigint (8-byte int) and get 263-1 rows.

Read more: http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx#ixzz0wDK6cece

like image 31
Joe Stefanelli Avatar answered Sep 29 '22 20:09

Joe Stefanelli