
Cassandra UUID vs TimeUUID benefits and disadvantages

Given that TimeUUID handily allows you to use now() in CQL, are there any reasons you wouldn't just go ahead and always use TimeUUID instead of plain old UUID?

Jay asked Jul 30 '13 11:07




1 Answer

UUID and TIMEUUID are stored the same way in Cassandra, and they only really represent two different sorting implementations.
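As a rough illustration of "stored the same way", here is a minimal Python sketch using only the standard library uuid module (not Cassandra itself). It assumes nothing beyond the fact that version 1 and version 4 UUIDs are both 128-bit values; the variable names are made up for the example.

```python
import uuid

# A version 1 (time-based) UUID, the kind a TIMEUUID column holds,
# and a version 4 (random) UUID, the kind uuid() typically produces.
time_based = uuid.uuid1()
random_based = uuid.uuid4()

# Both are 16 raw bytes; only the version bits inside those bytes differ.
print(len(time_based.bytes), time_based.version)
print(len(random_based.bytes), random_based.version)
```

So the difference between the two column types shows up in how values are compared, not in how many bytes are written.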

TIMEUUID columns are sorted by their time components first, and then by their raw bytes, whereas UUID columns are sorted by their version first, then, if both are version 1, by their time component, and finally by their raw bytes. Curiously, the time component comparison is implemented twice in the Cassandra code, once in UUIDType and once in TimeUUIDType, and the two copies differ only in formatting.
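The two orderings described above can be approximated with Python sort keys. This is only a sketch of the ordering as described in this answer, not Cassandra's actual comparator code: make_v1, timeuuid_key and uuid_key are hypothetical names, and the raw-byte step here uses Python's unsigned bytes comparison, which may not match Cassandra's byte comparison exactly.

```python
import uuid

def make_v1(timestamp, clock_seq=0, node=0):
    """Build a version 1 UUID from a 60-bit timestamp (hypothetical helper)."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    clock_seq_hi_variant = 0x80 | ((clock_seq >> 8) & 0x3F)  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))

def timeuuid_key(u):
    # TIMEUUID-style order: time component first, then raw bytes.
    return (u.time, u.bytes)

def uuid_key(u):
    # UUID-style order: version first, then time (version 1 only), then raw bytes.
    time_part = u.time if u.version == 1 else 0
    return (u.version, time_part, u.bytes)

earlier = make_v1(0xFFFFFFFF)     # low 32 time bits all set
later = make_v1(0x100000000)      # one tick later, low 32 bits wrap to zero
random_id = uuid.UUID("00000000-0000-4000-8000-000000000000")  # version 4

by_time = sorted([later, earlier], key=timeuuid_key)
by_bytes = sorted([earlier, later], key=lambda u: u.bytes)
by_uuid_rules = sorted([random_id, later, earlier], key=uuid_key)
```

Here by_time puts earlier first, while by_bytes puts later first, because the low bits of the timestamp sit in the leading bytes of a version 1 UUID. That is why a time-aware comparison is needed at all. In by_uuid_rules the two version 1 values come first, in time order, followed by the version 4 value.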

I think of the UUID vs. TIMEUUID question primarily as documentation: if you choose TIMEUUID you're saying that you're storing things in chronological order, and that these things can occur at the same time, so a simple timestamp isn't enough. Using UUID says that you don't care about order (even if in practice the columns will be ordered by time if you put version 1 UUIDs in them), you just want to make sure that things have unique IDs.
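To make the "things can occur at the same time" point concrete, here is a small Python sketch (standard library only, not Cassandra code). The node values 1 and 2 are arbitrary stand-ins for two writers; the second UUID is built by copying the first one's fields and swapping the node.

```python
import uuid

# First event: a version 1 UUID generated by a hypothetical writer with node id 1.
first = uuid.uuid1(node=1)

# Second event: same timestamp and clock sequence, but from node id 2.
second = uuid.UUID(fields=first.fields[:5] + (2,))

# The time components are identical, yet the identifiers are distinct.
print(first.time == second.time)  # True
print(first == second)            # False
```

A plain timestamp column would treat these two events as the same key, whereas a time-based UUID keeps them apart while still sorting them by time.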

Even though using now() to fill a UUID column is convenient, it is likely to surprise other people reading your code.

It probably does not matter much in the grand scheme of things, but sorting non-version 1 UUIDs is a bit faster than sorting version 1 UUIDs, so if you have a UUID column and generate the UUIDs yourself, pick a version other than 1.

Theo answered Sep 21 '22 15:09