Cassandra: Generate a unique ID?

I'm working on a distributed database. I'm trying to generate a unique ID that will serve as the primary key of a column family in Cassandra.

I read some articles about doing this in Java using UUIDs, but it seems there is a probability of collision (even if it's very low).

Is there a way to generate a unique ID based on time, maybe?

asked Apr 18 '13 at 13:04 by user2090879


4 Answers

You can use the TimeUUID type in Cassandra, which is backed by a Type 1 UUID. This combines the current time, the creator's MAC address, and a sequence number. If TimeUUIDs are generated correctly, this gives zero collisions (you can use the CQL now() function or generate your own; the Java client drivers provide some thread-safe implementations). The main advantage of TimeUUIDs is that the IDs can be ordered by time. See http://wiki.apache.org/cassandra/TimeBaseUUIDNotes for more info.

However, the time ordering is unlikely to be useful for row primary keys: with a hash partitioner, rows are placed by the hash of the partition key, so key order is lost. It is useful in a clustering key, where rows within a partition are stored in sorted order. Also, if you roll your own generator, the complexity of generating a unique ID could be a source of bugs. Cassandra also supports Type 4 UUIDs via the UUID type. These are just random bits (122 of the 128 bits are random; the other 6 encode the version and variant). There is a collision probability, but assuming uncorrelated random number sources (which they will be if you generate them in Java), it is extremely low: at 1 billion UUIDs a second, it would take roughly 86 years before the probability of even one collision reached 50%. (See http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates for more details.)
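The Type 4 collision estimate comes from the birthday bound. As a sketch (the class and method names are hypothetical), this generates a random UUID with Java's built-in `UUID.randomUUID()` and uses the standard approximation p ≈ 1 − exp(−n²/2N), with N = 2^122 possible random values, to compute how many UUIDs it takes to reach a given collision probability p.

```java
import java.util.UUID;

// Sketch: random (Type 4) UUIDs plus the birthday-bound arithmetic behind the
// collision estimate. Assumes 122 random bits per UUID (6 of the 128 bits are fixed).
final class RandomUuidOdds {
    static final int RANDOM_BITS = 122;

    // Approximate number of UUIDs generated before the chance of at least one
    // collision reaches p: n = sqrt(2 * N * ln(1 / (1 - p))), with N = 2^122.
    static double uuidsForCollisionProbability(double p) {
        double space = Math.pow(2, RANDOM_BITS);
        return Math.sqrt(2.0 * space * Math.log(1.0 / (1.0 - p)));
    }

    // Years needed to generate n UUIDs at the given rate (365.25-day years).
    static double yearsToGenerate(double n, double perSecond) {
        return n / perSecond / (365.25 * 24 * 3600);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID(); // uses a cryptographically strong PRNG
        System.out.println(id + " version=" + id.version());

        double n = uuidsForCollisionProbability(0.5);
        System.out.printf("~%.2e UUIDs for a 50%% collision chance%n", n);
        System.out.printf("~%.0f years at 1 billion UUIDs per second%n", yearsToGenerate(n, 1e9));
    }
}
```

For p = 0.5 this works out to about 2.7 × 10^18 UUIDs, which at one billion per second is about 86 years of continuous generation.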

answered Oct 19 '22 at 15:10 by Richard


You should investigate using Twitter Snowflake. From the project readme:

As we at Twitter move away from Mysql towards Cassandra, we've needed a new way to generate id numbers. There is no sequential id generation facility in Cassandra, nor should there be.

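Snowflake uses a simple algorithm that generates 64-bit integers (Java longs) which are both roughly time-ordered and unique: each ID combines a millisecond timestamp, a machine ID, and a per-machine sequence number, so nodes do not need to coordinate for each new ID. Since your database is distributed, this service should suit your needs well.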
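A minimal, single-process sketch of a Snowflake-style generator in Java. This is not Twitter's code; it only assumes the bit layout Snowflake's README describes (a 41-bit millisecond timestamp, a 10-bit machine ID, and a 12-bit sequence number). The class name, the custom epoch (2020-01-01 UTC), and the worker IDs are hypothetical, and the sketch assumes each running generator is given a distinct worker ID by some external means.

```java
// Sketch of a Snowflake-style 64-bit ID generator (hypothetical class, not Twitter's code).
// Assumed layout: 1 unused sign bit | 41-bit ms timestamp | 10-bit worker ID | 12-bit sequence.
final class SnowflakeStyleIdGenerator {
    static final long EPOCH_MILLIS = 1_577_836_800_000L; // hypothetical epoch: 2020-01-01T00:00:00Z
    static final int WORKER_BITS = 10;
    static final int SEQUENCE_BITS = 12;
    static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;   // 1023
    static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1; // 4095

    private final long workerId; // assumed unique across all running generators
    private long lastMillis = -1L;
    private long sequence = 0L;

    SnowflakeStyleIdGenerator(long workerId) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("workerId must be in [0, " + MAX_WORKER_ID + "]");
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMillis) {
            // Refuse to issue IDs if the clock moved backwards, rather than risk duplicates.
            throw new IllegalStateException("clock moved backwards by " + (lastMillis - now) + " ms");
        }
        if (now == lastMillis) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // All 4096 sequence values used this millisecond: wait for the next one.
                while (now <= lastMillis) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastMillis = now;
        return ((now - EPOCH_MILLIS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    static long workerIdOf(long id) {
        return (id >>> SEQUENCE_BITS) & MAX_WORKER_ID;
    }

    static long timestampMillisOf(long id) {
        return (id >>> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MILLIS;
    }

    public static void main(String[] args) {
        SnowflakeStyleIdGenerator generator = new SnowflakeStyleIdGenerator(7);
        long id = generator.nextId();
        System.out.println(id + " worker=" + workerIdOf(id) + " millis=" + timestampMillisOf(id));
    }
}
```

Within one generator the IDs strictly increase; across generators they sort only roughly by time, since IDs from different workers in the same millisecond are ordered by worker ID, not by when they were issued.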

answered Oct 19 '22 at 15:10 by noahlz


As Richard said, you can use TimeUUID, and generating a TimeUUID value is not a big deal: just follow the Cassandra FAQ entry on timeuuid.
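For a sense of what generating a TimeUUID correctly involves, here is a minimal Java sketch of a Type 1 (time-based) UUID generator. Assumptions: the class name is hypothetical; the node field is a random 48-bit value with the multicast bit set (an option RFC 4122 allows when no MAC address is used) instead of the creator's MAC address; and the timestamp comes from the millisecond clock, advanced by one 100 ns tick whenever needed so that a generator never repeats a timestamp. In production, prefer a driver's implementation, such as `Uuids.timeBased()` in the DataStax Java driver 4.x.

```java
import java.security.SecureRandom;
import java.util.UUID;

// Minimal sketch of a Type 1 (time-based) UUID generator; the class name is hypothetical.
// Assumption: a random 48-bit node ID with the multicast bit set (RFC 4122, section 4.5)
// stands in for the MAC address.
final class TimeUuidGenerator {
    // Number of 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    private final long clockSeqAndNode; // low 64 bits, fixed per generator
    private long lastTimestamp = Long.MIN_VALUE;

    TimeUuidGenerator() {
        SecureRandom random = new SecureRandom();
        long node = (random.nextLong() & 0xFFFFFFFFFFFFL) | 0x010000000000L; // 48 bits, multicast bit set
        long clockSeq = random.nextInt() & 0x3FFFL;                           // 14-bit clock sequence
        // Top two bits "10" mark the RFC 4122 variant, then the clock sequence, then the node.
        clockSeqAndNode = ((0x8000L | clockSeq) << 48) | node;
    }

    synchronized UUID next() {
        // Current time in 100 ns units since the UUID epoch (millisecond resolution only).
        long timestamp = System.currentTimeMillis() * 10_000L + UUID_EPOCH_OFFSET;
        // Same millisecond as the last call, or the clock moved backwards:
        // advance one tick so this generator never repeats a timestamp.
        if (timestamp <= lastTimestamp) {
            timestamp = lastTimestamp + 1;
        }
        lastTimestamp = timestamp;

        long msb = ((timestamp & 0xFFFFFFFFL) << 32)      // time_low
                 | (((timestamp >>> 32) & 0xFFFFL) << 16) // time_mid
                 | 0x1000L                                // version 1
                 | ((timestamp >>> 48) & 0x0FFFL);        // time_hi
        return new UUID(msb, clockSeqAndNode);
    }

    public static void main(String[] args) {
        TimeUuidGenerator generator = new TimeUuidGenerator();
        UUID id = generator.next();
        long unixMillis = (id.timestamp() - UUID_EPOCH_OFFSET) / 10_000L;
        System.out.println(id + " version=" + id.version() + " unixMillis=" + unixMillis);
    }
}
```

Because the node value here is random rather than a MAC address, two generators could in principle collide if they drew the same node and clock sequence and emitted the same timestamp; that is very unlikely, but not impossible. If one generator is called more than 10,000 times in a millisecond, its timestamps run ahead of the wall clock but stay unique.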

answered Oct 19 '22 at 14:10 by abhi


Use the Cassandra function now() to generate a timeuuid value, and the uuid() function to generate a random (Type 4) uuid value.

answered Oct 19 '22 at 14:10 by Ajai