I know this type of question has been asked before, but I could not find one that compared the options I have in mind. So I am going to post them here, please post links if there are duplicates. This has ended up a rather long post, if you have time please read it through as the question is at the end EDIT2: I've accepted an answer as I think that will be the best solution for now. But I thought I would like to two other questions that answer my query about concatenating numbers. They can be found here: Combine two integers to create a unique number & Concatenate integers in C#. If I was going to try encoding the number (as below like 51122222) I think this would be useful. Though maybe just using something like String.Format in c# would be fast enough for my small application. I'm currently trying to find a way to setup distributed applications that use the same database schema and can synchronise with maybe one master database that all others also sync with. The program I am planning currently will start as a fairly simple program to track information. The first version might contain two tables: Items and ItemHistory. This is an example of possible fields: Items ItemID(PK) ? Name String Content String ItemHistory ItemHistoryID (PK) ? ItemID (FK) ? EventName String CreatedOn DateTime I've listed the field name and type, this question is about what to use for the PK types so they are missing. The first version will be a standard desktop app, I'm currently planning on using C# with a WPF front end and SQLite for the database. Eventually I also want to create a version to run on my Android phone as well. This is where the distributed part comes in. I don't always have a signal so will need the app to run offline and allow synchronisation when online again. Here are the ideas I have so far on how to deal with the ID's: <ol> <li>Use a UUID for the IDs so there are no merge conflicts </li> <li>Use a auto increment field and set the starting number on each version of the app in some increment, e.g. 1 for first app, 10000 for second, 20000 for third etc</li> <li>Use a auto increment field with an offset value to avoid conflicts without the large gaps between numbers (mysql has auto_increment_increment and auto_increment_offset for this)</li> <li>Generate my own ID that encodes an ID for each database so they can have their own auto increment value and not cause a conflict. I found someone else that had the same idea: What data type is recommended for ID columns? </li> </ol> While option 1 would work and I have used it in the past I want to look at the possibility of other options to avoid the issues with UUIDs. I would like to have a solution that is easier to read while debugging and is sortable. Option 2 would work but it does force a limit on the number of records. I know in my small application it will almost never go over that many but I would like to try and see if there's a solution that does not require such a limit. Option 3 avoids the limit by using alternating numbers, but I think you would need to know how many database are to be used or you might fill all numbers otherwise. Using a start of 1 and increment of 1 on DB1 and start of 2 and increment of 2 on DB2 would use every number alternatively. You could use 50 as the increment but then you just have another limit but now its on the number of applications that can use it. Again I know its a limit that is not going to be hit in my situation but could be an issue in an application that suddenly becomes very popular. Option 4 seems like it could solve the issue for me, but I'm not sure if it would work in practice or not. One idea I had was to allow a prefix to be set on each application then that could be used with an auto incrementing value. e.g. PC1, PC2 for records on a pc and maybe PHONE1, PHONE2 etc for records from the Android. This would work but using numbers in strings causes the sorting issue with 1, 11, 100 showing up next to each other, that is in less leading zeros are used and then its back to a limited number of records again. I have wondered if it would be possible to use a number for the DB ID and the auto increment. e.g PC = 1 and PHONE = 2. then we have 11, 12, 13 etc for the PC with maybe 111 for the 11th record and 2304 for the 304th record on PHONE. But I don't know how this would be done or if it can easily be done and not cause excess overheads for generating values. At work they have used a similar numbering system, they use something like this 51122222. 5 would refer to the instance of the application, then its a 2 digit year and finally a auto incrementing number. I've not got a clear answer yet what happens if we go over 99999 records in a year. I think they might have figured that its not going to happen and are happy they have calculated the risk. So finally a question, is there a way to create a primary key system for a distributed application that allows for sorting and does not enforce limits (besides the size of the data type itself e.g. max integer)? EDIT: Here's a little bit more info on the app I plan to write. I want to create something that will let me store just about any type of information that I might gain, the system will include the ability to tag the entries so I can search on a topic. Types of information I see so far could be recommendations on books, dvds, websites etc. Or maybe local tips for the place I'm living. One overall idea is to stop keeping these bits of information spread across multiple computers/laptops/phones in different formats.

in broad terms, there are two approaches. <ol> <li>you use sequential values. these may be divided up into groups, interleaved, whatever. they are the most efficient approach, but require collaboration and coordination.</li> <li>you use random values (this includes UIDs). these are much simpler but require more space. from "birthday collisions" we know that you if you need to store N values then a random key must be chosen from (more than) a range of N*N - http://en.wikipedia.org/wiki/Birthday_problem. working backwards, a 64 bit integer can hold about 32 bits of data if used as a random key - that's about 4 billion values. but that's for a probability of 50% collisions. you want a much lower probability, so a practical limit is around 10 million entries.</li> </ol> so, in simple terms, if you have a 64 bit key, a random approach would work for around 10 million entries a sequential approach for many more. in either case, that is probably more than you need. if you have a 32 bit key then a random approach works for around a thousand values (a sequential approach goes to about 4 billion, as above). obviously if you have a text value then you need to modify this accordingly, but UUIDs are designed to have "enough" values anyway http://en.wikipedia.org/wiki/Universally_unique_identifier typically a database will provide a sequential ID and that is all you need. if not, the 64 bit random approach is usually simplest and worth the extra space.

Distributed primary key - UUID, simple auto increment or custom sequential values?

Tags:

database

database-agnostic

distributed

I know this type of question has been asked before, but I could not find one that compared the options I have in mind. So I am going to post them here, please post links if there are duplicates.

This has ended up a rather long post, if you have time please read it through as the question is at the end

EDIT2: I've accepted an answer as I think that will be the best solution for now. But I thought I would like to two other questions that answer my query about concatenating numbers. They can be found here: Combine two integers to create a unique number & Concatenate integers in C#. If I was going to try encoding the number (as below like 51122222) I think this would be useful. Though maybe just using something like String.Format in c# would be fast enough for my small application.

I'm currently trying to find a way to setup distributed applications that use the same database schema and can synchronise with maybe one master database that all others also sync with.

The program I am planning currently will start as a fairly simple program to track information. The first version might contain two tables: Items and ItemHistory. This is an example of possible fields:

Items
ItemID(PK) ?
Name String
Content String

ItemHistory
ItemHistoryID (PK) ?
ItemID (FK) ?
EventName String
CreatedOn DateTime

I've listed the field name and type, this question is about what to use for the PK types so they are missing.

The first version will be a standard desktop app, I'm currently planning on using C# with a WPF front end and SQLite for the database. Eventually I also want to create a version to run on my Android phone as well. This is where the distributed part comes in. I don't always have a signal so will need the app to run offline and allow synchronisation when online again.

Here are the ideas I have so far on how to deal with the ID's:

Use a UUID for the IDs so there are no merge conflicts
Use a auto increment field and set the starting number on each version of the app in some increment, e.g. 1 for first app, 10000 for second, 20000 for third etc
Use a auto increment field with an offset value to avoid conflicts without the large gaps between numbers (mysql has auto_increment_increment and auto_increment_offset for this)
Generate my own ID that encodes an ID for each database so they can have their own auto increment value and not cause a conflict. I found someone else that had the same idea: What data type is recommended for ID columns?

While option 1 would work and I have used it in the past I want to look at the possibility of other options to avoid the issues with UUIDs. I would like to have a solution that is easier to read while debugging and is sortable.

Option 2 would work but it does force a limit on the number of records. I know in my small application it will almost never go over that many but I would like to try and see if there's a solution that does not require such a limit. Option 3 avoids the limit by using alternating numbers, but I think you would need to know how many database are to be used or you might fill all numbers otherwise. Using a start of 1 and increment of 1 on DB1 and start of 2 and increment of 2 on DB2 would use every number alternatively. You could use 50 as the increment but then you just have another limit but now its on the number of applications that can use it. Again I know its a limit that is not going to be hit in my situation but could be an issue in an application that suddenly becomes very popular.

Option 4 seems like it could solve the issue for me, but I'm not sure if it would work in practice or not. One idea I had was to allow a prefix to be set on each application then that could be used with an auto incrementing value. e.g. PC1, PC2 for records on a pc and maybe PHONE1, PHONE2 etc for records from the Android. This would work but using numbers in strings causes the sorting issue with 1, 11, 100 showing up next to each other, that is in less leading zeros are used and then its back to a limited number of records again.

I have wondered if it would be possible to use a number for the DB ID and the auto increment. e.g PC = 1 and PHONE = 2. then we have 11, 12, 13 etc for the PC with maybe 111 for the 11th record and 2304 for the 304th record on PHONE. But I don't know how this would be done or if it can easily be done and not cause excess overheads for generating values.

At work they have used a similar numbering system, they use something like this 51122222. 5 would refer to the instance of the application, then its a 2 digit year and finally a auto incrementing number. I've not got a clear answer yet what happens if we go over 99999 records in a year. I think they might have figured that its not going to happen and are happy they have calculated the risk.

So finally a question, is there a way to create a primary key system for a distributed application that allows for sorting and does not enforce limits (besides the size of the data type itself e.g. max integer)?

EDIT: Here's a little bit more info on the app I plan to write. I want to create something that will let me store just about any type of information that I might gain, the system will include the ability to tag the entries so I can search on a topic. Types of information I see so far could be recommendations on books, dvds, websites etc. Or maybe local tips for the place I'm living. One overall idea is to stop keeping these bits of information spread across multiple computers/laptops/phones in different formats.

622

asked Aug 31 '11 14:08

Kioshiki

1 Answers

in broad terms, there are two approaches.

you use sequential values. these may be divided up into groups, interleaved, whatever. they are the most efficient approach, but require collaboration and coordination.
you use random values (this includes UIDs). these are much simpler but require more space. from "birthday collisions" we know that you if you need to store N values then a random key must be chosen from (more than) a range of N*N - http://en.wikipedia.org/wiki/Birthday_problem. working backwards, a 64 bit integer can hold about 32 bits of data if used as a random key - that's about 4 billion values. but that's for a probability of 50% collisions. you want a much lower probability, so a practical limit is around 10 million entries.

so, in simple terms, if you have a 64 bit key, a random approach would work for around 10 million entries a sequential approach for many more. in either case, that is probably more than you need.

if you have a 32 bit key then a random approach works for around a thousand values (a sequential approach goes to about 4 billion, as above).

obviously if you have a text value then you need to modify this accordingly, but UUIDs are designed to have "enough" values anyway http://en.wikipedia.org/wiki/Universally_unique_identifier

typically a database will provide a sequential ID and that is all you need. if not, the 64 bit random approach is usually simplest and worth the extra space.

139

answered Sep 27 '22 18:09

andrew cooke

Related questions
                            
                                How to make Spring Boot application resistant against database connection errors?
                            
                                logstach: jdbc_page_size doesn't dump all my data to elastic search
                            
                                Spring Boot Unable to determine jdbc url from datasource
                            
                                How can one make a Docker Compose service build depend on another service?
                            
                                Add Database Diagram to Source Control?
                            
                                Advantages and Disadvantages of SQLite.NET and SQL Server Compact
                            
                                Native vs ODBC database connections
                            
                                conditional statements in derby
                            
                                MongoDB: What is connection pooling and timeout?
                            
                                Can I deploy SQL Server Express with my desktop application just like builtin database?
                            
                                Smart or Not: Persist serialized data (dotnet-protobuf, protobuf-net, json) in a Relational DB in CF
                            
                                Creating an appropriate index for a frequently used query in SQL Server
                            
                                Deadlock issue in SQL Server 2008 R2 (.Net 2.0 Application)
                            
                                How to export consistent subset of database [closed]
                            
                                How to get and play WAV file stored as MySQL BLOB?
                            
                                HTML 5 Web SQL Database Transaction commit or rollback when refreshing page
                            
                                find which of a user's tweets were favorited
                            
                                UPC / SKU Database [closed]
                            
                                Why is my Firebird database so huge for the amount of data it's holding?
                            
                                Shutting down derby Network server doesnt delete db.lck

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With