I know this type of question has been asked before, but I could not find one that compared the options I have in mind. So I am going to post them here, please post links if there are duplicates.
This has ended up a rather long post, if you have time please read it through as the question is at the end
EDIT2: I've accepted an answer as I think that will be the best solution for now. But I thought I would like to two other questions that answer my query about concatenating numbers. They can be found here: Combine two integers to create a unique number & Concatenate integers in C#. If I was going to try encoding the number (as below like 51122222) I think this would be useful. Though maybe just using something like String.Format in c# would be fast enough for my small application.
I'm currently trying to find a way to setup distributed applications that use the same database schema and can synchronise with maybe one master database that all others also sync with.
The program I am planning currently will start as a fairly simple program to track information. The first version might contain two tables: Items and ItemHistory. This is an example of possible fields:
Items
ItemID(PK) ?
Name String
Content String
ItemHistory
ItemHistoryID (PK) ?
ItemID (FK) ?
EventName String
CreatedOn DateTime
I've listed the field name and type, this question is about what to use for the PK types so they are missing.
The first version will be a standard desktop app, I'm currently planning on using C# with a WPF front end and SQLite for the database. Eventually I also want to create a version to run on my Android phone as well. This is where the distributed part comes in. I don't always have a signal so will need the app to run offline and allow synchronisation when online again.
Here are the ideas I have so far on how to deal with the ID's:
While option 1 would work and I have used it in the past I want to look at the possibility of other options to avoid the issues with UUIDs. I would like to have a solution that is easier to read while debugging and is sortable.
Option 2 would work but it does force a limit on the number of records. I know in my small application it will almost never go over that many but I would like to try and see if there's a solution that does not require such a limit. Option 3 avoids the limit by using alternating numbers, but I think you would need to know how many database are to be used or you might fill all numbers otherwise. Using a start of 1 and increment of 1 on DB1 and start of 2 and increment of 2 on DB2 would use every number alternatively. You could use 50 as the increment but then you just have another limit but now its on the number of applications that can use it. Again I know its a limit that is not going to be hit in my situation but could be an issue in an application that suddenly becomes very popular.
Option 4 seems like it could solve the issue for me, but I'm not sure if it would work in practice or not. One idea I had was to allow a prefix to be set on each application then that could be used with an auto incrementing value. e.g. PC1, PC2 for records on a pc and maybe PHONE1, PHONE2 etc for records from the Android. This would work but using numbers in strings causes the sorting issue with 1, 11, 100 showing up next to each other, that is in less leading zeros are used and then its back to a limited number of records again.
I have wondered if it would be possible to use a number for the DB ID and the auto increment. e.g PC = 1 and PHONE = 2. then we have 11, 12, 13 etc for the PC with maybe 111 for the 11th record and 2304 for the 304th record on PHONE. But I don't know how this would be done or if it can easily be done and not cause excess overheads for generating values.
At work they have used a similar numbering system, they use something like this 51122222. 5 would refer to the instance of the application, then its a 2 digit year and finally a auto incrementing number. I've not got a clear answer yet what happens if we go over 99999 records in a year. I think they might have figured that its not going to happen and are happy they have calculated the risk.
So finally a question, is there a way to create a primary key system for a distributed application that allows for sorting and does not enforce limits (besides the size of the data type itself e.g. max integer)?
EDIT: Here's a little bit more info on the app I plan to write. I want to create something that will let me store just about any type of information that I might gain, the system will include the ability to tag the entries so I can search on a topic. Types of information I see so far could be recommendations on books, dvds, websites etc. Or maybe local tips for the place I'm living. One overall idea is to stop keeping these bits of information spread across multiple computers/laptops/phones in different formats.
UUID always occupies 16 bytes. For Auto Increment Integer, when stored as Long format, it occupies 8 bytes. If the table itself has only a few columns, the extra primary key space overhead will become more significant.
Using UUID for a primary key brings the following advantages: UUID values are unique across tables, databases, and even servers that allow you to merge rows from different databases or distribute databases across servers. UUID values do not expose the information about your data so they are safer to use in a URL.
The advantages to using numeric, auto incremented primary keys are numerous, but the most impactful benefits are faster speed when performing queries and data-independence when searching through thousands of records which might contain frequently altered data elsewhere in the table.
Note that you should always use the PostgreSQL data type uuid for UUID s. Don't try to convert them to strings or numeric — you will waste space and lose performance.
in broad terms, there are two approaches.
you use sequential values. these may be divided up into groups, interleaved, whatever. they are the most efficient approach, but require collaboration and coordination.
you use random values (this includes UIDs). these are much simpler but require more space. from "birthday collisions" we know that you if you need to store N values then a random key must be chosen from (more than) a range of N*N - http://en.wikipedia.org/wiki/Birthday_problem. working backwards, a 64 bit integer can hold about 32 bits of data if used as a random key - that's about 4 billion values. but that's for a probability of 50% collisions. you want a much lower probability, so a practical limit is around 10 million entries.
so, in simple terms, if you have a 64 bit key, a random approach would work for around 10 million entries a sequential approach for many more. in either case, that is probably more than you need.
if you have a 32 bit key then a random approach works for around a thousand values (a sequential approach goes to about 4 billion, as above).
obviously if you have a text value then you need to modify this accordingly, but UUIDs are designed to have "enough" values anyway http://en.wikipedia.org/wiki/Universally_unique_identifier
typically a database will provide a sequential ID and that is all you need. if not, the 64 bit random approach is usually simplest and worth the extra space.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With