
What is the best primary key strategy for an online/offline multi-client mobile application with SQLite and Azure SQL database as the central store?

What primary key strategy would be best to use for a relational database model given the following?

  • tens of thousands of users
  • multiple clients per user (phone, tablet, desktop)
  • millions of rows per table (continually growing)

Azure SQL will be the central data store which will be exposed via Web API. The clients will include a web application and a number of native apps including iOS, Android, Mac, Windows 8, etc. The web application will require an “always on” connection and will not have a local data store but will instead retrieve and update via the api - think CRUD via RESTful API.

All other clients (phone, tablet, desktop) will have a local db (SQLite). On first use of this type of client the user must authenticate and sync. Once authenticated and synced, these clients can operate in an offline mode (creating, deleting and updating records in the local SQLite db). These changes will eventually sync with the Azure backend.

The distributed nature of the databases leaves us with a primary key problem and the reason for asking this question.

Here is what we have considered thus far:

GUID

Each client creates its own keys. On sync, there is a very small chance of a duplicate key, but we would still need to account for it by writing functionality into each client to update all relationships with a new key. GUIDs are big (16 bytes each), and when multiple foreign keys per table are considered, storage may become an issue over time. Likely the biggest problem is the random nature of GUIDs, which means that they cannot (or should not) be used as the clustered index due to fragmentation. This means we would need to create a separate (perhaps arbitrary) clustered index for each table.
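
A minimal sketch of client-generated GUID keys, assuming Python's standard `uuid` and `sqlite3` modules stand in for a native client. The `note` and `tag` tables and the `new_key` helper are hypothetical names used only for illustration.

```python
import sqlite3
import uuid


def new_key() -> str:
    """Return a random (version 4) GUID as a 36-character string."""
    return str(uuid.uuid4())


# In-memory database standing in for the local SQLite store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE note (
        id   TEXT PRIMARY KEY,
        body TEXT NOT NULL
    );
    CREATE TABLE tag (
        id      TEXT PRIMARY KEY,
        note_id TEXT NOT NULL REFERENCES note(id),
        label   TEXT NOT NULL
    );
""")

# Keys are created offline; no round trip to the server is needed.
note_id = new_key()
conn.execute("INSERT INTO note (id, body) VALUES (?, ?)", (note_id, "written offline"))
conn.execute(
    "INSERT INTO tag (id, note_id, label) VALUES (?, ?, ?)",
    (new_key(), note_id, "draft"),
)
conn.commit()
```

Because the key never changes after creation, the foreign key in `tag` stays valid before and after sync.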

Identity

Each client creates its own primary keys. On sync, these keys are replaced with server-generated keys. This adds complexity to the syncing process and forces each client to “fix” its keys, including all foreign keys on related tables.
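
To illustrate the “fix” step, here is a rough sketch in Python with `sqlite3`. It assumes that locally created rows use negative ids so they can never collide with server-generated ones, and that the sync call returns a mapping from local id to server id. The `FOREIGN_KEYS` table, the `replace_local_ids` helper and the schema are all hypothetical.

```python
import sqlite3

# Hypothetical description of which columns reference each table's id.
# This is the hand-maintained bookkeeping the approach requires.
FOREIGN_KEYS = {"note": [("tag", "note_id")]}


def replace_local_ids(conn, table, id_map):
    """Swap client-generated ids for server-generated ones, including references.

    id_map maps local id -> server id, as returned by a (hypothetical) sync call.
    """
    with conn:  # single transaction, so a failure leaves no half-remapped rows
        for local_id, server_id in id_map.items():
            conn.execute(
                f"UPDATE {table} SET id = ? WHERE id = ?", (server_id, local_id)
            )
            for child_table, child_column in FOREIGN_KEYS.get(table, []):
                conn.execute(
                    f"UPDATE {child_table} SET {child_column} = ? "
                    f"WHERE {child_column} = ?",
                    (server_id, local_id),
                )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE note (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, note_id INTEGER NOT NULL, label TEXT);
    INSERT INTO note VALUES (-1, 'written offline');
    INSERT INTO tag VALUES (-1, -1, 'draft');
""")

# Pretend the server assigned id 5001 to the note created locally as -1.
replace_local_ids(conn, "note", {-1: 5001})
```

Every new relationship in the schema needs a matching entry in `FOREIGN_KEYS`, which is where this approach tends to get fragile.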

Composite

Each client is assigned a client id on first sync. This client id is used in conjunction with a local auto-incrementing id as a composite primary key for each table. This composite key will be unique, so there should be no conflicts on sync, but it does mean that most tables will require a composite primary key. Performance and query complexity are the concerns here.
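
The query-complexity concern can be seen in a small sketch, again using Python's `sqlite3` as a stand-in. The table and column names are hypothetical; the point is that every foreign key and every join now carries two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE note (
        client_id INTEGER NOT NULL,
        local_id  INTEGER NOT NULL,
        body      TEXT NOT NULL,
        PRIMARY KEY (client_id, local_id)
    );
    CREATE TABLE tag (
        client_id      INTEGER NOT NULL,
        local_id       INTEGER NOT NULL,
        note_client_id INTEGER NOT NULL,
        note_local_id  INTEGER NOT NULL,
        label          TEXT NOT NULL,
        PRIMARY KEY (client_id, local_id),
        FOREIGN KEY (note_client_id, note_local_id)
            REFERENCES note (client_id, local_id)
    );
    -- Two clients can both use local id 1 without conflict.
    INSERT INTO note VALUES (7, 1, 'from the phone');
    INSERT INTO note VALUES (8, 1, 'from the tablet');
    -- A tag created on client 8 that points at the note from client 7.
    INSERT INTO tag VALUES (8, 1, 7, 1, 'draft');
""")

# Joins must match on both halves of the key.
rows = conn.execute("""
    SELECT note.body, tag.label
    FROM tag
    JOIN note ON note.client_id = tag.note_client_id
             AND note.local_id  = tag.note_local_id
""").fetchall()
```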

HiLo (Merged Composite)

Like the composite approach, each client is assigned a client id (int32) on the first sync. The client id is merged with a unique local id (int32) into a single column to make an application-wide unique id (int64). This should result in no conflicts during sync. While there is more order to these keys than to GUIDs, since the ids generated by each client are sequential, there will be thousands of unique client ids, so do we still run the risk of fragmentation on our clustered index?
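
One possible way to merge the two values is to put the client id in the high 32 bits and the local id in the low 32 bits. The sketch below assumes both ids are non-negative int32 values, which keeps the result inside a signed 64-bit column; `make_id` and `split_id` are hypothetical helper names.

```python
INT32_MAX = 2**31 - 1


def make_id(client_id: int, local_id: int) -> int:
    """Pack a client id (high 32 bits) and a local id (low 32 bits) into one int64."""
    if not 0 <= client_id <= INT32_MAX:
        raise ValueError("client_id must be a non-negative int32")
    if not 0 <= local_id <= INT32_MAX:
        raise ValueError("local_id must be a non-negative int32")
    return (client_id << 32) | local_id


def split_id(merged: int) -> tuple[int, int]:
    """Recover (client_id, local_id) from a merged id."""
    return merged >> 32, merged & 0xFFFFFFFF


# Ids from the same client are consecutive; different clients occupy
# separate, non-overlapping ranges of the key space.
first = make_id(1234, 1)
second = make_id(1234, 2)
other_client = make_id(1235, 1)
```

With this layout, all keys from one client sort together, so each client appends to its own region of the index rather than to one shared end.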

Are we overlooking something? Are there any other approaches worth investigating? A discussion of the pros and cons of each approach would be quite helpful.

user1843640 asked Apr 22 '13 18:04


1 Answer

I've considered this question at length and came to the decision that a GUID is usually the best solution. Here's a little information on why:

Identity

The Identity option sounds like it removes all the negatives, but having built a single-page web app that used this system, I can tell you it adds a significant amount of complexity to the code. A temporary id can spread through your client-side data quite quickly, and it's really hard to build a system with no holes in it when it comes to finding every single possible usage. It usually leads to hard-coded, application- and data-specific information to track foreign keys on the client (which is tedious and error-prone, because as the database changes you forget to update this information). It also adds a lot of overhead to every sync, as it might have to run through multiple tables each time to check for temporary ids. There might be a better way to implement this system, but I haven't seen a good approach that doesn't add a ton of complexity and possible ugly error states in your data.

Composite

The composite approaches also add a lot of complexity to your code in generating session ids and creating ids from them, and their only real advantage over GUIDs is that uniqueness is guaranteed. But a GUID is unique for all practical purposes. While I was scared by the possibility of repeats, I realized that the chance is infinitesimally small and that there's actually a really easy way to handle the rare case where a key is not unique.
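
To put a rough number on "infinitesimally small", the sketch below uses the standard birthday-bound approximation. It assumes version 4 GUIDs, which carry 122 random bits, produced by a good random source; `collision_probability` is a hypothetical helper name.

```python
import math

# A version 4 GUID has 128 bits, 6 of which are fixed (version and variant).
RANDOM_BITS = 122


def collision_probability(key_count: int) -> float:
    """Approximate chance of at least one duplicate among key_count random GUIDs.

    Uses the birthday-bound approximation p ~= 1 - exp(-n(n-1) / (2N)),
    where N is the number of possible keys.
    """
    pairs = key_count * (key_count - 1) / 2
    return -math.expm1(-pairs / 2**RANDOM_BITS)


# A billion keys is far more than the "millions of rows per table" in the question.
p_billion = collision_probability(10**9)
```

Under these assumptions the estimate for a billion keys is below 1e-18, which is why the conflict path is treated as something that must exist but will almost never run.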

GUIDs

My biggest worries about using a GUID were

  1. they are large and aren't traditional ints, which will make transferring large amounts of data slower and degrade database performance
  2. if you ever do run into a conflict, it can ruin your app, so you have to write complex code to handle a situation you will probably never encounter.

Then I realized that in an offline style web app, you're not usually transferring large amounts of data at once because it's all stored on the client.

You also don't worry about server database performance much either because that's done behind the scenes in a sync - you just worry about client side data performance.

Last, I realized that handling a conflict is really a trivial thing. Just test for a conflict and, if you get one, create a new GUID on the server and continue with the operation. Then send a message back to the client that causes it to show a little error message, delete all client-side data and re-download it fresh from the server. This is really quick and easy to implement, and you probably already want this as a possible operation in an offline web app anyway. While it might sound inconvenient for the user, the likelihood of the user ever seeing this error is almost 0%.
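
A minimal sketch of that server-side check, using Python's `sqlite3` as a stand-in for the central database. The `note` table, the `insert_from_client` function and the returned resync flag are hypothetical, not a prescribed protocol.

```python
import sqlite3
import uuid


def insert_from_client(conn, row_id, body):
    """Insert a synced row and return (stored_id, client_must_resync)."""
    try:
        with conn:
            conn.execute("INSERT INTO note (id, body) VALUES (?, ?)", (row_id, body))
        return row_id, False
    except sqlite3.IntegrityError:
        # The primary key already exists: store the row under a fresh GUID
        # and tell the client to discard its local data and download it again.
        new_id = str(uuid.uuid4())
        with conn:
            conn.execute("INSERT INTO note (id, body) VALUES (?, ?)", (new_id, body))
        return new_id, True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE note (id TEXT PRIMARY KEY, body TEXT)")

# Simulate two clients that happened to generate the same GUID.
same_guid = "11111111-1111-4111-8111-111111111111"
first = insert_from_client(conn, same_guid, "from client A")
second = insert_from_client(conn, same_guid, "from client B")
```

This sketch treats every duplicate key as a collision. A real sync would probably also need to recognise a client retrying a row it has already sent, which is not handled here.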

Conclusion

In the end, I think that for this type of app, GUIDs are the easiest to implement and work the best, with the least possibility for error and without creating overly complex code.

If your application doesn't have to run offline, but you have a client-side database for performance or other reasons, you can also consider throwing up a loading gif and pausing client side execution until the id is returned via ajax from the server.

dallin answered Nov 16 '22 01:11