Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unique constraints in couchdb

Tags:

couchdb

Are there any techniques/proposals to enforce unique constraints? Yes, we can create key that's unique, but we cannot change key and keys and also this approach is not suitable for complicated validation (separate unique login, separate unique email, etc...)

For example, an Account should have unique login and email. Deriving a key from this fields will result in inconsistency:

key1: "[email protected]", { email: "[email protected]", login: "john"}
key2: "[email protected]", { email: "[email protected]", login: "mary"}

Looking good, but:

key1: "[email protected]", { email: "[email protected]", login: "mary"}
key2: "[email protected]", { email: "[email protected]", login: "mary"}

Oops, now we have 2 accounts with login: "mary"

like image 234
Sam Avatar asked Oct 09 '09 00:10

Sam


3 Answers

Core Answer

Structure your POSTs/PUTs for the document which has the field you want to keep unique as follows:

  1. Create a view. In the map function use the field you want to enforce unique as the key. Value can be nothing. Use the reduce function to get a count for each of your keys. The best way (for performance) is to use the built in _count reduce function.

  2. Immediately after you PUT/POST a new document into the database, grab the returned id and rev and GET /yourdb/_design/yourapp/_view/viewname?group=true&key="value-of-your-unique-field-from-step-1".

  3. If the result of the last GET gives you a count value other than 1, then you just inserted a duplicate. Immediately DELETE /yourdb/id-from-step-2?rev=rev-from-step-2.

  4. Relax.


Rough Example

Lets say you are storing user accounts and you want to make sure the email address is unique, but you don't want to make it the document id (for whatever reason). Build a view to quickly check uniqueness on email address as described above. Lets call it emails. It would have a map function possibly similar to this...

function(doc) {  
  if(doc.type === 'account') {
    emit(doc.email, 1);
  }
}

And just _count as the reduce function. If you emit a 1 like above _sum will also work. Having and checking a type field on your doc as shown above is just a convention but I find it helpful sometimes. If all you are storing is user accounts, you probably don't need that.

Now lets say we are inserting a document like so ...

POST /mydb/
{
  "name": "Joe",
  "email": "[email protected]"
}

And CouchDB will respond with something like ...

{
  ok: true,
  id: 7c5c249481f311e3ad9ae13f952f2d55,
  rev: 1-442a0ec9af691a6b1690379996ccaba6
}

Check to see if we now have more than one [email protected] in the database ...

GET /mydb/_design/myapp/_view/emails/?group=true&key="[email protected]"

And CouchDB will respond with something like...

{
  rows: [
    {
      key: "[email protected]",
      value: 1
    }
  ]
}

If value is anything other than 1 in that reply, you probably just inserted a duplicate email. So delete the document and return an error along the lines of Email Address must be Unique, similar to a typical SQL database response, or whatever you want.

The delete would go something like this ...

DELETE /mydb/7c5c249481f311e3ad9ae13f952f2d55?rev=1-442a0ec9af691a6b1690379996ccaba6

Short Discussion (if you care)

If you are coming from a good old *SQL background, this is going to seem wrong and weird. Before you flip out consider these points. Having a constraint on a field, such as uniqueness or anything else, implies a schema. CouchDB is schemaless. Coming from *SQL means you will have to change the way you think. ACID and Locks are not the only Way. CouchDB comes with a lot of flexibility and power. How you use that to get what you want is going to depend on the details of your use case and how well you can escape the confines of traditional relational database thinking.

The reason why I sometimes implement uniqueness this way is because it feels very Couchy to me. If you think about the way CouchDB works with handling update conflicts, etc, this approach follows the same flow. We store the doc we are given, we then check to see if our field is unique. If not, grab that doc we just inserted by the convenient handle we still have on it, and do something to resolve the need for uniqueness, like deleting it for example.

You may be tempted in the example above to check the uniqueness of the email address before you POST the doc. Be careful!! If there are multiple things going on, its possible that another doc with the same email address can be inserted into the database in the instant after you check if the email exists, but before you do your POST! That will leave you with a duplicate email.

There is nothing wrong with also running the check for the email address before you POST. For example, this is a good idea if your user is filling out a form and you can ajax the email field value out to the database. You can pre-warn the user that the email address exists or prevent submission, etc. However, in all cases, you should also always check for the uniqueness after you POST the doc. Then react as needed. From the perspective of the UI side, the steps above would look no different than the result obtained from a traditional *SQL database puking on a uniqueness constraint.

Can anything go wrong? Yes. Consider this. Assume [email protected] does not already exist in the database. Two docs come in almost at the same time to be saved to the database. Doc A is POSTed, then Doc B is POSTed but before we are able to check the uniqueness for the Doc A POST. So now when we do the uniqueness check for the Doc A POST, we see that [email protected] is in the database twice. So, we will delete it and report back with the problem. But lets say before we can delete Doc A, the uniqueness check for Doc B also happens, getting the same count value for [email protected]. Now both POSTs will be rejected, even though [email protected] was originally actually not in the database! In other words, if two docs with a matching value come into your app at almost the same time, its possible that they could see each others POSTs and mistakenly conclude that the value they carry is already in the database! We can't really prevent this because there is no traditional RDB style locking in CouchDB. But in exchange for that small price, we get master-master replication and a ton of other cool stuff. I'll take it! And if this is a huge problem for your implementation, you can try to address it by implementing a retry mechanism of some sort, etc.

Finally, CouchDB really is a gem of a database, but don't just take this and expect it to be a bulletproof approach. A lot is really going to depend on the details of your specific application.

like image 113
Raffi M. Avatar answered Nov 06 '22 00:11

Raffi M.


This is one of the less fun bits of CouchDB. The best way I've found for handling unique fields that can change (like in your user example) is to create "pointer" documents with the unique values as a component of the key, and then use those to let you claim unique values. The important part of doing this is having a predictable key for the primary document, then saving the unique field claim documents prior to saving the primary (and letting key conflicts prevent the primary document from saving).

Given a user with a unique username and unique email, your primary documents might look like this:

user-1234: { username: "kurt", email: "kurt@localhost" }
user-9876: { username: "petunia", email: "[email protected]" }

The unique field pointers would look something like this:

user-username-kurt: { primary_doc: "user-1234" }
user-email-kurt@localhost: { primary_doc: "user-1234" }
user-username-petunia: { primary_doc: "user-9876" }
[email protected]: { primary_doc: "user-9876" }

Creating or updating a user would take these steps:

  1. Prep your user document, generate a key for it if necessary
  2. Save a "pointer" document for each changed unique field
  3. If saving any of those fails, stop and fix errors
  4. Save primary user document

Step 3 will take some thought. For instance, you won't want to try claim unique values for fields that haven't changed. You could, but then you'd have to put some additional logic in to handle a case where you're claiming a value for a user that already owns that value.

Step 3 would be a good place to let people take old claimed values as well. If one user has "released" the username kurt, for instance, I can just update that particular document to point to my new key after verifying that it's no longer in use. The alternative is to clear out claimed unique values when they change. I'm not sure which would be less work, really. Leaving stale claimed values in makes the most sense to me.

The interesting thing about this solution is that you don't need to use those pointer documents for anything once they're created. You can create views as normal on your user docs, and use them to query by email or username.

This setup also allows you to do relationships without having to worry about cascading user keys. I don't know about you, but my user documents are referenced by just about every other document in the system. Changing a user key would be a massive pain in the ass.

like image 17
MrKurt Avatar answered Nov 06 '22 00:11

MrKurt


It depends. Consider the multi-master replicated case, there could be conflicting entries added there consistent within each master, but not consistent once they replicate. You might only be using one couchdb server, but in general they design it assuming a multimaster case, and don't put in any features that would only work correctly in single unreplicated server.

If you only care about the single server case the conceivably you could rebuild your couchjs with networking support and perform an http query in your validate_doc_update() function that would perform a query against the DB to see if the email address already is already used and fail the update if so. Check here for more details about the validation mechanism. I don't recommend doing that, instead I would embed all the uniqueness in the id field (either directly or via hashing) and just deal with moving the doc if the user changed anything that effected that.

like image 9
Louis Gerbarg Avatar answered Nov 05 '22 22:11

Louis Gerbarg