What is best practice when creating document IDs in couchdb? [closed]

We all know that for relational databases it is best practice to use numerical IDs for the primary key.

In CouchDB the default ID that is generated is a UUID. Is it best to stick with the default, or to use an easily memorable identifier that users will see and use in the application?

For example, if you were designing the stackoverflow.com database in CouchDB, would you use the question slug (e.g. what-is-best-practice-when-creating-document-ids-in-couchdb) or a UUID for each document?

asked Dec 26 '09 at 15:12 by andyuk





2 Answers

I'm no CouchDB expert, but after doing a little research, this is what I've found.

The simple answer is, use UUIDs unless you have a good reason not to.
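As a rough illustration of the two options, here is a minimal Python 3 sketch that produces a UUID-style ID and, for comparison, a slug-style ID like the one in the question. `slugify` and `random_doc_id` are hypothetical helpers, not CouchDB APIs. When you POST a document without an `_id`, CouchDB generates one server-side, and those IDs are also 32 hex characters.

```python
import re
import uuid


def slugify(title: str) -> str:
    """Hypothetical helper: lower-case the title and turn each run of
    non-alphanumeric characters into a single hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def random_doc_id() -> str:
    """Hypothetical helper: 32 lowercase hex characters from a random (version 4) UUID."""
    return uuid.uuid4().hex


title = "What is best practice when creating document IDs in couchdb?"
print(slugify(title))    # what-is-best-practice-when-creating-document-ids-in-couchdb
print(random_doc_id())   # random; different on every call
```

The UUID carries no meaning and never needs to change. The slug is readable, but it goes stale whenever the title is edited, which is the trade-off weighed in the rest of this answer.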

The longer answer is, it depends on:

Cost of changing the ID vs. how likely the ID is to change

Low cost of changing the ID, and the ID is likely to change

An example of this might be a blog with a denormalized design, such as jchris's blog (the Sofa code is available on GitHub).

Every time another website links to a blog post, that link is another reference to the ID, so the cost of changing the ID increases.

High cost of changing the ID, and the ID will never change

An example of this is any DB design that is highly normalized that uses auto-increment IDs. Stackoverflow.com is a good example with its auto-incrementing question IDs that you see in every URL. The cost of changing the ID is extremely high since every foreign key would need to be updated.

How many references, or "foreign keys" (in relational DB language), will there be to the ID?

Any "foreign keys" will greatly increase the cost of changing the ID. Having to update other documents is a slow operation and definitely should be avoided.
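To see why references make an ID expensive to change, here is a minimal sketch that uses a plain Python dict as a hypothetical stand-in for a database. CouchDB cannot edit a document's `_id` in place, so a "rename" amounts to writing a copy under the new ID, deleting the old document, and then updating every document that refers to it. The `ref_field` parameter and the write counting are illustrative assumptions; against a real server, the update and delete would also need the current `_rev` values.

```python
from typing import Dict

# Hypothetical in-memory stand-in for a CouchDB database: _id -> document body.
Database = Dict[str, dict]


def rename_document(db: Database, old_id: str, new_id: str, ref_field: str) -> int:
    """Move a document to a new _id and fix up every document that points at it.

    ref_field is a hypothetical field name other documents use to store the
    referenced _id. Returns how many document writes the rename required.
    """
    if new_id in db:
        raise ValueError(f"{new_id!r} already exists")
    writes = 0

    # The _id can't be edited, so write a copy under the new ID...
    db[new_id] = dict(db[old_id], _id=new_id)
    writes += 1
    # ...then delete the original (in CouchDB a deletion is itself a write).
    del db[old_id]
    writes += 1

    # Every "foreign key" pointing at the old ID costs one more write.
    for other in db.values():
        if other.get(ref_field) == old_id:
            other[ref_field] = new_id
            writes += 1
    return writes


db = {
    "post-1": {"_id": "post-1", "title": "Hello"},
    "c1": {"_id": "c1", "post_id": "post-1"},
    "c2": {"_id": "c2", "post_id": "post-1"},
}
print(rename_document(db, "post-1", "hello-world", "post_id"))  # 4
```

With no references the rename costs two writes; each referencing document adds one more, which is why heavily referenced IDs should be ones that never change.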

How likely is the ID to change?

If you don't want to use UUIDs, you probably already have an idea of what ID you want to use.

If that ID is likely to change, the cost of changing it should be low. If the cost is high, pick a different ID.

What is your motivation for wanting to use an easily memorable ID?

Don't say performance.

Benchmarks show that "CouchDB’s view key lookups are almost, but not quite, as fast as direct document lookups". This means that having to do a search to find a record is no big deal. Don't choose friendly IDs just because you can do a direct lookup on a document.
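If users need friendly lookups, a view can index any field. The sketch below assumes documents carry a hypothetical `slug` field; it shows a design document with a JavaScript map function (the form CouchDB views take) and builds the matching GET URL. View keys are JSON, so the slug is JSON-encoded before being URL-escaped. The database, design document, and view names are made up for illustration.

```python
import json
from urllib.parse import quote

# Design document for a view that indexes questions by a hypothetical "slug" field.
# CouchDB stores the map function as a JavaScript source string.
design_doc = {
    "_id": "_design/questions",
    "views": {
        "by_slug": {
            "map": "function (doc) { if (doc.slug) { emit(doc.slug, null); } }"
        }
    },
}


def view_lookup_url(base: str, db: str, slug: str) -> str:
    """Build a GET URL that finds a question by slug through the by_slug view.

    The key is JSON-encoded (so it gets quotes) and then percent-encoded;
    include_docs=true asks for the full documents alongside the view rows.
    """
    key = quote(json.dumps(slug), safe="")
    return f"{base}/{db}/_design/questions/_view/by_slug?key={key}&include_docs=true"


print(view_lookup_url("http://localhost:5984", "questions", "what-is-best-practice"))
# http://localhost:5984/questions/_design/questions/_view/by_slug?key=%22what-is-best-practice%22&include_docs=true
```

The document keeps its random `_id`, so the slug can change freely; only the view entry is updated when the document is saved.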

Will you be doing many bulk inserts?

If so, use incrementing (monotonically increasing) UUIDs for better insert performance.

See this post about bulk inserts. Damien Katz comments:

"If you want to have the fastest possible insert times, you should give the _id's ascending values, so get a UUID and increment it by 1, that way it's always inserting in the same place in the index, and being cache friendly once you are dealing with files larger than RAM. For an easier way to do the same thing, just sequentially number the documents but make it fixed length with padding so that they sort correctly, "0000001" instead of "1" for example."
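Both suggestions in the quote can be sketched in a few lines of Python. The first part shows why padding matters: IDs are compared as strings, so "10" sorts before "2" unless every ID has the same width. The second part starts from a random 128-bit UUID and adds 1 each time, formatting every value as exactly 32 hex characters so that string order matches numeric order. The helper names are hypothetical; CouchDB's own `[uuids]` configuration also offers a `sequential` algorithm aimed at the same goal.

```python
import uuid
from typing import List


def padded_ids(start: int, count: int, width: int = 7) -> List[str]:
    """Sequential IDs zero-padded to a fixed width, e.g. "0000001" instead of "1"."""
    return [str(n).zfill(width) for n in range(start, start + count)]


def incrementing_uuids(count: int) -> List[str]:
    """Take one random UUID and add 1 for each following ID.

    The base is reduced so base + count stays below 2**128, which keeps every
    value at exactly 32 hex digits ("032x"); fixed width means string order
    equals numeric order.
    """
    base = uuid.uuid4().int % (2**128 - count)
    return [format(base + i, "032x") for i in range(count)]


unpadded = ["1", "2", "10"]
print(sorted(unpadded))                # ['1', '10', '2']: "10" sorts before "2"
print(sorted(padded_ids(1, 10))[-2:])  # ['0000009', '0000010']
print(incrementing_uuids(3))           # three consecutive 32-character hex IDs
```

Because each new ID is larger than the last, inserts always land at the end of the ID index, which is the cache-friendly behaviour the quote describes.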

answered Sep 30 '22 at 22:09 by andyuk



Coming from a relational database background, it took me a while to figure out CouchDB, but the truth is the opposite of the accepted answer:

Instead of using a default uuid, generating a smart id can greatly assist you in retrieving and sorting data.

Say you have a database movies. All documents can be found somewhere under the URL /movies, but where exactly?

If you store a document with the _id Jabberwocky ({"_id":"Jabberwocky"}) into your movies database, it will be available under the URL /movies/Jabberwocky. So if you send a GET request to /movies/Jabberwocky, you will get back the JSON that makes up your document ({"_id":"Jabberwocky"}).

http://guide.couchdb.org/draft/documents.html
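The mapping from `_id` to URL is simple enough to write down. This sketch assumes a server at `http://localhost:5984` (CouchDB's default port) and percent-encodes the database name and ID so that characters such as spaces or `/` stay inside a single path segment. `document_url` is a hypothetical helper, not part of any client library.

```python
from urllib.parse import quote


def document_url(base: str, db: str, doc_id: str) -> str:
    """URL of a document: <server>/<database>/<_id>.

    safe="" means "/" is encoded too, so an ID such as "sci-fi/Jabberwocky"
    stays one path segment ("sci-fi%2FJabberwocky") instead of becoming two.
    """
    return f"{base}/{quote(db, safe='')}/{quote(doc_id, safe='')}"


print(document_url("http://localhost:5984", "movies", "Jabberwocky"))
# http://localhost:5984/movies/Jabberwocky
print(document_url("http://localhost:5984", "movies", "The Matrix"))
# http://localhost:5984/movies/The%20Matrix
```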

Performance tip: if you're just using the randomly-generated doc IDs, then you're not only missing out on an opportunity to get a free index – you're also incurring the overhead of building an index you're never going to use. So use and abuse your doc IDs!

https://pouchdb.com/2014/05/01/secondary-indexes-have-landed-in-pouchdb.html
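Here is what the "free index" looks like in practice. If IDs share a structured prefix, a range over the built-in `_all_docs` index returns related documents without any extra view, for example `GET /mydb/_all_docs?startkey="movie:1999:"&endkey="movie:1999:\ufff0"` (keys JSON-encoded and URL-escaped). The Python below only emulates that range query over a sorted list, using hypothetical `type:year:title` IDs and assuming plain code-point ordering; check CouchDB's collation rules before relying on this for non-ASCII IDs.

```python
import bisect
from typing import List

# High code point used as an "end of range" sentinel after the prefix.
# IDs with a character >= U+FFF0 right after the prefix would be missed.
HIGH = "\ufff0"


def ids_with_prefix(sorted_ids: List[str], prefix: str) -> List[str]:
    """Emulate _all_docs with startkey=prefix and endkey=prefix + HIGH.

    sorted_ids must already be sorted (code-point order is assumed here).
    """
    lo = bisect.bisect_left(sorted_ids, prefix)
    hi = bisect.bisect_right(sorted_ids, prefix + HIGH)
    return sorted_ids[lo:hi]


doc_ids = sorted([
    "movie:1999:The Matrix",
    "movie:1999:Fight Club",
    "movie:2001:Shrek",
    "review:1999:The Matrix:001",
])
print(ids_with_prefix(doc_ids, "movie:1999:"))
# ['movie:1999:Fight Club', 'movie:1999:The Matrix']
```

Ordering the ID parts from most to least general (type, then year, then title) is what makes prefix ranges like this useful.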

answered Sep 30 '22 at 23:09 by TimoSolo
