This might sound like an obvious question, but I'm new to CouchDB, so I thought it was worthwhile asking in case there is something about CouchDB's structure that changes the situation that I didn't know about. For reasons out of my control, I have to build a queue-like structure out of CouchDB. For simplicity's sake, let's say I'm queueing IDs for jobs to be executed later. Note that there will be no duplicates.
I'm trying to figure out what the best way to structure this is. As I currently see it, I have a few options:
1. Have a queue database with the IDs as _id, and store the dequeued items in a similar dequeued database with the IDs as the _id. Each record in each database wouldn't have any information other than the (mandatory) _id and _rev.
2. Have a single database containing one record with _id = 'queue' and one record with _id = 'dequeued'. Within each of the two records, there will be an arbitrary number of keys, each of which will be an ID for the jobs to be executed (or that were already executed). The values associated with the keys will be irrelevant, possibly just a Boolean.
3. Have a single database containing one record, queue. Within that record, have two keys: queue and dequeued. Each of those keys will have as its associated value an arbitrary-length list of job execution IDs.

Option 1 is slightly less desirable because it requires two databases, and option 2 strikes me as a poor choice because it requires loading the entire list of queued or dequeued items in order to read a list item or make any changes. However, option 3 is nice in that it allows the whole list of IDs to be an ordered list rather than key/value pairs, which makes it easier to pick a random item from the list to be the next job to be executed, since I don't need to know any key names (there are none).
I'm looking for whichever provides the best performance. Any thoughts on this?
For people reading this question in the future, I've built my CouchDB queuing module, CouchQueue, a work in progress.
You can get it with npm install couchqueue.
Take a look (and please comment, send pull requests, etc.) on GitHub.
Use one document per element in the queue, and keep one queue database.
I recommend a field to order the elements, for example .created_at with a timestamp in ISO 8601 format.
You can toggle an element's visibility with a .visible flag.
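As a rough illustration of the document shape described above, here is a minimal sketch. The helper name makeQueueDoc and the choice to reuse the job ID as _id are assumptions for the example; only the created_at and visible fields come from the recommendation itself.

```javascript
// Hypothetical helper: build one queue document per job ID.
// Reusing the job ID as _id is an assumption; it relies on IDs being unique.
function makeQueueDoc(jobId, now = new Date()) {
  return {
    _id: String(jobId),
    created_at: now.toISOString(), // ISO 8601, so string order matches time order
    visible: true,                 // true while the element is still queued
  };
}

const exampleDoc = makeQueueDoc("job-42", new Date("2013-01-01T00:00:00.000Z"));
// exampleDoc.created_at is "2013-01-01T00:00:00.000Z"
```

Each such document would be saved to the single queue database on its own, so enqueueing one job never touches any other document.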
I recommend a map/reduce view, something like this:

function(doc) {
  // Index only elements that are still in the queue, keyed by creation time.
  if (doc.visible)
    emit(doc.created_at, doc);
}
Now you can query this view, either oldest-first, or newest-first (?descending=true). You can mark an element complete by updating it, setting visible = false.
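To make the ordering and the visible = false update concrete, here is a small in-memory sketch. It does not talk to CouchDB: runView and markComplete are hypothetical stand-ins that only mimic the rows the view would return, and ties between equal timestamps are not handled the way CouchDB would handle them.

```javascript
// In-memory stand-in for the view. Real CouchDB runs the map function
// server-side; this only mimics the resulting row order.
function runView(docs, options = {}) {
  const rows = [];
  const emit = (id, key, value) => rows.push({ id, key, value });
  for (const doc of docs) {
    if (doc.visible) emit(doc._id, doc.created_at, doc);
  }
  // ISO 8601 timestamps in the same format sort chronologically as strings.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return options.descending ? rows.reverse() : rows;
}

// Completing an element: same document with visible set to false.
// Against a real database this would be an update that includes the current _rev.
function markComplete(doc) {
  return { ...doc, visible: false };
}

const sampleDocs = [
  { _id: "job-b", created_at: "2013-01-02T00:00:00.000Z", visible: true },
  { _id: "job-a", created_at: "2013-01-01T00:00:00.000Z", visible: true },
  { _id: "job-c", created_at: "2013-01-03T00:00:00.000Z", visible: false },
];

const oldest = runView(sampleDocs)[0];                       // job-a
const newest = runView(sampleDocs, { descending: true })[0]; // job-b
const afterDone = sampleDocs.map((d) => (d._id === oldest.id ? markComplete(d) : d));
const nextUp = runView(afterDone)[0];                        // job-b
```

Against a real database, fetching only the head of the queue would typically mean adding limit=1 to the view query, and the update would fail with a conflict if another worker changed the same document first.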
I wrote a CouchDB queue, CQS, whose API is identical to the Amazon SQS API. It is similar to what I describe above, except that messages can also be in a checked-out state, in which they are not visible in the queue for a timeout period. I have used CQS in production for about two years, with hundreds of millions of updates.
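One way to picture a checked-out state with a timeout is to replace the boolean flag with a timestamp. The sketch below is an assumption for illustration only, not CQS's actual schema or code: the visible_at field and the checkOut and isVisible helpers are hypothetical.

```javascript
// Hypothetical field: visible_at is the ISO 8601 time at which the element
// becomes visible in the queue (again).
function checkOut(doc, timeoutMs, now = new Date()) {
  const visibleAt = new Date(now.getTime() + timeoutMs).toISOString();
  return { ...doc, visible_at: visibleAt };
}

// An element is visible once its visible_at time has been reached.
function isVisible(doc, now = new Date()) {
  return doc.visible_at <= now.toISOString();
}

const start = new Date("2013-01-01T00:00:00.000Z");
const queued = { _id: "job-a", visible_at: start.toISOString() };
const checkedOut = checkOut(queued, 30000, start); // hidden for 30 seconds
```

With this shape, a worker that crashes simply never finishes the job, and the element reappears on its own once the timeout passes.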