I have a requirement to transform the images attached to every document (specifically, the images need to be shrunk to 400px width). What is the best way to achieve that? I was thinking of having Node.js code listen on _changes and perform the necessary manipulations on document save. However, this has a bunch of drawbacks: a) a document change does not always mean that a new attachment was added; b) every time, we have to process already-shrunk images (or at least check the image width).
I think you basically have some data in a database and most of your problem is simply application logic and implementation. I could imagine a very similar requirements list for an application using Drizzle. Anyway, how can your application "cut with the grain" and use CouchDB's strengths?
A Node.js _changes listener sounds like a very good starting point. Node.js has plenty of hype and silly debates. But for receiving a "to-do list" from CouchDB and executing that list concurrently, Node.js is ideal.
I immediately think that image metadata in the document will help you. Fetching an image and checking if it is 400px could get expensive. If you could indicate "shrunk":true or "width":400 or something like that in the document, you would immediately know to skip the document. (This is an optimization; you could possibly skip it during the early phase of your project.)
But how do you keep the metadata in sync with the images? Maybe somebody will attach a large image later, and the metadata still says "shrunk":true. One answer is the validation function. validate_doc_update() has the privilege of examining both the old and the new (candidate) document version. If it is not satisfied, it can throw() an exception to prevent the change. So it could enforce your policy in a few ways:
- Whenever an image attachment is added or changed, the "shrunk" key must also be deleted
- Nobody may set "shrunk":true unless the user is your tool
Another idea worth investigating is, instead of setting "shrunk":true, you set it to the MD5 checksum of the image. (That is already in the document, in the ._attachments object.) If the stored value does not match the attachment's digest, as in the document below, your Node.js tool knows that it has work to do.
{ "_id": "a_doc"
, "shrunk": "md5-D2yx50i1wwF37YAtZYhy4Q=="
, "_attachments":
{ "an_image.png":
{ "content_type":"image/png"
, "revpos": 1
, "digest": "md5-55LMUZwLfzmiKDySOGNiBg=="
}
}
}
In other words:
if(doc.shrunk == doc._attachments["an_image.png"].digest)
console.log("This doc is fine")
else
console.log("Uh oh, I need to check %s and maybe shrink the image", doc._id)
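Coming back to the validation function mentioned earlier, here is a minimal sketch of the "only the tool may set it" policy. The user name "image_shrinker" is an assumption made up for this example, and the function is given a name only so it can be tried outside CouchDB; in a design document it would be stored as the source of an anonymous function under the "validate_doc_update" key.

```javascript
// Sketch of a validation function. CouchDB calls it with the candidate
// document, the current document (null on creation), and the user context.
function validate_doc_update(newDoc, oldDoc, userCtx) {
  var TOOL_USER = "image_shrinker" // Hypothetical account used by the Node.js tool

  if(newDoc._deleted)
    return // Deletions carry no metadata to check

  var old_shrunk = oldDoc ? oldDoc.shrunk : undefined
  var is_tool = !!userCtx && userCtx.name === TOOL_USER

  // Only the tool may add or change "shrunk". Anybody may leave it alone
  // or delete it; deleting it flags the document for re-processing.
  if("shrunk" in newDoc && newDoc.shrunk !== old_shrunk && !is_tool)
    throw({"forbidden": "Only " + TOOL_USER + " may set \"shrunk\""})
}
```

Throwing an object with a "forbidden" key is how CouchDB validation functions reject a change. This sketch does not cover the other policy (forcing "shrunk" to be deleted when an attachment changes); the checksum comparison above makes that one unnecessary.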
I am biased because I wrote the following tools. However, I have had success, and others have reported success, using the Node.js package Follow to watch the _changes events: https://github.com/iriscouch/follow
And then use Txn for ACID transactions in the CouchDB documents: https://github.com/iriscouch/txn
The pattern is,
1. Run follow() on the _changes URL, perhaps with "include_docs":true in the options.
2. For each change that needs work, let txn() take care of fetching and updating, and possible retries if there is a temporary error.
For example, Txn helps you atomically resize the image and also update the metadata, pretty easily.
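The pattern could be wired together roughly like this. It is a sketch under several assumptions: `follow` and `txn` are passed in as parameters (in a real program they would come from require("follow") and require("txn")), `shrink` is a hypothetical resizer that calls back with a Buffer holding the 400px image, and the recorded checksum assumes CouchDB's digest is the MD5 of the stored bytes, which is worth verifying against your own server.

```javascript
var crypto = require("crypto")

// Checksum in the same "md5-<base64>" form that _attachments uses.
function md5_digest(buffer) {
  return "md5-" + crypto.createHash("md5").update(buffer).digest("base64")
}

// True if the named image exists and "shrunk" does not record its digest.
function needs_work(doc, name) {
  var att = doc._attachments && doc._attachments[name]
  return !!att && doc.shrunk !== att.digest
}

// Watch the database and fix every document whose image is not yet shrunk.
// shrink(doc_url, name, callback) is hypothetical: callback(er, small_buffer).
function watch(follow, txn, db, name, shrink) {
  return follow({"db":db, "include_docs":true}, function(er, change) {
    if(er)
      return console.error("Feed error: %s", er.message)
    if(change.deleted || !needs_work(change.doc, name))
      return // Nothing to do for this change

    var doc_url = db + "/" + encodeURIComponent(change.id)
    txn({"uri":doc_url}, function(doc, to_txn) {
      if(!needs_work(doc, name))
        return to_txn() // Already fixed since the change was emitted

      shrink(doc_url, name, function(er, small) {
        if(er)
          return to_txn(er) // Abort; nothing is saved

        // Replace the image inline and record its checksum in one update.
        doc._attachments[name] = { "content_type": doc._attachments[name].content_type
                                 , "data": small.toString("base64")
                                 }
        doc.shrunk = md5_digest(small)
        return to_txn()
      })
    }, function(er) {
      if(er)
        console.error("Failed to update %s: %s", change.id, er.message || er)
    })
  })
}
```

Because the new attachment and the "shrunk" value travel in the same document update, either both are saved or neither is.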
Finally, if your program crashes, you might fetch a lot of documents that you already processed. That might be okay (if you have your metadata working); however you might want to record a checkpoint occasionally. Remember which changes you saw.
var follow = require("follow")
var db = "http://localhost:5984/my_db"
var checkpoint = get_the_checkpoint_somehow() // Synchronous, for simplicity
follow({"db":db, "since":checkpoint}, function(er, change) {
  if(er)
    throw er
  // Record a checkpoint every 100 changes
  if(change.seq % 100 == 0)
    store_the_checkpoint_somehow(change.seq) // Another synchronous call
})
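The two checkpoint helpers are left open above. One possible way to fill them in is a small JSON file on local disk, sketched here; the file name and its location in the temp directory are assumptions for illustration, and a real deployment would want a more durable path.

```javascript
var fs = require("fs")
var os = require("os")
var path = require("path")

// Hypothetical location; pick somewhere that survives reboots in practice.
var CHECKPOINT_FILE = path.join(os.tmpdir(), "shrinker_checkpoint.json")

// Return the last stored sequence, or 0 to start from the beginning.
function get_the_checkpoint_somehow() {
  try {
    var stored = JSON.parse(fs.readFileSync(CHECKPOINT_FILE, "utf8"))
    return stored.since || 0
  } catch (er) {
    return 0 // No checkpoint yet, or the file is unreadable
  }
}

// Write to a temporary name and then rename, so that a crash in the
// middle of the write cannot leave a half-written checkpoint behind.
function store_the_checkpoint_somehow(seq) {
  var tmp = CHECKPOINT_FILE + ".tmp"
  fs.writeFileSync(tmp, JSON.stringify({"since": seq}))
  fs.renameSync(tmp, CHECKPOINT_FILE)
}
```

Starting again from an old checkpoint only means re-seeing some changes, which is harmless as long as the metadata check skips documents that are already done.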
Again, I am embarrassed to point to all my own tools. But image processing is a classic example of a work queue situation. Every document that needs work is placed in the queue. An unlimited, elastic, army of workers receives a job, fixes the document, and marks the job done (deleted).
I use this a lot myself, and that is why I made CQS, the CouchDB Queue System: https://github.com/iriscouch/cqs
It is for Node.js, and it is identical to Amazon SQS, except it uses your own CouchDB server. If you are already using CouchDB, then CQS might simplify your project.
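To make the queue life cycle concrete, here is a toy in-memory queue with SQS-style semantics. It is not the CQS API, only an illustration of the idea: a received job is hidden from other workers for a while, and it comes back if the worker never deletes it. All names here are made up for the example.

```javascript
// Toy in-memory work queue (illustration only, not CQS or SQS).
function Queue(visibility_ms) {
  this.visibility_ms = visibility_ms // How long a received job stays hidden
  this.jobs = []
}

// Place a job in the queue, visible immediately.
Queue.prototype.send = function(body) {
  this.jobs.push({"body": body, "hidden_until": 0})
}

// Hand out the first visible job and hide it from other workers.
// `now` is a timestamp in milliseconds, passed in to keep this testable.
Queue.prototype.receive = function(now) {
  for(var i = 0; i < this.jobs.length; i++) {
    var job = this.jobs[i]
    if(job.hidden_until <= now) {
      job.hidden_until = now + this.visibility_ms
      return job
    }
  }
  return null // Nothing to do right now
}

// Mark a job done by deleting it.
Queue.prototype.del = function(job) {
  var i = this.jobs.indexOf(job)
  if(i !== -1)
    this.jobs.splice(i, 1)
}
```

A worker that crashes halfway through simply never calls del(), so the job becomes visible again after the timeout and another worker picks it up.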