I'm working on a real-time application with the following characteristics:
In other words (and tl;dr): the clients can tolerate latency, but they need a lot of trustworthy write throughput - more throughput than "one write is one disk operation" can provide.
I'm envisioning a database that would work something like this: accept a number of TCP connections (theoretically limited only by the number of file descriptors), buffer the incoming writes in memory, log batches of them to disk as often as possible (along with the updates to the auto-incrementing index), and only respond on a given TCP connection once the associated disk write has completed. Alternatively, it could be as simple as a lazily-writing database that publishes a message whenever it has completed a disk write: clients wait for the lazy response, then wait for the write message to report success.
I think that with such high latency tolerance, this isn't asking too much. And I'd imagine that others have had this problem, such as financial companies that can't afford to lose data, but can afford to delay the response to any one client.
Do any battle-tested database solutions like Postgres, CouchDB/Couchbase, or MongoDB support modes of operation like this?
PostgreSQL should fit this workload well; pretty much everything you've specified is within its normal feature set. Pg is ACID-compliant, supports group commit to reduce sync overhead, writers don't block readers, and it uses the OS for caching, so it'll naturally tend to keep only the hot data set in memory.
"Clients are willing to wait on the order of seconds for confirmation - long enough for multiple disk seeks and writes to happen"
If you're considering PostgreSQL, your application is ideally suited to a really big commit_delay, which will help immensely with write throughput. You can't use synchronous_commit = off because you need confirmation of the commit before replying, but you can hold commits queued up for a short while so that groups of them share a single WAL flush, saving on sync costs.
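As a rough sketch (the numbers are illustrative starting points, not tuned recommendations), the relevant postgresql.conf settings could look like this; note that commit_delay is given in microseconds, and as far as I know the server caps it at 100000 (0.1 s):

    # postgresql.conf - illustrative values only, tune against your own hardware
    synchronous_commit = on    # keep the default: a commit is acknowledged only after its WAL flush
    commit_delay = 100000      # microseconds to wait before flushing, so concurrent commits share one fsync
    commit_siblings = 5        # only apply the delay when at least this many other transactions are active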
If you use Pg for a job like this, you will need to tune checkpointing to make sure that checkpoints don't stall I/O. Make sure the bgwriter is aggressively writing out dirty buffers. Ensure autovacuum is running frequently - you aren't deleting from the tables, but the indexes still need maintenance, and so do the table statistics.
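Something like the following is the kind of tuning I mean (again, made-up starting values that you would adjust against your own I/O capacity and data growth):

    # postgresql.conf - illustrative checkpoint / bgwriter / autovacuum settings
    checkpoint_timeout = 15min              # fewer, better-spread checkpoints
    checkpoint_completion_target = 0.9      # spread each checkpoint's writes over most of the interval
    bgwriter_delay = 100ms                  # wake the background writer more often than the 200ms default
    bgwriter_lru_maxpages = 500             # let it clean more dirty buffers per round
    autovacuum = on
    autovacuum_naptime = 30s                # check tables for vacuum/analyze work more often
    autovacuum_analyze_scale_factor = 0.02  # keep planner statistics fresh on fast-growing tables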
If you're expecting lots of data and your queries usually have a temporal element, consider partitioning the table into chunks of (say) one month for the first year, then consolidating everything older than 12 months into tables partitioned by year. Pg has only limited built-in partitioning (it's hacked together using inheritance and constraint exclusion), so you have to do it by hand or by script using triggers, but it does the job. A minimal sketch of that approach follows.
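The table, column, and partition names here are invented for illustration, and in practice you'd generate the child tables and the routing function from a script:

    -- Parent table; child partitions inherit its columns and column defaults.
    CREATE TABLE samples (
        id          bigserial PRIMARY KEY,
        recorded_at timestamptz NOT NULL,
        payload     text
    );

    -- One child per month; the CHECK constraint lets constraint exclusion
    -- skip irrelevant partitions when a query filters on recorded_at.
    CREATE TABLE samples_2013_01 (
        CHECK (recorded_at >= '2013-01-01' AND recorded_at < '2013-02-01')
    ) INHERITS (samples);

    -- Route inserts on the parent into the correct child.
    CREATE OR REPLACE FUNCTION samples_insert_router() RETURNS trigger AS $$
    BEGIN
        IF NEW.recorded_at >= '2013-01-01' AND NEW.recorded_at < '2013-02-01' THEN
            INSERT INTO samples_2013_01 VALUES (NEW.*);
        ELSE
            RAISE EXCEPTION 'no partition for timestamp %', NEW.recorded_at;
        END IF;
        RETURN NULL;  -- the row is already stored in the child, so skip the parent
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER samples_partition_insert
        BEFORE INSERT ON samples
        FOR EACH ROW EXECUTE PROCEDURE samples_insert_router();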
See: the commit_delay configuration option in the PostgreSQL documentation.