Background
We have been working very hard to try to come up with a solution for a "high performance" application. The application is essentially a high-throughput in-memory manager with a sync back to disk. The reads and writes are tremendously high, around 3,000 transactions a second. We try to do as much as possible in memory, but eventually the data gets stale and needs to be flushed to disk, and this is where a huge bottleneck ensues. The app is multi-threaded, with about 50 threads. There is no IPC (inter-process communication).
Attempts
We initially wrote this in Java, and it worked quite well until a certain load was reached; then the bottleneck was hit and it just couldn't keep up. We then tried it in C#, and the same bottleneck was reached. We also tried unmanaged code (C#), and although initial tests were blindingly fast using memory-mapped files (MMF), in production the reading was slow (we are using views). We did try Couchbase, but we ran into problems around high network utilization. That might be poor configuration on our part!
Extra info: In our Java attempt (non-MMF), the thread that owns the queue of information waiting to be flushed to disk builds up a backlog to the point where it can no longer keep up with writing to disk. In our C# memory-mapped-file approach, the problem is that reads are very slow while writes work perfectly. For some reason, the views are slow!
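To illustrate the flushing pattern in question (a simplified sketch only; class and method names are hypothetical, not our actual code), worker threads enqueue records and a single writer thread drains the queue and writes batches to disk:

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class WriteBehindFlusher implements Runnable {
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    private final FileChannel channel;

    public WriteBehindFlusher(Path file) throws Exception {
        channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // Called by the ~50 worker threads; returns immediately.
    public void enqueue(byte[] record) {
        queue.add(record); // unbounded queue: this is where the backlog builds up
    }

    // Single flusher thread: drain whatever has accumulated and write it as one batch.
    @Override
    public void run() {
        List<byte[]> batch = new ArrayList<>();
        try {
            while (!Thread.currentThread().isInterrupted()) {
                batch.add(queue.take());   // block for at least one record
                queue.drainTo(batch);      // then grab everything else that is waiting
                for (byte[] record : batch) {
                    channel.write(ByteBuffer.wrap(record));
                }
                channel.force(false);      // one fsync per batch, not per record
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```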
Question
So the question is: for situations where you intend to transfer massive amounts of data, can someone please suggest a possible approach or architectural design that might help? I know this seems a bit broad, but I think the specific nature of high performance and high throughput should narrow down the answers.
Can anyone vouch for using Couchbase, MongoDB or Cassandra at such a level? Other ideas or solutions would be appreciated.
First off, I would like to make clear that I have little (if any) experience building high-performance, scalable applications.
Martin Fowler has a description of the LMAX architecture, which allowed an application to process about 6 million orders per second on a single thread. I'm not sure it can help you (since you seemingly need to move a lot of data), but maybe you can get some ideas from it: http://martinfowler.com/articles/lmax.html
The architecture is based on Event Sourcing, which is often used to provide (relatively) easy scalability.
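If you want to experiment with the idea, the core of that architecture is available as the open-source LMAX Disruptor library. Here is a minimal sketch of the pattern (assuming the com.lmax:disruptor dependency; the event fields and handler body are placeholders, not your actual workload): producer threads publish pre-allocated events into a ring buffer, and a single consumer journals them to disk in batches.

```java
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class DisruptorSketch {
    // Pre-allocated, mutable event reused by the ring buffer (no per-message allocation).
    static class WriteEvent {
        long key;
        byte[] payload;
    }

    public static void main(String[] args) {
        int bufferSize = 1 << 16; // ring buffer size, must be a power of two

        Disruptor<WriteEvent> disruptor = new Disruptor<>(
                WriteEvent::new, bufferSize, DaemonThreadFactory.INSTANCE);

        // Single consumer thread: append events to a journal, sync at batch boundaries.
        disruptor.handleEventsWith((event, sequence, endOfBatch) -> {
            // write event to disk here; force/fsync only when endOfBatch is true
        });

        disruptor.start();
        RingBuffer<WriteEvent> ringBuffer = disruptor.getRingBuffer();

        // Producers (your ~50 threads) publish by mutating a pre-allocated event slot.
        ringBuffer.publishEvent((event, sequence, key) -> {
            event.key = key;
            event.payload = null; // fill with real data
        }, 42L);
    }
}
```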
Massive amounts of data and disk access. What kind of disk are we talking about? HDDs tend to spend a lot of time moving the head around if you work with more than one file. (That shouldn't be a problem if you use SSDs, though.) Also, you should take advantage of the fact that memory-mapped files are managed in page-sized chunks. Data structures should be aligned to page boundaries, if possible.
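To make the page-chunk point concrete, here is a rough Java sketch using java.nio's MappedByteBuffer (rather than the C# views you mentioned). The page and record sizes are assumptions; the idea is simply that records are laid out so none of them straddles a page boundary, so a read touches exactly one page.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PageAlignedStore {
    private static final int PAGE_SIZE = 4096;   // typical OS page size (assumption)
    private static final int RECORD_SIZE = 512;  // hypothetical fixed record size
    private static final int RECORDS_PER_PAGE = PAGE_SIZE / RECORD_SIZE;

    private final MappedByteBuffer buffer;

    public PageAlignedStore(Path file, long pages) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map the region once; the OS pages it in and out in PAGE_SIZE chunks.
            // (A single mapping is limited to 2 GB; use several mappings for more.)
            buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, pages * PAGE_SIZE);
        }
    }

    // Page-aligned layout: a record never crosses a page boundary.
    private int offsetOf(long recordIndex) {
        long page = recordIndex / RECORDS_PER_PAGE;
        long slot = recordIndex % RECORDS_PER_PAGE;
        return (int) (page * PAGE_SIZE + slot * RECORD_SIZE);
    }

    // Each call works on its own duplicate so buffer positions don't race across threads.
    public void write(long recordIndex, byte[] record) {
        ByteBuffer view = buffer.duplicate();
        view.position(offsetOf(recordIndex));
        view.put(record, 0, Math.min(record.length, RECORD_SIZE));
    }

    public byte[] read(long recordIndex) {
        byte[] out = new byte[RECORD_SIZE];
        ByteBuffer view = buffer.duplicate();
        view.position(offsetOf(recordIndex));
        view.get(out);
        return out;
    }

    // Force dirty pages to disk, e.g. once per batch rather than per record.
    public void flush() {
        buffer.force();
    }
}
```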
But in any case, you must make sure you know what the bottleneck actually is. Optimizing data structures won't help much if you are really losing the time to thread synchronization, for example. And if you're using an HDD, page alignment might not help as much as stuffing everything into a single file somehow. So use appropriate tools to figure out which brakes are still holding you back.
Using a general-purpose database implementation might not help you as much as you hope. It is, after all, general-purpose. If performance really is that much of an issue, a special-purpose implementation built with your requirements in mind may well outperform these more general solutions.