A lot of the LOB applications we provide to our customers are of a marketing / promotional nature (sweepstakes, event registration, etc.). Most of these applications, while very simple, are very demanding on the database. Imagine a "registration" site backing a commercial that airs during the Super Bowl, for example (yes, we've had several).
Though we have gotten very good at optimizing our web app code, the database always remains an issue, despite the application being relatively simple. The flow is typically something like:

1. Read from the database to check whether the supplied data (an email address, say) matches an existing entrant.
2. If not, insert the new registration.

In many cases, this is all the data access the application needs to perform. However, given that it is the sole purpose of the application, it's quite important that this simple process be heavily optimized.
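For concreteness, here is a minimal sketch of that hot path in Python, using sqlite3 as a stand-in for SQL Server/MySQL; the entrants table and its columns are hypothetical:

```python
import sqlite3

# Stand-in connection; in production this would be SQL Server 2005 or MySQL 5.1.
conn = sqlite3.connect("registrations.db")
conn.execute("CREATE TABLE IF NOT EXISTS entrants (email TEXT PRIMARY KEY, name TEXT)")

def register(email: str, name: str) -> bool:
    """Return True if the registration was new, False if a duplicate."""
    # Read: does this entrant already exist? Often the only read the app does,
    # and it returns a single bit of information.
    exists = conn.execute(
        "SELECT 1 FROM entrants WHERE email = ?", (email,)
    ).fetchone()
    if exists:
        return False
    # Write: record the new entrant. This insert is where disk I/O piles up.
    conn.execute("INSERT INTO entrants (email, name) VALUES (?, ?)", (email, name))
    conn.commit()
    return True
```

Every request pays for at least one read plus one write and its log flush, which is why disk I/O dominates under burst load.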
For the purposes of this question, we have a single server running a RAID 5 disk array for the data files and another RAID 5 array for the logs. At this time, the OS is Windows Server 2003 Standard 32-bit and the server has 4 GB of memory. Some apps use SQL Server 2005 Standard while others use MySQL 5.1. I'm well aware that certain OS and hardware optimizations are possible here, but I'm looking to address my needs from the software side first. Extensive profiling has taught us that disk I/O is generally the main bottleneck.
Having said all that, and knowing that caching won't help much since most reads are unique and return very little data (often only a bit indicating whether a record exists or not), I'm considering making a leap into the realm of in-memory databases as sort of a write-cache layer to the real database. This seems like a good fit given that most of our high volume traffic is sporadic in nature, and not sustained over several hours. Additionally, the potential loss of a few minutes of data due to a server crash would be acceptable in most cases.
In the simplest form, I would modify a typical registration app to do the following (sketched in code below):

1. Write each incoming registration to a fast in-memory store instead of directly to the disk-backed database.
2. Periodically flush the accumulated records to the real database in a single bulk operation, accepting that a crash could lose the few minutes of data sitting in memory.
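A minimal sketch of that write-buffer idea, assuming a simple in-process queue stands in for the in-memory database; the flush interval and the bulk_insert hook are hypothetical:

```python
import queue
import threading
import time

# In-process write buffer: registrations land here instead of hitting disk.
write_buffer: queue.Queue = queue.Queue()
FLUSH_INTERVAL_SECONDS = 60  # up to a minute of data at risk on a crash

def accept_registration(email: str, name: str) -> None:
    # Request path: O(1) enqueue, no disk I/O.
    write_buffer.put((email, name))

def flush_loop(bulk_insert) -> None:
    """Background thread: periodically drain the buffer into the real DB."""
    while True:
        time.sleep(FLUSH_INTERVAL_SECONDS)
        batch = []
        while not write_buffer.empty():
            batch.append(write_buffer.get_nowait())
        if batch:
            bulk_insert(batch)  # one bulk insert instead of thousands of small ones

# Start the flusher with whatever bulk-insert function your DB layer exposes:
# threading.Thread(target=flush_loop, args=(my_bulk_insert,), daemon=True).start()
```

An out-of-process store, so the buffer survives worker restarts, would be the next step up from this.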
My question is: what are my options for this intermediate in-memory database? I've experimented with in-memory hash tables, data tables, and such, but I'm looking for other options or even suggestions for a completely different approach.
NoSQL databases can flexibly support both read-heavy and write-heavy systems. With data spread across multiple shards/servers, hashing (and in particular consistent hashing) is the standard technique for determining which shard(s) to route application queries to.
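To make that concrete, here is a minimal consistent-hashing sketch; the shard names and virtual-node count are hypothetical:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys to shards so adding/removing a shard remaps only ~1/N of keys."""

    def __init__(self, shards, vnodes=100):
        # Each shard gets many points on the ring so keys spread evenly.
        self._ring = sorted(
            (self._hash(f"{shard}:{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]

# ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
# ring.shard_for("user@example.com")  # same shard every time for a given key
```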
If you don't need to know in real time whether a record already exists (i.e. it's important that the record gets in there, but you don't need to tell the user whether it was new or existing), you can structure your database for extremely fast writes without an in-memory database, which carries a lot of potential problems if servers go down or worker processes restart.
Create two tables in your database for each table involved in this write-heavy flow. One should be your "live" table, write-optimized as much as possible (i.e. no indexes, and never read from except when moving rows to the read table). The other should be your read-optimized table, indexed as appropriate for any reporting considerations, etc.
Whenever you write into your live table, ignore whether a record is new or existing; do nothing beyond getting the data into the table as fast as possible and getting out of the DB. Set up a scheduled job that moves records from the live table into the read-optimized table, and worry about matching up existing records there. Ideally this would run during off-peak hours; otherwise you might want a third staging table so that there's never contention on the live table. A minimal sketch of the pattern follows.
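Here is a minimal sketch of that live/read split, again using sqlite3 and hypothetical table names; in SQL Server or MySQL the mover would be a scheduled job running an equivalent INSERT ... SELECT:

```python
import sqlite3

conn = sqlite3.connect("registrations.db")

# Live table: no indexes, insert-only on the hot path.
conn.execute("CREATE TABLE IF NOT EXISTS entrants_live (email TEXT, name TEXT)")
# Read table: indexed (here via the primary key) for lookups and reporting.
conn.execute(
    "CREATE TABLE IF NOT EXISTS entrants_read (email TEXT PRIMARY KEY, name TEXT)"
)

def record_entry(email: str, name: str) -> None:
    # Hot path: blind insert -- no existence check, no index maintenance.
    conn.execute("INSERT INTO entrants_live (email, name) VALUES (?, ?)", (email, name))
    conn.commit()

def move_to_read_table() -> None:
    """Scheduled job (ideally off-peak): drain live rows, de-duplicating as we go."""
    conn.execute(
        "INSERT OR IGNORE INTO entrants_read (email, name) "
        "SELECT email, name FROM entrants_live"
    )
    conn.execute("DELETE FROM entrants_live")
    conn.commit()  # the move and the delete commit together
```

The duplicate handling (INSERT OR IGNORE here) is deferred to the mover, so the request path never pays for it.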