Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Database design for write-heavy web application

A lot of the LOB applications we provide to our customers are of a marketing / promotional nature (sweepstakes, event registration, etc...). Most of the applications, while very simple, are very demanding on the database. Imagine a "registration" type site as the backing for a commercial that airs during the superbowl, for example (yes, we've had several).

Though we have gotten very good at optimizing our web app code, the database always remains an issue, despite the application being relatively simple. The flow is typically something like:

  1. Read from database to detect existing record
  2. Write to database if record is new

In many cases, this is all the data access our application needs to perform. However, given that it is the sole purpose of the application, it's quite important that this simple process be optimized greatly.

For the purposes of this question, we have a single server running a raid 5 disk array for the data files and another raid 5 array for the logs. At this time, the OS is Windows 2003 standard 32bit and the server has 4 GB of memory. Some apps use SQL 2005 standard while others use MySQL 5.1. I'm very well aware that certain OS and hardware optimizations are possible here, but I'm looking to address my needs from a software side first. Extensive profiling has taught us that disk IO is generally the main bottleneck.

Having said all that, and knowing that caching won't help much since most reads are unique and return very little data (often only a bit indicating whether a record exists or not), I'm considering making a leap into the realm of in-memory databases as sort of a write-cache layer to the real database. This seems like a good fit given that most of our high volume traffic is sporadic in nature, and not sustained over several hours. Additionally, the potential loss of a few minutes of data due to a server crash would be acceptable in most cases.

In the simplest form, I would modify a typical registration app to do the following:

  1. Query the disk DB and memory DB for existing records
  2. If none, write data to the memory DB and return
  3. Periodically flush memory DB to disk DB

My question is: what are my options for this intermediate in-memory database? I've experimented with in-memory hash tables, data tables, and such, but I'm looking for other options or even suggestions for a completely different approach.

like image 243
Chris Avatar asked Nov 04 '09 16:11

Chris


People also ask

Which type of database is used for heavy reads and writes?

NoSQL databases can flexibly support both read-heavy and write-heavy systems. With data spread out across multiple shards/servers, hashing and consistent hashing are very important techniques for determining which shard(s) to route application queries to.


1 Answers

If you don't need to know whether there is an existing record in real-time (i.e. it's important that the record gets in there, but you don't need to report whether it was new or existing to the user), you can structure your database in a way that allows for extremely fast write times without the need for an in-memory database, which carries a lot of potential problems if servers go down or worker processes restart.

Create two tables in your database for each table that are involved with this write-heavy flow. One table should be your "live" table and should be write-optimized as much as possible (i.e. no indexes and is never read from except when moving to the read table). Your other table should be your read-optimized table - indexed as appropriate for any reporting considerations, etc.

Whenever you're writing into your live table, ignore anything to do with whether a record is new or existing, or anything beyond just getting that data into the table as fast as possible and getting out of the DB. Set up a scheduled job that moves records from the live table into the read-optimized table, and worry about matching up existing records there. Ideally this would be done during non-peak times, but otherwise you might want to consider a third staging table so that there's no contention on the live table at any time.

like image 102
Ryan Brunner Avatar answered Oct 16 '22 10:10

Ryan Brunner