 

How can I implement a database that scales to the upper tens of thousands of requests/second?

By upper tens of thousands of requests/second, I mean 60,000 to 90,000+ requests/second.

My Setup consists of the following:

user --> web app --> message queue --> parser --> database?

I should mention that the parser can currently parse/stuff around 18,750 records/second using COPY, so we are limited on that end until we start adding more parsers -- this isn't a huge concern for me right now.
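For reference, a COPY-based bulk load from Ruby looks roughly like this. This is only a simplified sketch: it assumes the COPY in question is PostgreSQL's COPY and that the pg gem is available, and the database name, table, and columns are placeholders rather than the real schema.

# Hypothetical sketch: streaming parsed rows into PostgreSQL with COPY via
# the pg gem. Real code would CSV-escape values instead of interpolating.
require 'pg'

records = [
  { player: 'some_player', type: 'award', amount: 150, hand: 42 }  # example row
]

conn = PG.connect(dbname: 'game_stats')                 # assumed DB name
conn.exec("COPY actions (player, type, amount, hand) FROM STDIN WITH CSV")
records.each do |r|
  conn.put_copy_data("#{r[:player]},#{r[:type]},#{r[:amount]},#{r[:hand]}\n")
end
conn.put_copy_end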

I have a system that requires the ability to bulk upload as many records as fast as I can. This same system (or it can be a different one, depending on how you would approach it) should be able to respond to analytical-type queries such as this:

wonq = "select sum(amount) from actions where player = '@player' and " +
       "(type = 'award' or type = 'return') and hand = hand_num"
lostq = "select sum(amount) from actions where player = 'player' and " +
        "type != 'award' and type != 'return' and hand = hand_num"

...10-15 thousand times (PER USER), since they are keyed off another table. Needless to say, we paginate these results at 10/page for now.
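For illustration, one way those two-queries-per-hand round trips could be collapsed is to let the database group by hand and compute both sums per player in a single query. This is just a sketch in the same string-building style, not what's running today; table and column names are taken from the queries above.

# Hypothetical consolidation: one grouped query per player instead of two
# queries per hand.
totalsq = "select hand, " +
          "sum(case when type in ('award','return') then amount else 0 end) as won, " +
          "sum(case when type not in ('award','return') then amount else 0 end) as lost " +
          "from actions where player = '@player' group by hand"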

I've looked at the following: (assuming these are all on the same server)

  • MySQL (regular run-of-the-mill RDBMS) -- was able to get into the 15-20 thousand requests/second range; under current conditions, if we try to scale this out we need a separate host/database every time we need to scale -- this is not doable

  • CouchDB (document-oriented DB) -- didn't break 700 requests/second; I was really hoping this was going to save our ass -- not a chance!

  • Vertica (column-oriented DB) -- was hitting 60,000 requests/second; closed source and very pricey; this is still an option, but I personally did not like it at all

  • Tokyo Cabinet (hash-based DB) -- is currently weighing in at 45,000 inserts/second and 66,000 selects/second; yesterday when I wrote this I was using an FFI-based adapter that was performing at around 5,555 requests/second; this is by far THE fastest, most awesome database I've seen yet!!

  • Terracotta -- (VM cluster) currently evaluating this along with jmaglev (can't wait until MagLev itself comes out) -- this is THE SLOWEST!

Maybe I'm just approaching this problem wrong, but I've ALWAYS heard that RDBMSes were slow as all hell -- so where are these super-fast systems I've heard about?

Testing conditions:

Just so people know, the specs on my dev box are:

dual 3.2 GHz Intel, 1 GB RAM

MySQL (mysql.cnf) edits were:

key_buffer = 400M               # was 16M
innodb_log_file_size = 100M     # non existent before
innodb_buffer_pool_size = 200M  # non existent before

UPDATE:

It turns out that Terracotta might have a place in our application structure, but it flat out WILL NOT be replacing our database anytime soon, as its speed is terrible and its heap utilization sucks.

On the other hand, I was very happy to see that Tokyo Cabinet's NON-FFI Ruby library (meaning Tyrant/Cabinet) is super fast, and right now that is in first place.
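For anyone curious, the non-FFI path is roughly this. It's a minimal usage sketch against the tokyotyrant Ruby binding: the host, port, keys, and values are made up, and it assumes a ttserver is already running.

# Hypothetical Tokyo Tyrant usage sketch; assumes ttserver is listening on
# localhost:1978 (the default port).
require 'tokyotyrant'

rdb = TokyoTyrant::RDB.new
rdb.open('localhost', 1978) or abort(rdb.errmsg(rdb.ecode))
rdb.put('player:1234:hand:42', '150')   # simple key/value write
puts rdb.get('player:1234:hand:42')     # simple key/value read => "150"
rdb.close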

asked Feb 17 '09 by eyberg


2 Answers

For crazy-big scalability, you'll want to focus on two things:

  • Sharding: Split your data set into groups that don't overlap. Have an easy, fast way to map from a request to a server (players starting with a-f on server 1; g-q on server 2; etc.) -- see the sketch after this list.
  • Caching: Use Memcache to remember the output of some really common select queries, so you don't have to go to disk as often.
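A rough sketch of the shard-routing idea (hash-based rather than alphabetical, purely as an illustration; the host names are made up):

# Hypothetical shard routing: hash the player name to pick a database host,
# so every read/write for a given player lands on the same server.
require 'zlib'

DB_HOSTS = %w[db1.example.com db2.example.com db3.example.com db4.example.com]

def shard_for(player)
  DB_HOSTS[Zlib.crc32(player.to_s) % DB_HOSTS.size]
end

shard_for('some_player')   # => the same host every time for this player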
answered by ojrac


Well, the big player in the game is Oracle, but that's big bucks.

If you want to go cheap, then you will have to pay the price in other ways:

  • by partitioning the DB across multiple instances and distributing the load;
  • by potentially caching results so actual DB access is reduced (see the sketch below).
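A sketch of that caching idea in Ruby, using the Dalli memcached client purely as an example; the key names, TTL, and query helper are placeholders, not anything from the original setup.

# Hypothetical cache-aside pattern: check memcached first and only hit the
# database on a miss, keeping the result for five minutes.
require 'dalli'

CACHE = Dalli::Client.new('localhost:11211')

def totals_for(player)
  CACHE.fetch("totals:#{player}", 300) do
    run_totals_query(player)   # placeholder for the real DB call
  end
end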
answered by Martin York