
BerkeleyDB Concurrency

  • What's the optimal level of concurrency that the C++ implementation of BerkeleyDB can reasonably support?
  • How many threads can I have hammering away at the DB before throughput starts to suffer because of resource contention?

I've read the manual and know how to set the number of locks, lockers, database page size, etc., but I'd just like some advice from someone who has real-world experience with BDB concurrency.

My application is pretty simple: I'll be doing gets and puts of records that are about 1KB each. No cursors, no deleting.

asked by Ted Dziuba

4 Answers

It depends on what kind of application you are building. Create a representative test scenario, and start hammering away. Then you will know the definitive answer.

Besides your use case, it also depends on CPU, memory, front-side bus, operating system, cache settings, etcetera.

Seriously, just test your own scenario.

If you need some numbers anyway, bear in mind that figures from someone else's setup may mean nothing in your scenario.
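To make the "hammer away" advice concrete, here is a minimal, hypothetical test-harness sketch against the Berkeley DB C++ API. The environment path, cache size, key pattern, and thread counts are all placeholders to adapt to your real workload (build with g++ -std=c++11 hammer.cpp -ldb_cxx -lpthread; the env directory must exist):

    // Hammer-test sketch: N threads doing autocommitted puts and gets
    // of ~1KB records, sweeping the thread count to find the knee.
    #include <db_cxx.h>
    #include <atomic>
    #include <chrono>
    #include <cstring>
    #include <iostream>
    #include <thread>
    #include <vector>

    static std::atomic<long> ops(0);

    void hammer(Db* db, int id, int seconds) {
        char value[1024];                      // ~1KB records, as in the question
        std::memset(value, 'x', sizeof(value));
        auto deadline = std::chrono::steady_clock::now()
                      + std::chrono::seconds(seconds);
        long k = id;
        while (std::chrono::steady_clock::now() < deadline) {
            Dbt key(&k, sizeof(k));
            Dbt data(value, sizeof(value));
            db->put(nullptr, &key, &data, 0);  // autocommitted put

            char buf[1024];
            Dbt out;
            out.set_data(buf);
            out.set_ulen(sizeof(buf));
            out.set_flags(DB_DBT_USERMEM);     // read back into our own buffer
            db->get(nullptr, &key, &out, 0);
            k += 17;                           // scatter keys across pages
            ops += 2;
        }
        // Production code would catch DbDeadlockException here and retry.
    }

    int main() {
        DbEnv env(0);
        env.set_cachesize(0, 64 * 1024 * 1024, 1);   // 64MB cache (placeholder)
        env.set_lk_detect(DB_LOCK_DEFAULT);          // run the deadlock detector
        env.open("./bdb_env",
                 DB_CREATE | DB_INIT_MPOOL | DB_INIT_LOCK |
                 DB_INIT_LOG | DB_INIT_TXN | DB_THREAD, 0);

        Db db(&env, 0);
        db.open(nullptr, "hammer.db", nullptr, DB_BTREE,
                DB_CREATE | DB_THREAD | DB_AUTO_COMMIT, 0644);

        // Sweep the thread count and watch where throughput flattens out.
        for (int n : {1, 2, 4, 8, 16, 32}) {
            ops = 0;
            std::vector<std::thread> workers;
            for (int i = 0; i < n; ++i)
                workers.emplace_back(hammer, &db, i, 10);
            for (auto& w : workers) w.join();
            std::cout << n << " threads: " << ops / 10 << " ops/sec\n";
        }

        db.close(0);
        env.close(0);
        return 0;
    }

Plot ops/sec against thread count; the knee of that curve is the answer for your hardware and workload.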

answered by Daan


I strongly agree with Daan's point: create a test program, and make sure the way in which it accesses data mimics as closely as possible the patterns you expect your application to have. This is extremely important with BDB because different access patterns yield very different throughput.

Other than that, these are general factors I found to be of major impact on throughput:

  1. Access method (which in your case I guess is BTREE).

  2. Level of durability with which you configured BDB (for example, in my case the DB_TXN_WRITE_NOSYNC environment flag improved write performance by an order of magnitude, but it compromises durability; see the configuration sketch at the end of this answer).

  3. Does the working set fit in cache?

  4. Number of reads vs. writes.

  5. How spread out your access is (remember that BTREE uses page-level locking, so accessing different pages from different threads is a big advantage).

  6. Access pattern, meaning how likely threads are to lock one another, or even deadlock, and what your deadlock resolution policy is (this one may be a killer).

  7. Hardware (disk & memory for cache).

It comes down to this: to scale a BDB-based solution for greater concurrency, either minimize the number of locks in your design or add more hardware.
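For points 2 and 3 in particular, here is a minimal environment-configuration sketch using the Berkeley DB C++ API; the path and cache size are illustrative assumptions, not recommendations:

    #include <db_cxx.h>

    int main() {
        DbEnv env(0);

        // Point 2: trade durability for write throughput. Commits no longer
        // wait for the transaction log to reach disk, so a system crash can
        // lose the most recent transactions (recovery still keeps the
        // database consistent).
        env.set_flags(DB_TXN_WRITE_NOSYNC, 1);

        // Point 3: size the cache so the working set fits; once it spills,
        // logical operations turn into disk I/O. 256MB is a placeholder.
        env.set_cachesize(0, 256 * 1024 * 1024, 1);

        env.open("./bdb_env",
                 DB_CREATE | DB_INIT_MPOOL | DB_INIT_LOCK |
                 DB_INIT_LOG | DB_INIT_TXN | DB_THREAD, 0);

        // ... open databases, run the workload ...

        env.close(0);
        return 0;
    }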

answered by yoav.aviram


Doesn't this depend on the hardware as well as number of threads and stuff?

I would make a simple test, run it with an increasing number of threads hammering away, and see what seems best.

answered by svrist


What I did when working against a database of unknown performance was to measure turnaround time on my queries. I kept raising the thread count until turnaround time started to suffer, and dropping it until turnaround time improved again (well, it was processes in my environment, but whatever).

There were moving averages and all sorts of metrics involved, but the take-away lesson was: just adapt to how things are working at the moment. You never know when the DBAs will improve performance, when the hardware will be upgraded, or when another process will come along and load down the system while you're running. So adapt.

Oh, and another thing: avoid process switches if you can - batch things up.


Oh, I should make this clear: this all happened at run time, not during development.
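A hypothetical sketch of this adapt-at-runtime idea; the class name, smoothing factor, and thresholds are all invented for illustration:

    #include <chrono>

    class ConcurrencyTuner {
        double avg_ms = 0.0;       // moving average of turnaround time
        double prev_ms = 0.0;      // average at the last adjustment
        int workers;
    public:
        explicit ConcurrencyTuner(int initial) : workers(initial) {}

        // Call once per completed query with its measured turnaround time.
        void record(std::chrono::milliseconds turnaround) {
            avg_ms = 0.9 * avg_ms + 0.1 * turnaround.count();
        }

        // Call periodically: push concurrency up while latency holds,
        // back off as soon as it degrades. Returns the new worker count.
        int adjust() {
            if (prev_ms == 0.0 || avg_ms <= prev_ms * 1.05)
                ++workers;         // latency stable: try one more worker
            else if (workers > 1)
                --workers;         // latency degraded: back off
            prev_ms = avg_ms;
            return workers;
        }
    };

The 5% tolerance and single-step adjustments are arbitrary; the point is the feedback loop, not the constants.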

answered by Josh