I'm designing a Java based web-app and I need a key-value store. Berkeley DB seems fitting enough for me, but there appears to be TWO Berkeley DBs to choose from: Berkeley DB Core which is implemented in C, and Berkeley DB Java Edition which is implemented in pure Java.
The question is, how to choose which one to use? With web-apps scalability and performance is quite important (who knows, maybe my idea will become the next Youtube), and I couldn't find easily any meaningful benchmarks between the two. I have yet to familiarize with Cores Java API, but I find it hard to believe that it could be much worse than Java Editions, which seems to be quite nice.
If some other key-value store would be much better, feel free to recommend that too. I'm storing smallish binary blobs, and keys probably will be hashes of the data, or some other unique id.
The our open source license permits you to use Berkeley DB, Berkeley DB Java Edition or Berkeley DB XML at no charge under the condition that if you use the software in an application you redistribute, the complete source code for your application must be available and freely redistributable under reasonable conditions ...
Berkeley DB is an Open Source embedded database library that provides scalable, high-performance, transaction-protected data management services to applications. Berkeley DB provides a simple function-call API for data access and management.
I have quite a bit of experience using both BDB-JE and BDB-core with Java. Deciding which one to use is quite simple: If you want concurrency, use BDB-JE. If you want scalability, use BDB-core.
BDB-JE breaks down performance-wise with large databases due to its file format and its reliance on Java garbage collection to clean up evicted cache entries. Expect long garbage collection pauses or spend a lot of time tuning magic GC settings. The file format has issues too, because the background cleaner threads have to spend a lot of time cleaning up garbage created by early cache evictions. If your database fits in RAM, BDB-JE works quite well.
BDB-core relies on a page-locking strategy, and highly concurrent applications experience a lot of deadlocks. If you can randomly order operations it reduces the deadlock potential, but it never eliminates it. Because BDB-core stores data in a more traditional way, it scales to super large sizes with predictable and expected performance degradation. Because its cache is not managed by a garbage collector, it can be quite large and not cause any pauses.
If you derive a common interface to these, and have a suitable set of unit tests, you should be able to swap between the two trivially at a later date (perhaps when you really need to make a decision based on hard facts that are not available right now)
I faced the same problem and decided to go with the Java edition, mainly because of its portability(I need something that would ran even on mobile devices). There are also the Direct Persistence Layer (DPL) API and the fact that the whole db is a single jar makes its deployment fairly simple.
The recent version 4 brought in High availability and performance improvements. There is also the fact that long running java applications can achieve such an optimization, that they would surpass native C applications performance in some scenarios.
It's a natural fit for any Java application - desktop or web.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With