Efficient persistent storage for simple id to table of values map for java

I need to store some data that follows the simple pattern of mapping an "id" to a full table (with multiple rows) of several columns (i.e. some integer values [u, v, w]). The size of one of these tables would be a couple of KB. Basically what I need is to store a persistent cache of some intermediary results.

This could quite easily be implemented in plain SQL, but there are a couple of problems. First, I need to keep this structure as small as possible on disk, because of the number of values I'm storing. Second, it isn't transactional: I just need to write each table once and later read back its entire contents, so a relational DB isn't actually a very good fit.

I was wondering if anyone had any good suggestions? For some reason I can't seem to come up with something decent atm. Something with a Java API would be especially nice.

575

asked Mar 12 '09 15:03

wds

1 Answer

This sounds like a job for.... new ObjectOutputStream(new FileOutputStream(STORAGE_DIR + "/" + key + ".dat")); !!

Seriously - the simplest method is to just create a file for each data table that you want to store, serialize the data into it, and look it up using the key as the filename when you want to read.
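
As a rough illustration of the file-per-key idea, here is a minimal sketch. The class name `TableFileStore`, the `int[][]` table representation and the `.dat` naming are assumptions made for this example, and it relies on plain Java serialization with no error recovery.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical file-per-key store: each table is serialized to <dir>/<key>.dat. */
final class TableFileStore {
    private final Path storageDir;

    TableFileStore(Path storageDir) throws IOException {
        // Create the storage directory (and any missing parents) if needed.
        this.storageDir = Files.createDirectories(storageDir);
    }

    /** Serializes the whole table (rows of [u, v, w]) into the file named after the key. */
    void store(String key, int[][] table) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                Files.newOutputStream(storageDir.resolve(key + ".dat")))) {
            out.writeObject(table);
        }
    }

    /** Reads the table back, or returns null if nothing was stored under the key. */
    int[][] load(String key) throws IOException, ClassNotFoundException {
        Path file = storageDir.resolve(key + ".dat");
        if (!Files.exists(file)) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (int[][]) in.readObject();
        }
    }
}
```

Usage would be along the lines of `store.store("42", table)` followed later by `store.load("42")`. The key is assumed to be safe to use as a file name.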

On a decent file system, writes can be made atomic (by writing to a temp file and then renaming it); read/write speed is measured in tens of MBit/second; lookups can be made very efficient by creating a simple directory tree like STORAGE_DIR + "/" + key.substring(0,2) + "/" + key.substring(0,4) + "/" + key, which should still be efficient with millions of entries and even more efficient if your file system uses indexed directories; lastly, it's trivial to implement a memory-backed LRU cache on top of this for even faster retrievals.
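
The three ideas in that paragraph (sharded directory tree, temp-file-then-rename writes, and an in-memory LRU cache) can be sketched together as follows. This is only an illustration under assumptions: the class `ShardedStore`, its method names and the use of raw `byte[]` payloads are invented for the example, keys are assumed to be valid file names, and the class is not thread-safe.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sharded, atomically written file store with a small LRU cache. */
final class ShardedStore {
    private final Path root;
    private final Map<String, byte[]> lru;

    ShardedStore(Path root, final int maxCached) {
        this.root = root;
        // An access-ordered LinkedHashMap evicts the least recently used entry.
        this.lru = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxCached;
            }
        };
    }

    /** Maps "abcdef" to root/ab/abcd/abcdef; keys shorter than a prefix use the whole key. */
    Path pathFor(String key) {
        String p2 = key.substring(0, Math.min(2, key.length()));
        String p4 = key.substring(0, Math.min(4, key.length()));
        return root.resolve(p2).resolve(p4).resolve(key);
    }

    /** Writes to a temp file in the target directory, then renames it into place. */
    void write(String key, byte[] data) throws IOException {
        Path target = pathFor(key);
        Files.createDirectories(target.getParent());
        Path tmp = Files.createTempFile(target.getParent(), "tmp-", ".tmp");
        Files.write(tmp, data);
        // ATOMIC_MOVE fails if the file system cannot rename atomically;
        // whether an existing target is replaced is implementation specific.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        lru.put(key, data);
    }

    /** Returns the cached or on-disk bytes, or null if the key was never written. */
    byte[] read(String key) throws IOException {
        byte[] cached = lru.get(key);
        if (cached != null) {
            return cached;
        }
        Path file = pathFor(key);
        if (!Files.exists(file)) {
            return null;
        }
        byte[] data = Files.readAllBytes(file);
        lru.put(key, data);
        return data;
    }
}
```

Keeping the temp file in the same directory as the target matters here, since a rename is generally only atomic within a single file system.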

Regarding compression - you can use Jakarta's commons-compress to apply gzip or even bzip2 compression to the data before you store it. But this is an optimization problem, and depending on your application and available disk space you may be better off investing the CPU cycles elsewhere.
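
For gzip specifically, a similar effect can be had without an extra library. The sketch below swaps in the JDK's own `java.util.zip` streams rather than commons-compress, and the class name `GzipTableCodec` and the `int[][]` table type are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper that gzip-compresses a serialized table and restores it. */
final class GzipTableCodec {
    private GzipTableCodec() {
    }

    /** Serializes the table through a gzip stream and returns the compressed bytes. */
    static byte[] compress(int[][] table) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the outer stream also finishes the gzip trailer.
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(table);
        }
        return bytes.toByteArray();
    }

    /** Reverses compress(): gunzips the bytes and deserializes the table. */
    static int[][] decompress(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return (int[][]) in.readObject();
        }
    }
}
```

The resulting byte array can be written to the per-key file as is. How much space this saves depends entirely on how repetitive the table values are.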

Here is a sample implementation that I made: http://geek.co.il/articles/geek-storage.zip. It uses a simple interface (which is far from clean - it's just a demonstration of the concept) that offers methods for storing and retrieving objects from a cache with a set maximum size. A cache miss is passed to a user implementation for handling, and the cache periodically checks that it doesn't exceed the storage limit and removes old data.

I also included a MySQL-backed implementation for completeness and a benchmark to compare the disk-based and MySQL-based implementations. On my home machine (an old Athlon 64) the disk-based implementation is slightly more than twice as fast as the MySQL implementation in the enclosed benchmark (9.01 seconds vs. 18.17 seconds). Even though the DB implementation can probably be tweaked for slightly better performance, I believe it demonstrates the problem well enough.

Feel free to use this as you see fit.

169

answered Oct 28 '22 08:10

Guss

Tags:

java

data-structures

persistence
