Fastest way to retrieve/store millions of small binary objects

Tags: performance, .net, database, sqlite, data-structures

I am looking for a fast (as in high performance, not a quick fix) solution for persisting and retrieving tens of millions of small (around 1k) binary objects. Each object should have a unique ID for retrieval (preferably a GUID or SHA). Additional requirements are that it should be usable from .NET and that it shouldn't require additional software installation.

Currently, I am using an SQLite database with a single table for this job, but I want to get rid of the overhead of processing simple SQL instructions like SELECT data FROM store WHERE id = id.

I've also tested direct filesystem persistence under NTFS, but the performance degrades very quickly once the store reaches half a million objects.

P.S. By the way, objects never need to be deleted, and the insertion rate is very, very low. In fact, every time an object changes, a new version is stored and the previous version remains. This is actually a requirement to support time-traveling.

Just adding some additional information to this thread:

To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem http://arxiv.org/abs/cs.DB/0701168

asked Jul 18 '09 17:07

Hugo Sereno Ferreira

2 Answers

You may be able to lessen the performance problems of NTFS by breaking the object's GUID identifier up into pieces and using them as directory names. That way, each directory only contains a limited number of subdirectories or files.

For example, if the identifier is aaaa-bb-cc-ddddeeee, the path to the item would be c:\store\aaaa\bbcc\dddd\eeee.dat, limiting each directory to no more than 64k subitems.
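A minimal sketch of that path scheme, written in Python purely for illustration (the same string slicing carries over to .NET). The function name sharded_path and the levels/width parameters are hypothetical, not part of any library; the defaults reproduce the layout described above.

```python
from pathlib import PureWindowsPath

def sharded_path(root: str, object_id: str, levels: int = 3, width: int = 4) -> PureWindowsPath:
    """Map an identifier to a nested path so no directory grows too large.

    Hypothetical helper: with width=4 hex digits per level, each directory
    holds at most 16**4 = 65,536 subdirectories.
    """
    # Drop the dashes and normalise case so equal IDs map to the same path.
    digits = object_id.replace("-", "").lower()
    if len(digits) <= levels * width:
        raise ValueError("identifier too short for the requested layout")
    # The leading chunks become directory names...
    parts = [digits[i * width:(i + 1) * width] for i in range(levels)]
    # ...and whatever is left becomes the file name.
    filename = digits[levels * width:] + ".dat"
    return PureWindowsPath(root, *parts, filename)

print(sharded_path("c:\\store", "aaaa-bb-cc-ddddeeee"))
```

For a full 32-digit GUID and tens of millions of objects, fewer or narrower levels may be the better trade-off, since three 4-digit levels would create far more directories than files; that is worth measuring rather than assuming.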

answered Oct 05 '22 22:10

Daniel Earwicker

You need to call a prepare function only once per statement, with the parameter denoted e.g. by ? (so SELECT data FROM store WHERE id=? is the statement you'd prepare); then what you do "millions of times" is just bind the parameter into the prepared statement and call sqlite3_step -- these are fast operations. It is also worth benchmarking whether blob open might be even faster. In other words, I recommend sticking with SQLite and digging into its low-level interface (from managed C++ if you must) for maximum performance -- it's really an amazing little engine, and it has often surprised me favorably with its performance!
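A rough sketch of the prepare-once, bind-many pattern, using Python's standard sqlite3 module purely for illustration. That module does not expose prepare and step directly; it compiles a statement the first time it sees a given SQL text and reuses it from an internal cache afterwards, so keeping the SQL text constant and passing values through ? placeholders has the same effect. The table layout, the put/get helper names, and the use of a SHA-1 digest as the key are assumptions, not taken from the question.

```python
import hashlib
import sqlite3

# Assumed schema: the question only says "a single table"; column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store (id BLOB PRIMARY KEY, data BLOB NOT NULL)")

# Constant SQL text with ? placeholders, so the compiled statement can be reused.
INSERT_SQL = "INSERT OR IGNORE INTO store (id, data) VALUES (?, ?)"
SELECT_SQL = "SELECT data FROM store WHERE id = ?"

def put(data: bytes) -> bytes:
    """Store a blob under its SHA-1 digest and return that 20-byte key."""
    key = hashlib.sha1(data).digest()
    with conn:  # commits on success, rolls back on error
        conn.execute(INSERT_SQL, (key, data))
    return key

def get(key: bytes):
    """Return the blob stored under key, or None if it is absent."""
    row = conn.execute(SELECT_SQL, (key,)).fetchone()
    return row[0] if row is not None else None

key = put(b"x" * 1024)
assert get(key) == b"x" * 1024
```

Because objects are never deleted and every change is stored as a new version, a content-derived key like this makes re-inserting identical data a no-op. Whether this beats the low-level C interface called directly is something only a benchmark on the real workload can answer.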

answered Oct 05 '22 22:10

Alex Martelli
