I would like to store up-to 10M files, 2TB storage unit. The only properties which I need restricted to filenames, and their contents (data).
The files max-length is 100MB, most of them are less than 1MB. The ability of removing files is required, and both writing and reading speeds should be a priority - while low storage efficiency, recovery or integrity methods, are not needed.
I thought about NTFS, but most of its features are not needed, while can't be disabled and considered to be an overhead concern, a few of them are: creation date, modification date, attribs, journal and of course permissions.
Due to the native features of a filesystem which are not needed, would you suggest I'll use SQLITE for this requirement? or there's an obvious disadvantage that I should be aware about? (one would guess that removing files will be a complicated task?)
(SQLITE will be via the C api)
My goal is to use a more suited solution to gain performance. Thanks in advance - Doori Bar
General Findings. SQLite is competitive with, and usually faster than, blobs stored in separate files on disk, for both reading and writing. SQLite is much faster than direct writes to disk on Windows when anti-virus protection is turned on.
SQLite supports databases up to 281 terabytes in size, assuming you can find a disk drive and filesystem that will support 281-terabyte files. Even so, when the size of the content looks like it might creep into the terabyte range, it would be good to consider a centralized client/server database.
It reduces application cost because content can be accessed and updated using concise SQL queries instead of lengthy and error-prone procedural queries. SQLite can be easily extended in in future releases just by adding new tables and/or columns. It also preserve the backwards compatibility.
So for very simple operations, the filesystem is definitely faster. Filesystems will probably beat an RDBMS for raw read throughput too since there is less overhead. In fact, if you think about it, the database can never be faster than the filesystem it sits on in terms of raw throughput.
When storing files of ~ 10 KiB sqlite is approximately 35% faster. SQLite reads and writes small blobs (for example, thumbnail images) 35% faster¹ than the same blobs can be read from or written to individual files on disk using fread () or fwrite ().
If your main requirement is performance, go with native file system. DBMS are not well suited for handling large BLOBs, so SQLite is not an option for you at all (don't even know why everybody considers SQLite to be a plug for every hole).
Some other SQL database engines advise developers to store blobs in separate files and then store the filename in the database. In that case, where the database must first be consulted to find the filename before opening and reading the file, simply storing the entire blob in the database gives much faster read and write performance with SQLite.
Since anti-virus software is and should be on by default in Windows, that means that SQLite is generally much faster than direct disk writes on Windows. Reading is about an order of magnitude faster than writing, for all systems and for both SQLite and direct-to-disk I/O. I/O performance varies widely depending on operating system and hardware.
The official SQLite site actually includes a page which documents the performance benefits of using a database over a native filesystem in various operating systems. When storing files of ~ 10 KiB sqlite is approximately 35% faster.
SQLite reads and writes small blobs (for example, thumbnail images) 35% faster¹ than the same blobs can be read from or written to individual files on disk using fread() or fwrite().
Furthermore, a single SQLite database holding 10-kilobyte blobs uses about 20% less disk space than storing the blobs in individual files.
The performance difference arises (we believe) because when working from an SQLite database, the open() and close() system calls are invoked only once, whereas open() and close() are invoked once for each blob when using blobs stored in individual files. It appears that the overhead of calling open() and close() is greater than the overhead of using the database. The size reduction arises from the fact that individual files are padded out to the next multiple of the filesystem block size, whereas the blobs are packed more tightly into an SQLite database.
The measurements in this article were made during the week of 2017-06-05 using a version of SQLite in between 3.19.2 and 3.20.0. You may expect future versions of SQLite to perform even better.
You may experience different results when using larger files, and SQLite site includes a link to kvtest which you may use to reproduce these results on your own hardware / operating system.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With