
Why is it recommended practice to store images on disk rather than in a Realm?

I am using Realm as the database solution for my app. I need persistent storage ability for my images so I can load them when offline. I also need a cache so I can load the images from there rather than fetching them from the API each time a cell draws them. My first thought was that a Realm database could serve both of these functions just fine if I were to store the images in Realm as NSData. But I have found two answers on SE (here and here) that recommend not doing this if you have many images of a largish size that will change often. Instead they recommend saving the images to disk, and then storing the URL to those images in Realm.

My question is: why is this best practice? The answers linked to above don't give reasons, except to say that you end up with a bloated database. But why is that a problem? What is the difference between having lots of images in my database and having lots of images on disk?

Is it a speed issue? If so, is there a marked speed difference between an app accessing an image from disk and accessing it from a database solution like Realm?

Thanks in advance.

thecloud_of_unknowing asked May 25 '16 16:05


1 Answer

This isn't really a problem specific to Realm. I remember the same advice being given for Core Data too.

I'd guess the main reason storing large binary data in a database isn't recommended is that you don't gain anything, and you actually stand to lose more than you otherwise would.

With Core Data (when it's backed by SQLite), you'll actually take a performance hit, because the data is copied into memory when you perform the read from SQLite. If it's a large amount of data, that cost is wholly unacceptable.

With Realm at least, since it uses a zero-copy, memory-mapped mechanism, you'll be provided with the NSData mapped straight from the Realm file, but then again, this is absolutely no different than if you simply loaded the image file from disk itself.
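To make the "loading from disk itself" side of that comparison concrete, here is a minimal sketch using only Foundation. It asks for a memory-mapped read via `Data.ReadingOptions.mappedIfSafe`, which Foundation honours only when it judges mapping to be safe. The helper name `loadImageData(at:)` and the temporary file are made up for illustration.

```swift
import Foundation

/// Hypothetical helper: reads a file's bytes, requesting a memory-mapped
/// read so the contents need not be copied up front when mapping is safe.
func loadImageData(at url: URL) throws -> Data {
    return try Data(contentsOf: url, options: .mappedIfSafe)
}

// Usage: write some stand-in "image" bytes to a temporary file, then read them back.
let tmpURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("mapped-demo-\(UUID().uuidString).bin")
let original = Data(repeating: 0xAB, count: 1_000_000)
try original.write(to: tmpURL, options: .atomic)

let loaded = try loadImageData(at: tmpURL)
print("Loaded \(loaded.count) bytes")

// Clean up the temporary file.
try FileManager.default.removeItem(at: tmpURL)
```

Whether the read is actually mapped is up to Foundation and the file system, so treat this as a request rather than a guarantee.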

Where this becomes a major problem in Realm is when you start changing the images often. Realm uses an internal snapshotting mechanism when working with changing data across threads, which essentially means that during operation, entire sets of data might be periodically duplicated on disk (to ensure thread safety). If those data sets include large blobs of binary data, the blobs get duplicated too, which can also mean a performance hit. When this happens, the Realm file on disk grows to accommodate the snapshots, but when the operation completes and the snapshots are deleted, the file does not shrink back to its original size. Reclaiming that disk space would itself be a costly operation, and since the space could easily be needed again (e.g. by another large snapshotting operation), doing it pre-emptively would be inefficient. Hence the 'bloat'.

It's possible to manually perform an operation to reclaim this disk space if necessary, but the generally recommended approach is to optimise your code so that the growth happens as little as possible in the first place.
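As a rough sketch of how that reclaiming step might be gated: recent RealmSwift releases accept a `shouldCompactOnLaunch` closure on `Realm.Configuration` that receives the file's total and used byte counts. The predicate below is plain Swift so it runs on its own; the function name and both thresholds are illustrative assumptions, not recommended values.

```swift
import Foundation

/// Hypothetical predicate: decides whether a database file is worth compacting.
/// `totalBytes` is the file size on disk; `usedBytes` is the part holding live data.
/// The 100 MB floor and the 50% ratio are illustrative defaults only.
func shouldCompact(totalBytes: Int,
                   usedBytes: Int,
                   minimumFileSize: Int = 100 * 1024 * 1024,
                   maximumUsedRatio: Double = 0.5) -> Bool {
    // Small files are not worth the cost of compaction.
    guard totalBytes > minimumFileSize else { return false }
    // Compact only when less than the given fraction of the file is live data.
    return Double(usedBytes) / Double(totalBytes) < maximumUsedRatio
}

// With RealmSwift available (an assumption; it is not imported here), a predicate
// of this shape could be passed as the configuration's compaction closure:
//   let config = Realm.Configuration(shouldCompactOnLaunch: { total, used in
//       shouldCompact(totalBytes: total, usedBytes: used)
//   })
```

Compaction rewrites the file, so it costs time at launch; that is the trade-off the thresholds are meant to control.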

So, to sum that all up: while you totally can save large data blobs to a database, over time it'll potentially result in performance hits and file size bloat that you could otherwise have avoided. These sorts of databases are designed to help transform small bits of data into a format that can be saved to and retrieved from disk, so that machinery is essentially wasted on binary files that could easily be saved directly without any modification.

It's usually much easier, cleaner and more efficient to store your large binary data on disk, and store only a file name reference to it inside the database. :)
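Here's a minimal sketch of that pattern using only Foundation. `ImageRecord` is a hypothetical stand-in for the database object (in a Realm model it would be an object with a `String` property for the file name), and `ImageFileStore` is a made-up helper, not part of any library.

```swift
import Foundation

/// Hypothetical stand-in for a database record. Only the file name is
/// persisted in the database; the image bytes live on disk.
struct ImageRecord {
    let id: String
    let fileName: String
}

/// Hypothetical helper that writes image bytes into a directory and
/// returns a record referencing them by file name.
struct ImageFileStore {
    let directory: URL

    init(directory: URL) throws {
        self.directory = directory
        try FileManager.default.createDirectory(at: directory,
                                                withIntermediateDirectories: true)
    }

    /// Saves the bytes to disk and returns the record to store in the database.
    func save(_ data: Data, id: String) throws -> ImageRecord {
        let fileName = "\(id).img"
        try data.write(to: directory.appendingPathComponent(fileName), options: .atomic)
        return ImageRecord(id: id, fileName: fileName)
    }

    /// Resolves the stored file name against the current directory and reads the bytes.
    func load(_ record: ImageRecord) throws -> Data {
        return try Data(contentsOf: directory.appendingPathComponent(record.fileName))
    }
}

// Usage: save four stand-in bytes, then read them back through the record.
let storeDir = FileManager.default.temporaryDirectory
    .appendingPathComponent("image-store-demo-\(UUID().uuidString)")
let store = try ImageFileStore(directory: storeDir)
let record = try store.save(Data([0x89, 0x50, 0x4E, 0x47]), id: "avatar-1")
let roundTripped = try store.load(record)
print("Stored as \(record.fileName), read back \(roundTripped.count) bytes")
```

The sketch stores a file name rather than an absolute path because an iOS app's container path can change between launches, so a saved absolute path may stop resolving; joining the name onto the current directory at load time avoids that.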

TiM answered Oct 06 '22 17:10