 

Clojure - memoize on disk

I would like to improve the performance of a function that returns resized images. The requested image sizes should not vary much (they are device-dependent), so it would make sense to cache the results somehow.

I could of course store the resized images on disk, check whether a resized image already exists, and make sure that when the original image is deleted, its resized versions are deleted too...

Or I could use a memoized function. But since the results are potentially quite big (an image is about 5-10 MB, I think), it doesn't make sense to keep them in memory: several tens of GB of images and their resized versions would fill up the memory quite quickly.

So, is there a way to have a memoized function that acts like the regular Clojure memoize, but is backed by a folder on a local disk instead of memory? I could then use a TTL strategy to make sure that images don't stay out of sync for too long.

Something similar to crache, but backed by a filesystem?

asked Jul 02 '15 14:07 by nha


2 Answers

Don't overthink it. Using your filesystem as a cache is the right idea. If one file gets popular and is accessed a lot, your operating system will make sure it stays in RAM (in its page cache). Many databases use the same strategy. For instance, Elasticsearch relies on you leaving enough RAM free for the OS to keep the Lucene index files cached.

Don't ever modify your files either! Do it the functional way: treat them as immutable data. Your input file shouldn't change. If it does, then it's a new file. Hard disk space is incredibly cheap. Don't be afraid of having many files lying around. If you must, you can run a garbage collection that removes old or flagged files after a while.
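A minimal sketch of such a garbage collection, assuming that "old" simply means a last-modified time older than some maximum age. The function name `purge-older-than!` and the directory in the usage note are made up for illustration:

```clojure
(require '[clojure.java.io :as io])

(defn purge-older-than!
  "Deletes regular files under `cache-dir` whose last-modified time is more
  than `max-age-ms` milliseconds before `now-ms`.
  Returns the number of files that were actually deleted."
  [cache-dir max-age-ms now-ms]
  (let [cutoff (- now-ms max-age-ms)]
    (count
     (filterv (fn [^java.io.File f]
                (and (.isFile f)                    ; skip directories
                     (< (.lastModified f) cutoff)   ; older than the cutoff
                     (.delete f)))                  ; true only if deletion worked
              (file-seq (io/file cache-dir))))))
```

You could call it from a scheduled job, e.g. `(purge-older-than! "/var/cache/resized" (* 24 60 60 1000) (System/currentTimeMillis))` to drop everything older than a day (the path is hypothetical).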

To see whether a file is in the cache, you simply check whether the file exists. If it doesn't, you write it once.

So to summarize:

  • Let your OS handle the caching.
  • Don't edit your files. Treat them as immutable data and write them once.
  • Your OS will evict unused files from RAM, and hard disk space is cheap.
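The points above can be sketched roughly as follows. This is only an illustration under some assumptions: `cached-file` and `resized-image-key` are hypothetical names, and `generate!` stands in for whatever function produces the resized image bytes:

```clojure
(require '[clojure.java.io :as io])

(defn resized-image-key
  "Builds a file name that identifies one resized variant of an image."
  [image-id width height]
  (str image-id "-" width "x" height ".jpg"))

(defn cached-file
  "Returns the File for `cache-key` inside `cache-dir`. Calls `(generate!)`,
  which must return a byte array, only when the file does not exist yet."
  [cache-dir cache-key generate!]
  (let [target (io/file cache-dir cache-key)]
    (when-not (.exists target)
      (io/make-parents target)
      ;; Write to a temp file in the same directory, then rename it, so that
      ;; readers are unlikely to ever see a half-written image.
      (let [tmp (java.io.File/createTempFile "resize" ".tmp"
                                             (.getParentFile target))]
        (with-open [out (io/output-stream tmp)]
          (.write out ^bytes (generate!)))
        (.renameTo tmp target)))
    target))
```

Usage would look something like `(cached-file "/var/cache/resized" (resized-image-key "cat" 800 600) #(resize-image "cat" 800 600))`, where `resize-image` is your own resizing function and the path is made up. Because the key encodes the size, a new size is simply a new file and nothing is ever edited in place.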
answered Oct 06 '22 11:10 by ClojureMostly


Why not use a TTL cache from clojure.core.cache and wrap it with the functionality you need? The key can be whatever identifies your resized image, and the value would be its location on disk. You could then implement some kind of get-or-update! function, passing it the function to invoke to generate the image when it doesn't exist, e.g.:

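(require '[clojure.core.cache :as cache])

;; Maps a resized-image key to its location on disk; :ttl is in milliseconds.
(def image-cache (atom (cache/ttl-cache-factory {} :ttl 20000)))

(defn get-or-update!
  "Wraps the recommended has?/hit/miss pattern
   https://github.com/clojure/core.cache/wiki/Using"
  [k f]
  ;; The check runs inside swap! so lookup and update see the same cache value.
  ;; swap! may retry under contention, so f can be called more than once.
  (get (swap! image-cache
              #(if (cache/has? % k)
                 (cache/hit % k)
                 (cache/miss % k (f))))
       k))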
answered Oct 06 '22 11:10 by Nick