Disk-persisted-lazy-cacheable-List ™ in Scala

I need a very, very long list of pairs (X, Y) in Scala, so big that it will not fit in memory (but it fits nicely on disk).

  • All update operations are cons (head appends).
  • All read accesses start at the head and traverse the list in order until they find a pre-determined pair.
  • A cache would be great, since most read accesses will hit the same data over and over.

So, this is basically a "disk-persisted-lazy-cacheable-List" ™
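A minimal sketch of one on-disk layout that fits the first two requirements. It assumes (hypothetically) that both X and Y are Longs, so every pair is a fixed 16-byte record. The newest pair, the logical head, is the last record of an append-only file: cons becomes a plain file append, and a head-first read walks the file backwards. The name `DiskPairList` is illustrative, and no caching is attempted here.

```scala
import java.io.{File, RandomAccessFile}

// Hypothetical sketch: pairs are (Long, Long), stored as fixed-size 16-byte
// records in an append-only file. The list head is the LAST record.
final class DiskPairList(file: File) {
  private val RecordSize = 16L

  /** Prepend (x, y) to the logical list by appending it to the file. */
  def cons(x: Long, y: Long): Unit = {
    val raf = new RandomAccessFile(file, "rw") // creates the file if missing
    try {
      raf.seek(raf.length())
      raf.writeLong(x)
      raf.writeLong(y)
    } finally raf.close()
  }

  /** Walk from the head (end of file) towards older records and return the
    * first pair satisfying `p`. The file is closed in every case. */
  def find(p: ((Long, Long)) => Boolean): Option[(Long, Long)] = {
    val raf = new RandomAccessFile(file, "r")
    try {
      var pos = raf.length() - RecordSize
      var result: Option[(Long, Long)] = None
      while (result.isEmpty && pos >= 0) {
        raf.seek(pos)
        val pair = (raf.readLong(), raf.readLong())
        if (p(pair)) result = Some(pair)
        pos -= RecordSize
      }
      result
    } finally raf.close()
  }
}
```

Fixed-size records are what make backwards traversal trivial here; variable-length pairs would need a length trailer or an index per record.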

Any ideas on how to get one before I start to roll my own?


Addendum: yes, MongoDB, or any other non-embeddable resource, is overkill. If you are interested in a specific use case for this, see the class Timeline here. Basically, I wish to have a very, very big timeline (millions of pairs spanning months), although my matches only need to touch the last few hours.
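Since matches only touch the last few hours, one possible cache (a sketch, not something the question prescribes) keeps the newest N elements in memory and falls back to the disk scan only on a miss. `HeadCache` and its `slowFind` parameter are hypothetical names; `slowFind` stands in for a full head-first scan of whatever disk-backed list is used.

```scala
import java.util.ArrayDeque

// Hypothetical sketch: keeps the newest `capacity` elements in memory so that
// lookups touching only recent data never reach the disk. `slowFind` stands in
// for a full head-first scan of the disk-backed list.
final class HeadCache[A](capacity: Int, slowFind: (A => Boolean) => Option[A]) {
  require(capacity > 0, "capacity must be positive")

  // Newest element first, mirroring the list's head-first order.
  // Note: java.util.ArrayDeque does not accept null elements.
  private val recent = new ArrayDeque[A]()

  /** Record a new head element (call alongside the disk-side cons). */
  def cons(a: A): Unit = {
    recent.addFirst(a)
    if (recent.size > capacity) recent.removeLast() // evict the oldest
  }

  /** Scan the cached elements head-first; on a miss, fall back to the slow scan. */
  def find(p: A => Boolean): Option[A] = {
    val it = recent.iterator() // iterates first (newest) to last (oldest)
    while (it.hasNext) {
      val a = it.next()
      if (p(a)) return Some(a)
    }
    slowFind(p)
  }
}
```

Because the disk list also contains the cached elements, a miss rescans them; that wastes a little work but cannot change the result.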

Hugo Sereno Ferreira asked Jan 29 '12 at 23:01



1 Answer

The easiest way to do something like this is to extend Traversable. You only have to define foreach, and you have full control over the traversal, so you can do things like open and close the file.
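A sketch of the foreach-driven approach applied to the question's pairs, assuming (hypothetically) one `x,y` pair per text line. To stay compilable on Scala 2.13 and later, where Traversable is deprecated, the hypothetical `PairFile` class defines foreach directly instead of extending Traversable. Its find stops early by throwing a control-flow exception from `scala.util.control.Breaks`, similar to how Scala 2.12's Traversable implements early exit, and the `finally` block closes the file either way.

```scala
import java.io.{BufferedReader, File, FileReader}
import scala.util.control.Breaks.{break, breakable}

// Sketch: traversal is driven by foreach, so the file is opened and closed
// inside it. Each line is assumed (hypothetically) to hold one "x,y" pair.
final class PairFile(file: File) {
  def foreach[U](f: ((String, String)) => U): Unit = {
    val in = new BufferedReader(new FileReader(file))
    try {
      var line = in.readLine()
      while (line != null) {
        val parts = line.split(",", 2) // throws if a line has no comma
        f((parts(0), parts(1)))
        line = in.readLine()
      }
    } finally in.close() // runs even when f throws to stop early
  }

  // Early termination: break out of foreach with a control-flow exception.
  def find(p: ((String, String)) => Boolean): Option[(String, String)] = {
    var result: Option[(String, String)] = None
    breakable {
      foreach { pair => if (p(pair)) { result = Some(pair); break() } }
    }
    result
  }
}
```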

You can also extend Iterable, which requires defining iterator and, of course, returning some sort of Iterator. In this case, you'd probably create an Iterator over the disk data, but it's much harder to control resources such as open files, because an Iterator cannot tell when its caller stops consuming it.
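A sketch of the Iterator alternative, to show where the difficulty lies. `LineIterator` is a hypothetical name. It can close the file on its own once it reaches the end, but when the caller stops early (for example, `find` returning on the first match) nothing in the Iterator contract notifies it, so this sketch also mixes in `java.io.Closeable` and relies on the caller to close it.

```scala
import java.io.{BufferedReader, Closeable, File, FileReader}

// Sketch: an Iterator over the lines of a file. It releases the file once it
// is exhausted, but if the caller stops early nothing closes it, so the
// caller must call close() itself.
final class LineIterator(file: File) extends Iterator[String] with Closeable {
  private val in = new BufferedReader(new FileReader(file))
  private var closed = false
  private var nextLine: String = in.readLine() // one line of lookahead
  if (nextLine == null) close()                 // empty file: release at once

  def hasNext: Boolean = nextLine != null

  def next(): String = {
    if (nextLine == null) throw new java.util.NoSuchElementException("end of file")
    val line = nextLine
    nextLine = in.readLine()
    if (nextLine == null) close() // exhausted: release the file eagerly
    line
  }

  /** Idempotent; after closing, hasNext returns false. */
  def close(): Unit = if (!closed) {
    closed = true
    nextLine = null
    in.close()
  }
}
```

Every caller that might stop early now has to remember a try/finally around its use, which is exactly the burden the foreach-based Traversable keeps inside the collection.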

Here's one example of a Traversable such as I described, written by Josh Suereth:

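// Works on Scala 2.12 and earlier. Since Scala 2.13, Traversable is a deprecated
// alias for Iterable, which requires `iterator` rather than `foreach`.
class FileLinesTraversable(file: java.io.File) extends Traversable[String] {
  // Every traversal opens the file, feeds each line to f, then closes it.
  override def foreach[U](f: String => U): Unit = {
    val in = new java.io.BufferedReader(new java.io.FileReader(file))
    try {
      // Tail-recursive read loop; readLine returns null at end of file.
      def loop(): Unit = in.readLine match {
        case null => ()
        case line => f(line); loop()
      }
      loop()
    } finally {
      // Also runs if f throws, e.g. when an early-exiting method stops the traversal.
      in.close()
    }
  }
}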
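On Scala 2.13 and later, where Traversable is a deprecated alias for Iterable and a foreach-only class like the one above no longer compiles, one common alternative is the loan pattern. It is swapped in here rather than taken from the answer: a helper keeps control of the file's lifetime, as foreach does, while handing the caller an ordinary Iterator. `FileLines.withLines` is a hypothetical helper built on `scala.io.Source`.

```scala
import java.io.File
import scala.io.Source

// Loan-pattern sketch: the helper owns the file, lends the caller an Iterator
// over its lines, and closes the file when the caller's function returns or
// throws, even if it stopped reading early.
object FileLines {
  def withLines[A](file: File)(f: Iterator[String] => A): A = {
    val src = Source.fromFile(file) // uses the default implicit Codec
    try f(src.getLines())
    finally src.close()
  }
}
```

The iterator must not escape the function passed to withLines: once withLines returns, the underlying file is already closed.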
Daniel C. Sobral answered Sep 17 '22 at 14:09
