I need to have a very, very long list of pairs (X, Y) in Scala. So big it will not fit in memory (but fits nicely on a disk).
So, this is basically a "disk-persisted-lazy-cacheable-List" ™
Any ideas on how to get one before I start to roll out my own?
Addendum: yes.. mongodb, or any other non-embeddable resource, is an overkill. If you are interested in a specific use-case for this, see the class Timeline here. Basically, I which to have a very, very big timeline (millions of pairs throughout months), although my matches only need to touch the last hours.
The easiest way to do something like this is to extend Traversable. You only have to define foreach, and you have full control over the traversal, so you can do things like open and close the file.
You can also extend Iterable, which requires defining iterator and, of course, returning some sort of Iterator. In this case, you'd probably create an Iterator for the disk data, but it's going to be much harder to control things like open files.
Here's one example of a Traversable such as I described, written by Josh Suereth:
class FileLinesTraversable(file: java.io.File) extends Traversable[String] {
  override def foreach[U](f: String => U): Unit = {
     val in = new java.io.BufferedReader(new java.io.FileReader(file))
     try {
       def loop(): Unit = in.readLine match {
          case null => ()
          case line => f(line); loop()
       }
       loop()
     } finally {
       in.close()
     }
  }
}
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With