I need to have a very, very long list of pairs (X, Y) in Scala. So big it will not fit in memory (but fits nicely on a disk).
So, this is basically a "disk-persisted-lazy-cacheable-List" ™
Any ideas on how to get one before I start to roll out my own?
Addendum: yes.. mongodb, or any other non-embeddable resource, is an overkill. If you are interested in a specific use-case for this, see the class Timeline
here. Basically, I which to have a very, very big timeline (millions of pairs throughout months), although my matches only need to touch the last hours.
The easiest way to do something like this is to extend Traversable
. You only have to define foreach
, and you have full control over the traversal, so you can do things like open and close the file.
You can also extend Iterable
, which requires defining iterator
and, of course, returning some sort of Iterator
. In this case, you'd probably create an Iterator
for the disk data, but it's going to be much harder to control things like open files.
Here's one example of a Traversable
such as I described, written by Josh Suereth:
class FileLinesTraversable(file: java.io.File) extends Traversable[String] {
override def foreach[U](f: String => U): Unit = {
val in = new java.io.BufferedReader(new java.io.FileReader(file))
try {
def loop(): Unit = in.readLine match {
case null => ()
case line => f(line); loop()
}
loop()
} finally {
in.close()
}
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With