I'd like to write a simple function that iterates over the lines of a text file. I believe in 2.8 one could do:
def lines(filename: String): Iterator[String] = {
  scala.io.Source.fromFile(filename).getLines
}
and that was that, but in 2.9 the above doesn't work and instead I must do:
import java.io.File

def lines(filename: String): Iterator[String] = {
  scala.io.Source.fromFile(new File(filename)).getLines()
}
Now, the trouble is, I want to compose the above iterators in a for-comprehension:
for (l1 <- lines("file1.txt"); l2 <- lines("file2.txt")) {
  do_stuff(l1, l2)
}
This, again, used to work fine with 2.8 but causes a "too many open files" exception to be thrown in 2.9. This is understandable: the inner call, lines("file2.txt"), opens (and never closes) a new file for every line of the first file.
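To see why the inner call is repeated, it helps to look at what the compiler does with a for-comprehension that has no yield: each generator after the first becomes a nested foreach, so the expression to the right of the second <- is evaluated once per element of the first. A minimal sketch, using a hypothetical fakeLines that only counts how often it is called instead of opening a real file:

```scala
object ForDesugar {
  // Hypothetical counter standing in for "number of files opened".
  var openCount = 0

  // Hypothetical stand-in for lines(filename): every call would open a new file.
  def fakeLines(name: String): Iterator[String] = {
    openCount += 1
    Iterator("a", "b", "c")
  }

  // The hand-desugared form of the nested for-comprehension.
  def run(): Int = {
    openCount = 0
    fakeLines("big").foreach { l1 =>
      fakeLines("small").foreach { l2 =>
        () // do_stuff(l1, l2) would go here
      }
    }
    openCount
  }

  // The for-comprehension itself, for comparison.
  def runFor(): Int = {
    openCount = 0
    for (l1 <- fakeLines("big"); l2 <- fakeLines("small")) {
      () // do_stuff(l1, l2) would go here
    }
    openCount
  }
}
```

Both run() and runFor() return 4: one call for the outer generator plus one for each of the three outer elements. With real files, each of those inner calls leaves a file handle open, since nothing ever calls close() on the Source.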
In my case, I know that "file1.txt" is big and I don't want to suck it into memory, but the second file is small, so I can write a different linesEager like so:
import java.io.File

def linesEager(filename: String): Iterator[String] = {
  val buf = scala.io.Source.fromFile(new File(filename))
  val zs = buf.getLines().toList.toIterator
  buf.close()
  zs
}
and then turn my for-comprehension into:
for (l1 <- lines("file1.txt"); l2 <- linesEager("file2.txt")) {
  do_stuff(l1, l2)
}
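A slightly sturdier variant of linesEager, assuming it's fine to hand back a List (call .iterator on the result if an Iterator is really needed): try/finally makes sure close() runs even if reading the file throws. EagerLines is just a hypothetical wrapper object for the sketch.

```scala
import java.io.File
import scala.io.Source

object EagerLines {
  // Reads every line into memory, then closes the file even if reading fails.
  def linesEager(filename: String): List[String] = {
    val src = Source.fromFile(new File(filename))
    try src.getLines().toList
    finally src.close()
  }
}
```

The toList call has to happen inside the try: returning the lazy getLines() iterator and closing afterwards would leave the caller reading from a closed Source.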
This works, but it is clearly ugly. Can someone suggest a uniform and clean way of achieving the above? It seems like you need a way for the iterator returned by lines to close the file when it reaches the end, and this must have been happening in 2.8, which is why it worked there?
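One way to get the behaviour described above is to wrap getLines() in an iterator that closes its Source the first time hasNext reports the end of input. This is a minimal sketch, not something scala.io provides; ClosingLines is a hypothetical name. It only helps when the iterator is drained to the end, which is exactly what happens to the inner generator in the nested for-comprehension; an early exit or an exception still leaves the file open.

```scala
import java.io.File
import scala.io.Source

object ClosingLines {
  // Like lines(filename), but closes the underlying Source once it is exhausted.
  def lines(filename: String): Iterator[String] = {
    val src = Source.fromFile(new File(filename))
    val underlying = src.getLines()
    new Iterator[String] {
      private var closed = false

      def hasNext: Boolean =
        if (closed) false
        else if (underlying.hasNext) true
        else {
          src.close() // end of file reached: release the handle
          closed = true
          false
        }

      def next(): String =
        if (hasNext) underlying.next()
        else throw new NoSuchElementException("end of file")
    }
  }
}
```

With this version, for (l1 <- ClosingLines.lines("file1.txt"); l2 <- ClosingLines.lines("file2.txt")) still reopens file2.txt once per line of file1.txt, but each copy is closed before the next one is opened.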
Thanks!
BTW, here is a minimal version of the full program that shows the issue:
import java.io.PrintWriter
import java.io.File
object Fail {
  def lines(filename: String) : Iterator[String] = {
    val f = new File(filename)
    scala.io.Source.fromFile(f).getLines()
  }
  def main(args: Array[String]) = {
    val smallFile = args(0)
    val bigFile = args(1)
    println("helloworld")
    for ( w1 <- lines(bigFile)
        ; w2 <- lines(smallFile)
        )
    {
      if (w2 == w1){
        val msg = "%s=%s\n".format(w1, w2)
        println("found" + msg)
      }
    }
    println("goodbye")
  }
}
On 2.9.0 I compile with scalac WordsFail.scala and then I get this:
rjhala@goto:$ scalac WordsFail.scala
rjhala@goto:$ scala Fail passwd words
helloworld
java.io.FileNotFoundException: passwd (Too many open files)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:120)
at scala.io.Source$.fromFile(Source.scala:91)
at scala.io.Source$.fromFile(Source.scala:76)
at Fail$.lines(WordsFail.scala:8)
at Fail$$anonfun$main$1.apply(WordsFail.scala:18)
at Fail$$anonfun$main$1.apply(WordsFail.scala:17)
at scala.collection.Iterator$class.foreach(Iterator.scala:652)
at scala.io.BufferedSource$BufferedLineIterator.foreach(BufferedSource.scala:30)
at Fail$.main(WordsFail.scala:17)
at Fail.main(WordsFail.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:78)
at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:24)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:88)
at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:78)
at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:101)
at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:33)
at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:40)
at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:56)
at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:80)
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:89)
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
scala-arm provides a great mechanism for automagically closing resources when you're done with them.
import resource._
import scala.io.Source
for (file1 <- managed(Source.fromFile("file1.txt"));
l1 <- file1.getLines();
file2 <- managed(Source.fromFile("file2.txt"));
l2 <- file2.getLines()) {
do_stuff(l1, l2)
}
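If pulling in scala-arm isn't an option, the same shape can be hand-rolled with the loan pattern: a helper opens the Source, passes it to a function, and closes it in a finally block. withSource and countPairs are hypothetical names for this sketch, not part of scala-arm's API. Like the scala-arm version above, it still reopens file2.txt once per line of file1.txt, but each copy is closed before the next one is opened.

```scala
import java.io.File
import scala.io.Source

object Loan {
  // Hypothetical helper: open the file, lend the Source to f, always close it.
  def withSource[A](filename: String)(f: Source => A): A = {
    val src = Source.fromFile(new File(filename))
    try f(src) finally src.close()
  }

  // Mirrors the nested structure of the scala-arm answer and counts the pairs visited.
  def countPairs(file1: String, file2: String): Int = {
    var n = 0
    withSource(file1) { s1 =>
      for (l1 <- s1.getLines()) {
        withSource(file2) { s2 =>
          for (l2 <- s2.getLines()) n += 1 // do_stuff(l1, l2) would go here
        }
      }
    }
    n
  }
}
```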
But unless you're counting on the contents of file2.txt to change while you're looping through file1.txt, it would be best to read that into a List before you loop. There's no need to convert it into an Iterator.
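A sketch of that suggestion applied to the Fail program from the question: the small file is read into a List once and closed, and only the big file is streamed. HoistSmall, readAll and matches are illustrative names; readAll is the same idea as linesEager, returning the List directly.

```scala
import java.io.File
import scala.io.Source

object HoistSmall {
  // Reads a (small) file fully into memory and closes it right away.
  def readAll(filename: String): List[String] = {
    val src = Source.fromFile(new File(filename))
    try src.getLines().toList finally src.close()
  }

  // Streams the big file and returns each of its lines that also appears in the small file.
  def matches(bigFile: String, smallFile: String): List[String] = {
    val small = readAll(smallFile) // opened and closed exactly once
    val big = Source.fromFile(new File(bigFile))
    try {
      // toList forces the lazy iterator before the finally block closes big.
      (for (w1 <- big.getLines(); w2 <- small if w1 == w2) yield w1).toList
    } finally big.close()
  }
}
```

Only two file handles are ever opened, however many lines the big file has.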
Maybe you should take a look at scala-arm (https://github.com/jsuereth/scala-arm) and let the closing of the files (file input streams) happen automatically when you are done with them.