 

Scala quirk in this while loop code

Tags:

scala

Yesterday this piece of code gave me a headache. I worked around it by reading the file line by line. Any ideas?

The body of the while loop never seems to execute, even though the file has more than one line.

 import scala.io.Source
 import java.io.File

 val lines = Source.fromFile(new File("file.txt")).getLines()

 println("total lines:" + lines.size)

 var starti = 1
 while (starti < lines.size) {
   val nexti = math.min(starti + 10, lines.size)

   println("batch (" + starti + ", " + nexti + ") total:" + lines.size)
   val linesSub = lines.slice(starti, nexti)
   //do something with linesSub
   starti = nexti
 }
asked Jul 21 '11 by smartnut007



2 Answers

This is indeed tricky, and I would even say it's a bug in Iterator. getLines returns an Iterator, which is evaluated lazily. What happens is that when you ask for lines.size, the iterator walks through the whole file to count the lines. Afterwards it is "exhausted":

scala> val lines = io.Source.fromFile(new java.io.File("....txt")).getLines
lines: Iterator[String] = non-empty iterator

scala> lines.size
res4: Int = 15

scala> lines.size
res5: Int = 0

scala> lines.hasNext
res6: Boolean = false

As you can see, calling size a second time returns zero.
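The same effect can be reproduced without a file. This is a minimal sketch that assumes Source.fromString as an in-memory stand-in for Source.fromFile; the object and method names (IteratorExhaustion, sizeTwice) are hypothetical. It calls size twice on one getLines iterator and returns both counts.

```scala
import scala.io.Source

object IteratorExhaustion {
  // Calls size twice on the same getLines iterator and returns both results.
  // Source.fromString is an in-memory stand-in for Source.fromFile (assumption).
  def sizeTwice(text: String): (Int, Int) = {
    val lines = Source.fromString(text).getLines()
    val first = lines.size  // walks every line, leaving the iterator empty
    val second = lines.size // nothing is left to count
    (first, second)
  }

  def main(args: Array[String]): Unit =
    println(sizeTwice("a\nb\nc")) // prints (3,0)
}
```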

There are two solutions: either force the iterator into something 'stable', like lines.toSeq, or forget about size and do the "normal" iteration:

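while (lines.hasNext) {
  // Collect up to 10 lines into a List using only hasNext/next, so the batch
  // is not itself a one-shot iterator that linesSub.size would exhaust
  val buf = scala.collection.mutable.ListBuffer.empty[String]
  while (buf.size < 10 && lines.hasNext) buf += lines.next()
  val linesSub = buf.toList
  println("batch:" + linesSub.size)
  // do something with linesSub
}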
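The first option, forcing the iterator into a strict collection, might look roughly like the sketch below. It assumes an in-memory string instead of file.txt, and batchBounds is a hypothetical helper name. It uses toVector rather than toSeq so the result is unambiguously strict and reusable. It also starts slicing at index 0 because slice is 0-based; the question's starti = 1 would skip the first line.

```scala
import scala.io.Source

object MaterializedBatches {
  // Reads all lines once into a Vector, then walks it in (start, end) batches.
  // Returns the batch bounds so the loop's behavior is easy to check.
  def batchBounds(text: String, batchSize: Int): List[(Int, Int)] = {
    // toVector forces the iterator, so lines.size can be called any number of times
    val lines = Source.fromString(text).getLines().toVector
    val bounds = List.newBuilder[(Int, Int)]
    var start = 0 // slice indices are 0-based, so the first line is index 0
    while (start < lines.size) {
      val end = math.min(start + batchSize, lines.size)
      val linesSub = lines.slice(start, end)
      println(s"batch ($start, $end) size=${linesSub.size}")
      // do something with linesSub
      bounds += ((start, end))
      start = end
    }
    bounds.result()
  }

  def main(args: Array[String]): Unit = {
    val text = (1 to 25).map(i => s"line $i").mkString("\n")
    println(batchBounds(text, 10)) // List((0,10), (10,20), (20,25))
  }
}
```

The trade-off is that the whole file is held in memory at once, which is fine for small files but is exactly what the lazy iterator avoids.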
answered Oct 06 '22 by 0__


None of the above answers quite hits the nail on the head.

There's a good reason why an Iterator is returned here. Being lazy, it takes pressure off the heap: the String for each line can be garbage collected as soon as you've finished with it. For large files, this can make all the difference in avoiding an OutOfMemoryError.

Ideally, you'd work directly with the iterator and not force it into a strict collection type.
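To see what "lazy" means here, the sketch below counts how many elements an iterator pipeline actually produces when only the first few are consumed. It is an illustration under stated assumptions: an unbounded in-memory Iterator.from(1) stands in for getLines, and LazyIteratorDemo and producedFor are hypothetical names. map on an Iterator runs its function only when next() pulls an element, so consuming three elements produces exactly three.

```scala
object LazyIteratorDemo {
  // Returns the first n produced lines and how many were actually produced.
  // Iterator.from(1) is an unbounded stand-in for getLines (assumption).
  def producedFor(n: Int): (List[String], Int) = {
    var produced = 0
    // map's function runs only when an element is pulled with next()
    val lines = Iterator.from(1).map { i => produced += 1; s"line $i" }
    val firstN = lines.take(n).toList
    (firstN, produced)
  }

  def main(args: Array[String]): Unit =
    println(producedFor(3)) // prints (List(line 1, line 2, line 3),3)
}
```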

Using grouped then, as per om-nom-nom's answer:

for (linesSub <- lines grouped 10) {
  //do something with linesSub
}

And if you wanted to retain the println counter, zip in an index:

for ( (linesSub, batchIdx) <- (lines grouped 10).zipWithIndex ) {
  println("batch " + batchIdx)
  //do something with linesSub
}

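If you really need the total, read the file twice: once to count the lines, and a second time to actually process them. Each pass needs its own Source.fromFile(...).getLines, because an iterator that has been consumed cannot be rewound.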
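A two-pass version might look like the sketch below. It assumes the file fits the default platform codec, and TwoPassBatches, countLines and batchSizes are hypothetical names. A temporary 25-line file stands in for file.txt. Each pass opens and closes its own Source, and the second pass streams the lines with grouped and zipWithIndex, calling toList before the Source is closed.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.io.Source

object TwoPassBatches {
  // First pass: count lines with a Source of its own, then close it
  def countLines(path: Path): Int = {
    val src = Source.fromFile(path.toFile)
    try src.getLines().size finally src.close()
  }

  // Second pass: a fresh Source, streamed in batches; returns each batch's size
  def batchSizes(path: Path, batchSize: Int): List[Int] = {
    val total = countLines(path)
    val src = Source.fromFile(path.toFile)
    try {
      src.getLines().grouped(batchSize).zipWithIndex.map { case (batch, idx) =>
        println(s"batch $idx: ${batch.size} lines (total: $total)")
        batch.size
      }.toList // force the iterator before the Source is closed
    } finally src.close()
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical input: a temporary file with 25 lines
    val tmp = Files.createTempFile("lines", ".txt")
    Files.write(tmp, (1 to 25).map(i => s"line $i").mkString("\n").getBytes(StandardCharsets.UTF_8))
    try println(batchSizes(tmp, 10)) // List(10, 10, 5)
    finally Files.delete(tmp)
  }
}
```

This reads the file twice in exchange for never holding more than one batch in memory.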

answered Oct 06 '22 by Kevin Wright