I am writing code for PhD research and starting to use Scala. I often have to do text processing. I am used to Python, whose 'yield' statement is extremely useful for implementing complex iterators over large, often irregularly structured text files. Similar constructs exist in other languages (e.g. C#), for good reason.
Yes, I know there have been previous threads on this. But they look like hacked-up (or at least badly explained) solutions that don't clearly work well and often have unclear limitations. I would like to write code something like this:
    import generator._
    import scala.io.Source

    def yield_values(file: String) = {
      generate {
        for (x <- Source.fromFile(file).getLines()) {
          // Scala is already using the 'yield' keyword.
          give("something")
          for (field <- ":".r.split(x)) {
            if (field contains "/") {
              for (subfield <- "/".r.split(field)) { give(subfield) }
            } else {
              // Scala has no 'continue'. IMO that should be considered
              // a bug in Scala.
              // Preferred: if (field.startsWith("#")) continue
              // Actual: Need to indent all following code
              if (!field.startsWith("#")) {
                val some_calculation = { ... do some more stuff here ... }
                if (some_calculation && field.startsWith("r")) {
                  give("r")
                  give(field.drop(1))
                } else {
                  // Typically there will be a good deal more code here to handle different cases
                  give(field)
                }
              }
            }
          }
        }
      }
    }
I'd like to see the code that implements generate() and give(). BTW give() should be named yield() but Scala has taken that keyword already.
I gather that, for reasons I don't understand, Scala continuations may not work inside a for statement. If so, generate() should supply an equivalent function that works as close as possible to a for statement, because iterator code with yield almost inevitably sits inside a for loop.
Please, I would prefer not to get any of the following answers:
The premise of your question seems to be that you want exactly Python's yield, and you don't want any other reasonable suggestions to do the same thing in a different way in Scala. If this is true, and it is that important to you, why not use Python? It's quite a nice language. Unless your Ph.D. is in computer science and using Scala is an important part of your dissertation, if you're already familiar with Python and really like some of its features and design choices, why not use it instead?
Anyway, if you actually want to learn how to solve your problem in Scala, it turns out that for the code you have, delimited continuations are overkill. All you need are flatMapped iterators.
Here's how you do it.
    // You want to write
    for (x <- xs) { /* complex yield in here */ }
    // Instead you write
    xs.iterator.flatMap { /* Produce iterators in here */ }

    // You want to write
    yield(a)
    yield(b)
    // Instead you write
    Iterator(a,b)

    // You want to write
    yield(a)
    /* complex set of yields in here */
    // Instead you write
    Iterator(a) ++ /* produce complex iterator here */
That's it! All your cases can be reduced to one of these three.
In your case, your example would look something like
    Source.fromFile(file).getLines().flatMap(x =>
      Iterator("something") ++
      ":".r.split(x).iterator.flatMap(field =>
        if (field contains "/") "/".r.split(field).iterator
        else {
          if (!field.startsWith("#")) {
            /* vals, whatever */
            if (some_calculation && field.startsWith("r")) Iterator("r", field.drop(1))
            else Iterator(field)
          }
          else Iterator.empty
        }
      )
    )
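For something you can paste and run, here is a minimal self-contained sketch of the same idea. It works on an in-memory iterator of lines instead of a file, and `someCalculation` is a hypothetical stand-in for the calculation elided in the question, so treat the names and that predicate as assumptions rather than part of the original code.

```scala
object FlatMapFields {
  // Hypothetical placeholder for the question's elided `some_calculation`.
  def someCalculation(field: String): Boolean = field.length > 1

  // Lazily expands each line into the values a generator would have yielded.
  def yieldValues(lines: Iterator[String]): Iterator[String] =
    lines.flatMap { line =>
      Iterator("something") ++ line.split(":").iterator.flatMap { field =>
        if (field.contains("/")) field.split("/").iterator
        else if (field.startsWith("#")) Iterator.empty[String] // plays the role of `continue`
        else if (someCalculation(field) && field.startsWith("r"))
          Iterator("r", field.drop(1))
        else Iterator(field)
      }
    }
}
```

Nothing is computed until the resulting iterator is consumed, so a large file is still processed one line at a time, much as it would be with a Python generator.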
P.S. Scala does have continue; it's done like so (implemented by throwing stackless (light-weight) exceptions):
    import scala.util.control.Breaks._

    for (blah) {
      breakable {
        ...
        break
        ...
      }
    }
but that won't get you what you want because Scala doesn't have the yield you want.
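To make the `breakable` trick concrete, here is a small runnable sketch that skips fields starting with "#". The object and method names are made up for illustration; only `breakable` and `break` come from `scala.util.control.Breaks`.

```scala
import scala.util.control.Breaks.{break, breakable}

object ContinueDemo {
  // Collects fields, skipping comment fields the way `continue` would.
  def keepNonComments(fields: Seq[String]): List[String] = {
    val kept = List.newBuilder[String]
    for (field <- fields) {
      breakable {
        // break() exits the enclosing breakable block, which sits inside
        // the loop body, so the loop moves on to the next element.
        if (field.startsWith("#")) break()
        kept += field
      }
    }
    kept.result()
  }
}
```

Placing `breakable` inside the loop gives `continue` semantics; wrapping the whole loop in it would give `break` semantics instead.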
'yield' sucks, continuations are better
Actually, Python's yield is a continuation.
What is a continuation? A continuation is saving the present point of execution with all its state, such that one can continue at that point later. That's precisely what Python's yield is, and, also, precisely how it is implemented.
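As a rough illustration of what "saving the point of execution" means, here is a hand-written iterator that keeps its resume point in an explicit variable. This is only a sketch with made-up names (`SplitR`, `step`); it shows the bookkeeping that a yield-style construct would do for you, not how Python or Scala implement it.

```scala
object ResumableDemo {
  // Hand-written equivalent of a generator that yields "r" and then the
  // rest of the field. `step` stores where to resume on the next call.
  final class SplitR(field: String) extends Iterator[String] {
    private var step = 0
    def hasNext: Boolean = step < 2
    def next(): String = {
      step += 1
      step match {
        case 1 => "r"           // first "yield"
        case 2 => field.drop(1) // second "yield"
        case _ => throw new NoSuchElementException("generator finished")
      }
    }
  }
}
```

With more yield points and local variables, this manual state tracking grows quickly, which is the convenience the question is asking for.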
It is my understanding that Python's continuations are not delimited, however. I don't know much about that -- I might be wrong, in fact. Nor do I know what the implications of that may be.
Scala's continuations do not work at run-time. In fact, there's a continuations library for Java that works by manipulating bytecode at run-time, and it is free of the constraints that Scala's continuations have.
Scala's continuations are done entirely at compile time, which requires quite a bit of work. It also requires that the code that will be "continued" be prepared by the compiler for that purpose.
And that's why for-comprehensions do not work. A statement like this:
    for { x <- xs } proc(x)
is translated into
    xs.foreach(x => proc(x))
where foreach is a method on xs's class. Unfortunately, the class of xs was compiled long ago, so it cannot be modified to support continuations. As a side note, that's also why Scala doesn't have continue.
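The translation can be checked directly. The sketch below, with illustrative names only, writes a loop both ways and also shows the flatMap/map form that a for/yield with two generators corresponds to.

```scala
object DesugarDemo {
  val xs = List(1, 2, 3)

  // A `for` without yield corresponds to a foreach call.
  def sumWithFor: Int = { var s = 0; for (x <- xs) s += x; s }
  def sumWithForeach: Int = { var s = 0; xs.foreach(x => s += x); s }

  // A `for ... yield` with two generators corresponds to flatMap plus map.
  val pairsFor = for (x <- xs; y <- List("a", "b")) yield (x, y)
  val pairsDesugared = xs.flatMap(x => List("a", "b").map(y => (x, y)))
}
```

In both forms the loop body ends up as a function passed to a method of the collection, which is why the body cannot suspend the loop from the inside.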
Aside from that, yes, this is a duplicate question, and, yes, you should find a different way to write your code.