
What is the preferred way to implement 'yield' in Scala?


I am writing code for my PhD research and am starting to use Scala. I often have to do text processing. I am used to Python, whose 'yield' statement is extremely useful for implementing complex iterators over large, often irregularly structured text files. Similar constructs exist in other languages (e.g. C#), for good reason.

Yes I know there have been previous threads on this. But they look like hacked-up (or at least badly explained) solutions that don't clearly work well and often have unclear limitations. I would like to write code something like this:

import generator._

def yield_values(file:String) = {
  generate {
    for (x <- Source.fromFile(file).getLines()) {
      // Scala is already using the 'yield' keyword.
      give("something")
      for (field <- ":".r.split(x)) {
        if (field contains "/") {
          for (subfield <- "/".r.split(field)) { give(subfield) }
        } else {
          // Scala has no 'continue'.  IMO that should be considered
          // a bug in Scala.
          // Preferred: if (field.startsWith("#")) continue
          // Actual: Need to indent all following code
          if (!field.startsWith("#")) {
            val some_calculation = { ... do some more stuff here ... }
            if (some_calculation && field.startsWith("r")) {
              give("r")
              give(field.slice(1))
            } else {
              // Typically there will be a good deal more code here to handle different cases
              give(field)
            }
          }
        }
      }
    }
  }
}

I'd like to see the code that implements generate() and give(). BTW give() should be named yield() but Scala has taken that keyword already.

I gather that, for reasons I don't understand, Scala continuations may not work inside a for statement. If so, generate() should supply an equivalent function that works as close as possible to a for statement, because iterator code with yield almost inevitably sits inside a for loop.

Please, I would prefer not to get any of the following answers:

  1. 'yield' sucks, continuations are better. (Yes, in general you can do more with continuations. But they are hella hard to understand, and 99% of the time an iterator is all you want or need. If Scala provides lots of powerful tools but they're too hard to use in practice, the language won't succeed.)
  2. This is a duplicate. (Please see my comments above.)
  3. You should rewrite your code using streams, continuations, recursion, etc. etc. (Please see #1. I will also add, technically you don't need for loops either. For that matter, technically you can do absolutely everything you ever need using SKI combinators.)
  4. Your function is too long. Break it up into smaller pieces and you won't need 'yield'. You'd have to do this in production code, anyway. (First, "you won't need 'yield'" is doubtful in any case. Second, this isn't production code. Third, for text processing like this, very often, breaking the function into smaller pieces -- especially when the language forces you to do this because it lacks the useful constructs -- only makes the code harder to understand.)
  5. Rewrite your code with a function passed in. (Technically, yes you can do this. But the result is no longer an iterator, and chaining iterators is much nicer than chaining functions. In general, a language should not force me to write in an unnatural style -- certainly, the Scala creators believe this in general, since they provide shitloads of syntactic sugar.)
  6. Rewrite your code in this, that, or the other way, or some other cool, awesome way I just thought of.
Urban Vagabond asked Sep 05 '11 01:09




2 Answers

The premise of your question seems to be that you want exactly Python's yield, and you don't want any other reasonable suggestions to do the same thing in a different way in Scala. If this is true, and it is that important to you, why not use Python? It's quite a nice language. Unless your Ph.D. is in computer science and using Scala is an important part of your dissertation, if you're already familiar with Python and really like some of its features and design choices, why not use it instead?

Anyway, if you actually want to learn how to solve your problem in Scala, it turns out that for the code you have, delimited continuations are overkill. All you need are flatMapped iterators.

Here's how you do it.

// You want to write
for (x <- xs) { /* complex yield in here */ }
// Instead you write
xs.iterator.flatMap { /* Produce iterators in here */ }

// You want to write
yield(a)
yield(b)
// Instead you write
Iterator(a,b)

// You want to write
yield(a)
/* complex set of yields in here */
// Instead you write
Iterator(a) ++ /* produce complex iterator here */

That's it! All your cases can be reduced to one of these three.

In your case, your example would look something like

Source.fromFile(file).getLines().flatMap(x =>
  Iterator("something") ++
  ":".r.split(x).iterator.flatMap(field =>
    if (field contains "/") "/".r.split(field).iterator
    else {
      if (!field.startsWith("#")) {
        /* vals, whatever */
        if (some_calculation && field.startsWith("r")) Iterator("r",field.drop(1))
        else Iterator(field)
      }
      else Iterator.empty
    }
  )
)
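For readers who want to try the flatMap approach without a file on disk, here is a minimal runnable sketch. It is an illustration, not the answerer's code: the object name `FlatMapYield`, the method `tokens`, and the stand-in `someCalculation` (the original leaves that computation elided) are all assumptions made for the example, and the lines come from an in-memory iterator rather than `Source.fromFile`.

```scala
object FlatMapYield {
  // Hypothetical stand-in for the calculation the answer leaves elided.
  def someCalculation(field: String): Boolean = field.length > 1

  // Lazily maps each line to the tokens a Python-style generator would yield.
  // Nothing is computed until the returned iterator is consumed.
  def tokens(lines: Iterator[String]): Iterator[String] =
    lines.flatMap { line =>
      Iterator("something") ++
        line.split(":").iterator.flatMap { field =>
          if (field.contains("/")) field.split("/").iterator
          else if (field.startsWith("#")) Iterator.empty // the 'continue' case
          else if (someCalculation(field) && field.startsWith("r"))
            Iterator("r", field.drop(1))
          else Iterator(field)
        }
    }
}
```

With these assumptions, `FlatMapYield.tokens(Iterator("a:b/c:#skip:rx:r")).toList` gives `List("something", "a", "b", "c", "r", "x", "r")`. Because the result is an ordinary `Iterator`, it can be chained with `map`, `filter`, and further `flatMap` calls like any other.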

P.S. Scala does have continue; it's done like so (implemented by throwing stackless (light-weight) exceptions):

import scala.util.control.Breaks._
for (blah) {
  breakable {
    ... break ...
  }
}

but that won't get you what you want because Scala doesn't have the yield you want.
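As a concrete illustration of the `breakable` pattern, the sketch below emulates `continue` by placing `breakable` inside the loop body, so `break()` only abandons the current iteration. The object name `ContinueSketch` and the method `keepNonComments` are made up for this example.

```scala
import scala.util.control.Breaks.{break, breakable}

object ContinueSketch {
  // Collects the fields that do not start with '#', skipping the others
  // the way a 'continue' statement would.
  def keepNonComments(fields: Seq[String]): List[String] = {
    val kept = List.newBuilder[String]
    for (field <- fields) {
      breakable {
        if (field.startsWith("#")) break() // leaves only this iteration
        kept += field
      }
    }
    kept.result()
  }
}
```

Here `ContinueSketch.keepNonComments(Seq("a", "#b", "c"))` returns `List("a", "c")`. Wrapping the whole loop in `breakable` instead would turn the same call into an ordinary `break`.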

Rex Kerr answered Nov 26 '22 09:11



> 'yield' sucks, continuations are better

Actually, Python's yield is a continuation.

What is a continuation? A continuation saves the present point of execution with all its state, so that one can continue from that point later. That's precisely what Python's yield does, and also precisely how it is implemented.

It is my understanding that Python's continuations are not delimited, however. I don't know much about that -- I might be wrong, in fact. Nor do I know what the implications of that may be.

Scala's continuations do not work at run-time. In fact, there's a continuations library for Java that works by manipulating bytecode at run-time, and it is free of the constraints that Scala's continuations have.

Scala's continuations are done entirely at compile time, which requires quite a bit of work. It also requires that the code that will be "continued" be prepared by the compiler for that purpose.

And that's why for-comprehensions do not work. A statement like this:

for { x <- xs } proc(x) 

is translated into

xs.foreach(x => proc(x)) 

where foreach is a method on xs's class. Unfortunately, xs's class was compiled long ago, so it cannot be modified to support the continuation. As a side note, that's also why Scala doesn't have continue.
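The translation described above can be observed directly. In this sketch, `Box` is a hypothetical container whose only loop-related method is `foreach`, and a `for` loop over it still compiles because the compiler rewrites the loop into a `foreach` call. The names `ForDesugar`, `Box`, `foreachCalls`, and `sumWithFor` are all invented for the illustration.

```scala
object ForDesugar {
  // Hypothetical container that defines nothing but foreach.
  final class Box(items: List[Int]) {
    var foreachCalls = 0 // counts how often foreach is invoked
    def foreach[U](f: Int => U): Unit = {
      foreachCalls += 1
      items.foreach(f)
    }
  }

  def sumWithFor(box: Box): Int = {
    var total = 0
    // The compiler rewrites this loop to box.foreach(x => total += x).
    for (x <- box) total += x
    total
  }
}
```

After `ForDesugar.sumWithFor(box)` runs on a fresh `Box`, `box.foreachCalls` is 1: the loop body ran as a closure inside `Box.foreach`, a method that is already compiled and that a compile-time transformation of the calling code cannot rewrite.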

Aside from that, yes, this is a duplicate question, and, yes, you should find a different way to write your code.

Daniel C. Sobral answered Nov 26 '22 07:11
