I am new to Scala and have seen many ways to define a function, but I could not find a clear explanation of the differences or of when to use which form.
What are the main differences between the following function definitions?
With '='
def func1(node: scala.xml.Node) = {
  print(node.label + " = " + node.text + ",")
}
Without '='
def func2 (node: scala.xml.Node) {
  print(node.label + " = " + node.text + ",")
}
With '=>'
def func3 = (node: scala.xml.Node) => {
  print(node.label + " = " + node.text + ",")
}
As a var
var func4 = (node: scala.xml.Node) => {
  print(node.label + " = " + node.text + ",")
}
Without a block
def func5 (node: scala.xml.Node) = print(node.label + " = " + node.text + ",")
They all seem to compile and produce the same result when used as a callback for xmlNodes.iterator.foreach(...).
Each of these questions has been answered elsewhere on this site, but I don't think anything handles them all together. So:
Methods defined with an equals sign return a value (whatever the last expression evaluates to). Methods defined with only braces return Unit. If you use an equals sign but the last expression evaluates to Unit, there is no difference. If there is a single statement after the equals sign, braces are not required; this makes no difference to the bytecode. So 1., 2., and 5. are all essentially identical:
def f1(s: String) = { println(s) } // println returns `Unit`
def f2(s: String) { println(s) } // `Unit` return again
def f5(s: String) = println(s) // Don't need braces; there's only one statement
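To make the "returns whatever the last expression evaluates to" point concrete, here is a small sketch that uses String instead of scala.xml.Node so it needs no extra dependency; the method names are made up for illustration. The braces-only form is spelled out as an explicit ": Unit =" here, because procedure syntax is deprecated in Scala 2.13 and no longer accepted by Scala 3.

```scala
// With '=' the result type is inferred from the last expression.
def withEquals(s: String) = s.length         // inferred result type: Int

// An explicit ': Unit =' is what the braces-only form means:
// the value of the last expression is discarded and () is returned.
def explicitUnit(s: String): Unit = s.length // compiles; the Int is thrown away

// When the last expression is already Unit (println), both forms agree.
def printIt(s: String) = println(s)          // inferred result type: Unit

val n: Int = withEquals("hello")
val u: Unit = explicitUnit("hello")
println(n) // prints 5
```

Only the first method gives you the length back; the second computes it and throws it away, which is exactly what happens when a non-Unit method is accidentally written without "=".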
A function, whose type is often written A => B, is an instance of one of the Function traits, e.g. Function1[A,B]. Because this trait has an apply method, which Scala calls automatically when you just use parentheses without a method name, it looks like a method call, and it is, except that it's a call on that Function object! So if you write
def f3 = (s: String) => println(s)
then what you are saying is "f3 should create an instance of Function1[String,Unit] which has an apply method that looks like def apply(s: String) = println(s)". So if you say f3("Hi"), this first calls f3 to create the function object, and then calls the apply method.
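A sketch of that desugaring, assuming a variant of f3 that returns the string's length instead of printing it (so the result is easy to check). The hand-written f3Explicit is only an illustration of what the lambda roughly stands for, not the exact code the compiler generates.

```scala
// f3 is a method with no parameter list that returns a Function1[String, Int].
def f3 = (s: String) => s.length

// Roughly what the lambda stands for: an object whose apply method does the work.
def f3Explicit: Function1[String, Int] = new Function1[String, Int] {
  def apply(s: String): Int = s.length
}

println(f3("Hi"))         // calls f3, then apply("Hi") on the returned object
println(f3.apply("Hi"))   // the same call, written out
println(f3Explicit("Hi")) // same result via the hand-written Function1
```

All three lines print 2; the first two are literally the same call, one with the apply spelled out.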
It's rather wasteful to create the function object every single time you want to use it, so it makes more sense to store the function object in a val:
val f4 = (s: String) => println(s)
This holds one instance of the same function object that the def (method) would return, so you don't have to recreate it each time.
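One way to see the def/val difference is to count how often the defining expression is evaluated. The counter below is hypothetical and exists only for this illustration. Note that it counts evaluations, not allocations: since Scala 2.12 lambdas are compiled via the JVM's LambdaMetafactory, which may hand back a single shared instance for a lambda that captures nothing, so the "new object every time" cost can be smaller in practice than the count suggests.

```scala
var created = 0 // illustration-only counter

// def: the body runs, and builds a function value, on every access
def viaDef: String => Int = { created += 1; (s: String) => s.length }
viaDef("a"); viaDef("bb"); viaDef("ccc")
println(created) // 3

created = 0
// val: the body runs once, when the val is initialised
val viaVal: String => Int = { created += 1; (s: String) => s.length }
viaVal("a"); viaVal("bb"); viaVal("ccc")
println(created) // 1
```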
People differ on the convention of : Unit = ... versus { }. Personally, I write all methods that return Unit without an equals sign; this is an indication to me that the method is almost surely useless unless it has some sort of side effect (mutates a variable, performs IO, etc.). Also, I generally only use braces when required, either because there are multiple statements or because the single statement is so complex that I want a visual aid to tell me where it ends.
Methods should be used whenever you want, well, a method. Function objects should be created any time you want to pass them into some other method to use them (or should be specified as parameters any time you want to be able to apply a function). For example, suppose you want to be able to scale a value:
class Scalable(d: Double) {
  def scale(/* What goes here? */) = ...
}
You could supply a constant multiplier. Or you could supply something to add and something to multiply. But most flexibly, you'd just ask for an arbitrary function from Double to Double:
def scale(f: Double => Double) = f(d)
Now, maybe you have an idea of a default scale. That's probably no scaling at all. So you might want a function that takes a Double and returns the very same Double.
val unscaled = (d: Double) => d
We store the function in a val because we don't want to keep creating it over and over again. Now we can use this function as a default argument:
class Scalable(d: Double) {
  val unscaled = (d: Double) => d
  def scale(f: Double => Double = unscaled) = f(d)
}
Now we can call x.scale() (the empty parentheses are needed to use the default argument), x.scale(_*2) and x.scale(math.sqrt), and they'll all work.
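A runnable version of the Scalable example with the three calls, assuming a starting value of 16.0 chosen only so the results are exact. Relying on the default argument needs an empty argument list: a bare x.scale is rejected by both Scala 2 and Scala 3 because the method still has a parameter list.

```scala
class Scalable(d: Double) {
  val unscaled = (d: Double) => d // identity: the default "no scaling"
  def scale(f: Double => Double = unscaled) = f(d)
}

val x = new Scalable(16.0)
println(x.scale())          // 16.0, uses the default argument
println(x.scale(_ * 2))     // 32.0
println(x.scale(math.sqrt)) // 4.0, the method math.sqrt is lifted to a function
```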
Yes, there are differences in bytecode. And yes, there are guidelines.
With =: This declares a method which accepts a parameter and returns the last expression in the right-hand side block, which has type Unit here.
Without =: This declares a method which does not have a return value; that is, the return type is always Unit, irrespective of the type of the last expression in the right-hand side block.
With =>: This declares a method which returns a function object of type scala.xml.Node => Unit. Every time you invoke this method func3, you will construct a new function object on the heap. If you write func3(node), you will first invoke func3, which returns the function object, and then invoke apply(node) on that function object. This is slower than just calling a plain method directly, as in cases 1. and 2.
As a var: This declares a variable and creates a function object as in 3., but the function object is created only once. Calling through the function object is in most cases slower than a plain method call (it may not be inlined by the JIT), but at least you do not recreate the object. If you want to avoid the danger of someone reassigning the variable func4, use a val or a lazy val instead.
Without a block: This is syntactic sugar for 1. when the block contains only a single expression.
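A sketch of the var / val / lazy val options for form 4., using String => Int instead of scala.xml.Node => Unit so the results can be checked; the names and the initialised flag are illustrative only.

```scala
// var: can be reassigned later, e.g. by an accidental assignment elsewhere
var func4 = (s: String) => s.length
func4 = (s: String) => -1

// val: reassignment is rejected; 'safe = (s: String) => 0' would not compile
val safe = (s: String) => s.length

// lazy val: the function object is created on first use, then reused
var initialised = false
lazy val lazyFunc = { initialised = true; (s: String) => s.length }
println(initialised)     // false
println(lazyFunc("abc")) // 3
println(initialised)     // true
```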
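Note that if you use the forms 1., 2. and 5. with the higher-order foreach method, Scala will still create a function object which calls func1, func2 or func5 implicitly, and pass that to foreach (it will not use a method handle or something like that, at least not in the versions current when this was written; since Scala 2.12 the function object is created through the JVM's LambdaMetafactory, but foreach still receives a function object).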
In these cases, the generated code will roughly correspond to:
xmlNodes.iterator.foreach((node: scala.xml.Node) => funcX(node))
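The same lifting can be seen with a plain List of strings standing in for the XML nodes; record and seen are hypothetical names used only for this sketch. Passing the method directly and passing the hand-written lambda have the same effect.

```scala
import scala.collection.mutable.ListBuffer

val seen = ListBuffer.empty[String]
def record(s: String): Unit = { seen += s } // an ordinary method, form 1.

val labels = List("a", "b", "c")
labels.foreach(record)                   // the method is lifted to a function object
labels.foreach((s: String) => record(s)) // roughly the function the compiler builds
println(seen.toList) // List(a, b, c, a, b, c)
```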
So, the guideline is: unless you are using the same function object every time, just create an ordinary method as in 1., 2. or 5. It will be lifted to a function object anyway where this is needed.
If you realize that this generates a lot of objects because such a method is called often, you might want to micro-optimize by using form 4. instead, to ensure that the function object for foreach gets created only once.
Where deciding between 1., 2. and 5. is concerned, one guideline is: if you have a single statement, use form 5.
Otherwise, if the return type is Unit, use the def foo(): Unit = { form if this is public API, so that clients looking at your code quickly and clearly see the return type.
Use the def foo() { form for private methods with return type Unit, for your own convenience of shorter code. But this is just one particular style guideline.
For more, see: http://docs.scala-lang.org/style/declarations.html#methods
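A sketch of these style rules on a hypothetical Log class (the class and its methods are made up for illustration). The private helper is written with an explicit ": Unit =" rather than the shorter def foo() { form, because procedure syntax is deprecated in Scala 2.13 and not supported in Scala 3.

```scala
import scala.collection.mutable.ListBuffer

class Log {
  private val entries = ListBuffer.empty[String]

  // public API returning Unit: the explicit ': Unit =' shows the type at a glance
  def add(msg: String): Unit = {
    entries += msg.trim
  }

  // a single expression: form 5., no braces needed
  def size: Int = entries.size

  // private helper; the shorter procedure form would be
  // 'private def reset() { entries.clear() }' in older Scala 2
  private def reset(): Unit = entries.clear()

  def replaceAll(msgs: Seq[String]): Unit = { reset(); msgs.foreach(add) }
}
```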
Well, 1, 2, and 5 aren't functions at all, they are methods, which are fundamentally different from functions: methods belong to objects and are not themselves objects, whereas functions are objects.
1, 2, and 5 are also exactly the same: if you have only one statement, then you don't need curly braces to group several statements, ergo 5 is the same as 1. Leaving off the = sign is syntactic sugar for declaring a return type of Unit, but Unit is also the inferred return type for 1 and 5, so 2 is the same as 1 and 5.
3 is a method which, when called, returns a function. 4 is a variable which points to a function.
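A short sketch of the method/function distinction; m, fn and doubled are illustrative names. A method becomes a function object through eta-expansion when a function type is expected, and the resulting object has methods of its own, such as andThen.

```scala
def m(s: String): Int = s.length // a method: belongs to the enclosing object
val fn: String => Int = m        // eta-expansion creates a function object

// fn is an ordinary object, so it has methods of its own
val doubled = fn.andThen(_ * 2)
println(fn.isInstanceOf[Function1[_, _]]) // true
println(doubled("abc"))                   // 6
```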
1-2. When you drop the equals sign, your function becomes a procedure (it returns Unit, i.e. nothing useful).
3. In the third case you defined a method that returns a function of type scala.xml.Node => Unit.
4. Same, but you've assigned a function of type scala.xml.Node => Unit to a variable. The difference is explained in Differences between these three ways of defining a function in Scala
5. No difference compared with 1. But you can't write several statements like that without braces.