I understand the usual "Task not serializable" issue that arises when a closure accesses a field or a method of the enclosing class, outside the closure's own scope.
To fix it, I usually make a local copy of these fields/methods, which avoids having to serialize the whole class:
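    // sc is an existing SparkContext, assumed to be in scope
    class MyClass(val myField: Any) {
      def run() = {
        val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")
        // Local copy: the closure captures only myField, not the whole MyClass instance
        val myField = this.myField
        println(f.map( _ + myField ).count)
      }
    }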
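The local-copy trick can be checked without a cluster by pushing each closure through plain Java serialization, which is the mechanism Spark uses to ship closures to executors. This is a minimal sketch: `isSerializable` and this Spark-free `MyClass` are hypothetical stand-ins, and the second check assumes a Scala version whose lambdas capture `this` only when their body actually uses it.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // True if plain Java serialization can write `obj` and everything it references.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(obj) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}

// Hypothetical, Spark-free stand-in for the question's MyClass; deliberately not Serializable.
class MyClass(val myField: String) {
  // Reads the field inside the closure (i.e. this.myField), so the closure captures `this`.
  def capturingThis: String => String = line => line + myField

  // Copies the field into a local first, so the closure captures only that String.
  def capturingLocalCopy: String => String = {
    val localField = myField
    line => line + localField
  }
}

object LocalCopyDemo {
  def main(args: Array[String]): Unit = {
    val obj = new MyClass("!")
    println(SerializationCheck.isSerializable(obj.capturingThis)) // false
    println(SerializationCheck.isSerializable(obj.capturingLocalCopy))
  }
}
```

The first closure fails because serializing it means serializing the captured `MyClass` instance; the second carries only the copied `String`.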
Now, if I define a nested function (a local def) inside the run method, the closure can no longer be serialized:
    class MyClass() {
      def run() = {
        val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")
        // Nested (local) method
        def mapFn(line: String) = line.split(";")
        println(f.map( mapFn( _ ) ).count)
      }
    }
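One way to probe the nested-def case without Spark is to write by hand what the compiler may generate. The sketch below assumes, for illustration only, that the local def ends up as a private method of the enclosing class (the exact lowering depends on the Scala version). Under that assumption the wrapper built by `mapFn(_)` has to call the method on `this`, so serializing it drags the non-serializable instance along. `MyClass` and `isSerializable` are hypothetical stand-ins, and Java serialization stands in for Spark's closure serializer.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // True if plain Java serialization can write `obj` and everything it references.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(obj) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}

// Hypothetical stand-in for the question's MyClass; deliberately not Serializable.
class MyClass {
  // Models the assumption that a local `def mapFn` becomes a method of the class.
  private def mapFn(line: String): Array[String] = line.split(";")

  // `mapFn(_)` expands to `line => this.mapFn(line)`, so the wrapper captures `this`.
  def wrapper: String => Array[String] = mapFn(_)
}

object NestedDefDemo {
  def main(args: Array[String]): Unit = {
    val fn = new MyClass().wrapper
    println(fn("a;b").toList)                     // List(a, b)
    println(SerializationCheck.isSerializable(fn)) // false
  }
}
```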
I don't understand this, since I thought "mapFn" would be in scope... Even stranger, if I define mapFn as a val instead of a def, it works:
    class MyClass() {
      def run() = {
        val f = sc.textFile("hdfs://xxx.xxx.xxx.xxx/file.csv")
        val mapFn = (line: String) => line.split(";")
        println(f.map( mapFn( _ ) ).count)
      }
    }
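For contrast, here is the val variant in the same Spark-free setting. A function value is an object of its own, so the wrapper `mapFn(_)` only needs to capture that local object. This is a sketch under the assumption of a Scala version whose lambdas do not capture `this` when their body never uses it; `MyClass`, `makeWrapper` and `isSerializable` are hypothetical names.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // True if plain Java serialization can write `obj` and everything it references.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(obj) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}

// Hypothetical stand-in for the question's MyClass; deliberately not Serializable.
class MyClass {
  def makeWrapper(): String => Array[String] = {
    // A function value: a standalone object whose body never touches `this`.
    val mapFn = (line: String) => line.split(";")
    // `mapFn(_)` expands to `line => mapFn(line)` and captures only the local `mapFn`.
    mapFn(_)
  }
}

object ValDemo {
  def main(args: Array[String]): Unit = {
    val fn = new MyClass().makeWrapper()
    println(fn("a;b").toList) // List(a, b)
    println(SerializationCheck.isSerializable(fn))
  }
}
```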
Is this related to the way Scala represents nested functions?
What's the recommended way to deal with this issue? Avoid nested functions?
Isn't it working like this: in the first case, f.map(mapFn(_)) is equivalent to f.map(new Function() { override def apply(...) = mapFn(...) }), while in the second one it is just f.map(mapFn)?
When you declare a method with def, it is probably just a method in some anonymous class with an implicit $outer reference to the enclosing class. But map requires a Function, so the compiler needs to wrap the method in one. In the wrapper you refer only to a method of that class, not to the instance explicitly, yet calling the method still needs that instance, so it gets pulled in implicitly. If you use val, you have a direct reference to the function object, which is what you pass to map. I'm not sure about this, just thinking out loud...
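The desugaring guessed at above can be written out by hand to see why the two cases behave differently. In this sketch `Enclosing` and `MethodWrapper` are hypothetical names: `MethodWrapper` plays the role of `new Function() { override def apply(...) = mapFn(...) }` and must keep the enclosing instance in a field to call the method, while in the val case the function object itself is what would be handed to map. It only models the hypothesis; it does not show what scalac actually emits, and the val check assumes a lambda that captures nothing.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // True if plain Java serialization can write `obj` and everything it references.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      try out.writeObject(obj) finally out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}

// Hypothetical enclosing class; deliberately not Serializable.
class Enclosing {
  def mapFn(line: String): Array[String] = line.split(";")
}

// Hand-written wrapper for the def case: it is Serializable itself,
// but its only field is the non-serializable enclosing instance.
final class MethodWrapper(outer: Enclosing)
    extends (String => Array[String]) with Serializable {
  def apply(line: String): Array[String] = outer.mapFn(line)
}

object WrapperDemo {
  def main(args: Array[String]): Unit = {
    // def case: serialization fails on the captured Enclosing instance.
    val viaMethod = new MethodWrapper(new Enclosing)
    // val case: the function object is passed as-is.
    val mapFn: String => Array[String] = line => line.split(";")

    println(SerializationCheck.isSerializable(viaMethod)) // false
    println(SerializationCheck.isSerializable(mapFn))
  }
}
```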