If you want to create a pipe with more than 22 fields from a smaller one in Scalding you are limited by Scala tuples, which cannot have more than 22 items.
Is there a way to use collections instead of tuples? I imagine something like in the following example, which sadly doesn't work:
input.read.mapTo('line -> aLotOfFields) { line: String =>
(1 to 24).map(_.toString)
}.write(output)
actually you can. It's in FAQ - https://github.com/twitter/scalding/wiki/Frequently-asked-questions#what-if-i-have-more-than-22-fields-in-my-data-set
val toFields = (1 to 24).map(f => Symbol("field_" + f)).toList
input
.read
.mapTo('line -> toFields) { line: String =>
new Tuple((1 to 24).map(_.toString).map(_.asInstanceOf[AnyRef]): _*)
}
the last map(_.asInstanceOf[AnyRef]) looks ugly so if you find better solution let me know please.
Wrap your tuples into case classes. It will also make your code more readable and type safe than using tuples and collections respectively.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With