
Can I output a collection instead of a tuple in Scalding map method?

Tags: scala, scalding

If you want to create a pipe with more than 22 fields from a smaller one in Scalding, you are limited by Scala tuples, which cannot have more than 22 items.

Is there a way to use collections instead of tuples? I imagine something like in the following example, which sadly doesn't work:

input.read.mapTo('line -> aLotOfFields) { line: String =>
  (1 to 24).map(_.toString)
}.write(output)
Calin-Andrei Burloiu asked Oct 25 '13 14:10


2 Answers

Actually, you can. It's in the Scalding FAQ: https://github.com/twitter/scalding/wiki/Frequently-asked-questions#what-if-i-have-more-than-22-fields-in-my-data-set

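import cascading.tuple.Tuple

// One symbol per output field: 'field_1 ... 'field_24
val toFields = (1 to 24).map(f => Symbol("field_" + f)).toList

input
  .read
  .mapTo('line -> toFields) { line: String =>
    // Build a Cascading Tuple from varargs; the elements are passed as AnyRef values
    new Tuple((1 to 24).map(_.toString).map(_.asInstanceOf[AnyRef]): _*)
  }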

The final map(_.asInstanceOf[AnyRef]) looks ugly, so if you find a better solution, please let me know.
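One possible way to avoid the per-element cast, sketched in plain Scala: since Seq is covariant and String is a subtype of AnyRef, a type annotation on the collection is enough to widen it. The countValues function below is a hypothetical stand-in for a constructor that takes Object varargs (as cascading.tuple.Tuple does); this has not been checked against Scalding or Cascading itself, so treat it as an assumption rather than a confirmed replacement.

```scala
object VarargsSketch {
  // Hypothetical stand-in for a constructor taking Object varargs;
  // it only counts the arguments it receives.
  def countValues(values: AnyRef*): Int = values.length

  // Seq is covariant and String <: AnyRef, so the annotation alone
  // widens the element type; no asInstanceOf per element is needed.
  val values: Seq[AnyRef] = (1 to 24).map(_.toString)

  // Expand the sequence into varargs, as in the answer above.
  val count: Int = countValues(values: _*)
}
```

If this carries over, the body of mapTo would build the Seq[AnyRef] first and then pass it with `: _*`.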

Oleksii answered Oct 18 '22 17:10


Wrap your tuples in case classes. This also makes your code more readable than using tuples and more type safe than using collections.
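A minimal sketch of what this could look like, in plain Scala without any Scalding dependency. The record types, their field names, and the tab-separated input format are all hypothetical; the point is only that related fields are grouped into small case classes, so no single tuple needs more than 22 elements. In Scalding, such records would presumably be used with the typed API (TypedPipe), which is not shown here.

```scala
import scala.util.Try

// Hypothetical record types; names and grouping are illustrative only.
final case class UserInfo(id: Long, name: String, country: String)
final case class Stats(visits: Int, clicks: Int, purchases: Int)
final case class LogRecord(user: UserInfo, stats: Stats)

object LogRecord {
  // Parses one tab-separated line into a LogRecord.
  // Returns None if the field count is wrong or a number fails to parse.
  def parse(line: String): Option[LogRecord] =
    line.split('\t') match {
      case Array(id, name, country, visits, clicks, purchases) =>
        Try(
          LogRecord(
            UserInfo(id.toLong, name, country),
            Stats(visits.toInt, clicks.toInt, purchases.toInt)
          )
        ).toOption
      case _ => None
    }
}
```

Fields are then accessed by name (for example record.stats.clicks) rather than by position, which is where the readability gain comes from.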

samthebest answered Oct 18 '22 17:10