Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Write to multiple outputs by key Scalding Hadoop, one MapReduce Job

How can you write to multiple outputs dependent on the key using Scalding(/cascading) in a single Map Reduce Job. I could of course use .filter for all the possible keys, but that is a horrible hack, which will fire up many jobs.

like image 440
samthebest Avatar asked Jun 02 '14 12:06

samthebest


1 Answers

There is TemplatedTsv in Scalding (from version 0.9.0rc16 and up), exactly same as Cascading TemplateTsv.

Tsv(args("input"), ('COUNTRY, 'GDP))
.read
.write(TemplatedTsv(args("output"), "%s", 'COUNTRY))
// it will create a directory for each country under "output" path in Hadoop mode.
like image 169
morazow Avatar answered Sep 22 '22 12:09

morazow