How to write Map/Reduce tasks in Golang?

Tags:

go

hadoop

I would like to write Hadoop Map/Reduce jobs in Go (and not via the Streaming API!).

I tried to get a grasp of hortonworks/gohadoop and colinmarc/hdfs, but I still don't see how to actually write jobs with them. I have searched GitHub for code importing these modules, but apparently there is nothing relevant.

Is there any WordCount.go somewhere?

asked Aug 05 '15 12:08 by frigo americain



People also ask

How do you write a MapReduce program?

A MapReduce program executes in three stages: the map stage, the shuffle stage, and the reduce stage. In the map stage, the mapper processes the input data, which is generally a file or directory stored in the Hadoop Distributed File System (HDFS).
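To make the three stages concrete, here is a minimal single-process sketch in Go that mimics them for a word count. It is not Hadoop code: the names KV, mapStage, shuffleStage, reduceStage and wordCount are hypothetical, and on a real cluster the shuffle is a distributed sort performed by the framework rather than an in-memory map.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// KV is one intermediate key-value pair emitted by the map stage.
type KV struct {
	Key   string
	Value int
}

// mapStage emits (word, 1) for every whitespace-separated word in a line.
func mapStage(line string) []KV {
	var out []KV
	for _, w := range strings.Fields(line) {
		out = append(out, KV{strings.ToLower(w), 1})
	}
	return out
}

// shuffleStage groups intermediate values by key, standing in for the
// framework's sort between the map and reduce stages.
func shuffleStage(pairs []KV) map[string][]int {
	groups := make(map[string][]int)
	for _, p := range pairs {
		groups[p.Key] = append(groups[p.Key], p.Value)
	}
	return groups
}

// reduceStage combines all values for one key into a single count.
func reduceStage(values []int) int {
	sum := 0
	for _, v := range values {
		sum += v
	}
	return sum
}

// wordCount runs map, shuffle and reduce over the input lines in order.
func wordCount(lines []string) map[string]int {
	var intermediate []KV
	for _, line := range lines {
		intermediate = append(intermediate, mapStage(line)...)
	}
	result := make(map[string]int)
	for key, values := range shuffleStage(intermediate) {
		result[key] = reduceStage(values)
	}
	return result
}

func main() {
	counts := wordCount([]string{"the quick fox", "The lazy dog"})
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	for _, k := range keys {
		fmt.Println(k, counts[k])
	}
}
```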

What are the map and reduce tasks?

Every job consists of two key components: map tasks and reduce tasks. Each map task processes one split of the input and emits intermediate key-value pairs. Each reduce task fetches (shuffles) its share of that intermediate data and reduces it into a smaller set of results. In classic Hadoop (MRv1), the JobTracker acts as the master that schedules these tasks.

What is the role of synchronization in MapReduce?

In MapReduce, synchronization is accomplished by a barrier between the map and reduce phases of processing. Intermediate key-value pairs must be grouped by key, which is accomplished by a large distributed sort involving all the nodes that executed map tasks and all the nodes that will execute reduce tasks.
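The barrier can be illustrated inside a single Go process with sync.WaitGroup. This is a minimal sketch, assuming one goroutine per input split and a mutex-protected map standing in for the distributed sort; the function name parallelWordCount is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// parallelWordCount runs one map goroutine per input split, waits at a
// barrier until every mapper has finished, and only then reduces.
func parallelWordCount(splits []string) map[string]int {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		groups = make(map[string][]int) // intermediate values grouped by key
	)

	// Map phase: each goroutine processes one split concurrently.
	for _, split := range splits {
		wg.Add(1)
		go func(text string) {
			defer wg.Done()
			for _, w := range strings.Fields(text) {
				mu.Lock()
				groups[w] = append(groups[w], 1)
				mu.Unlock()
			}
		}(split)
	}

	// Barrier: no reducer may start until all mappers are done, because
	// any mapper could still emit a value for any key.
	wg.Wait()

	// Reduce phase: one sum per key.
	result := make(map[string]int, len(groups))
	for key, values := range groups {
		sum := 0
		for _, v := range values {
			sum += v
		}
		result[key] = sum
	}
	return result
}

func main() {
	counts := parallelWordCount([]string{"a b a", "b c", "a"})
	fmt.Println(counts["a"], counts["b"], counts["c"]) // 3 2 1
}
```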

How is reduce task performed in MapReduce?

The reduce job takes the output from a map as input and combines those data tuples into a smaller set of tuples. As the sequence of the name MapReduce implies, the reduce job is always performed after the map job.

How to work with maps in Golang?

Working with maps in Go: we can insert, retrieve, and delete keys in a map. 1. Inserting elements into a map: you can insert keys in two ways, either when initializing the map or afterwards with index syntax.
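The three operations look like this in Go. This is a minimal sketch; the mapBasics function and the sample keys are illustrative only.

```go
package main

import "fmt"

// mapBasics inserts, retrieves and deletes keys, returning the final map.
func mapBasics() map[string]int {
	// Insert keys while initializing, using a map literal.
	ages := map[string]int{"alice": 31}

	// Insert (or overwrite) a key using index syntax.
	ages["bob"] = 27

	// Retrieve a value; the "comma ok" form reports whether the key exists.
	if age, ok := ages["bob"]; ok {
		fmt.Println("bob is", age) // bob is 27
	}

	// A missing key yields the zero value of the value type.
	fmt.Println(ages["carol"]) // 0

	// Delete a key; deleting a key that is not present is a no-op.
	delete(ages, "alice")
	delete(ages, "nobody")
	return ages
}

func main() {
	fmt.Println(mapBasics()) // map[bob:27]
}
```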

What is the best way to implement Map Reduce in go?

Glow aims to be a simple and scalable MapReduce system written in pure Go. Not only is the system setup simple and scalable, but so is writing and running the MapReduce code. Glow also provides Map()/Filter()/Reduce() functions, which work well in standalone mode, and it is perfectly fine to run only in standalone mode.
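For comparison, the Map()/Filter()/Reduce() style can be sketched in plain Go with generics (Go 1.18 or later). These are in-process helpers written for illustration only, not Glow's actual API; check Glow's own documentation for its real function signatures.

```go
package main

import (
	"fmt"
	"strings"
)

// Map applies f to every element of in.
func Map[T, U any](in []T, f func(T) U) []U {
	out := make([]U, 0, len(in))
	for _, v := range in {
		out = append(out, f(v))
	}
	return out
}

// Filter keeps only the elements for which keep returns true.
func Filter[T any](in []T, keep func(T) bool) []T {
	var out []T
	for _, v := range in {
		if keep(v) {
			out = append(out, v)
		}
	}
	return out
}

// Reduce folds in into a single value, starting from init.
func Reduce[T, A any](in []T, init A, f func(A, T) A) A {
	acc := init
	for _, v := range in {
		acc = f(acc, v)
	}
	return acc
}

func main() {
	lines := []string{"to be", "", "or not to be"}
	nonEmpty := Filter(lines, func(s string) bool { return s != "" })
	wordsPerLine := Map(nonEmpty, func(s string) int { return len(strings.Fields(s)) })
	total := Reduce(wordsPerLine, 0, func(acc, n int) int { return acc + n })
	fmt.Println(total) // 6
}
```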

How to create and initialize maps in Go language?

In Go, maps can be created and initialized in two different ways. Creating a map: you can simply declare a map with the syntax var name map[KeyType]ValueType. The zero value of a map is nil, and a nil map contains no keys. Reading from a nil map is allowed, but if you try to add a key-value pair to a nil map, the program panics at run time (the compiler does not catch this).
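The nil-map behavior can be demonstrated with a small sketch that recovers from the panic instead of crashing; the function name writeToNilMap is hypothetical.

```go
package main

import "fmt"

// writeToNilMap inserts into a nil map and returns the resulting
// run-time panic as an error instead of letting the program crash.
func writeToNilMap() (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	var m map[string]int                        // zero value: nil, contains no keys
	fmt.Println(m == nil, len(m), m["missing"]) // true 0 0: reads are safe
	m["key"] = 1                                // writes panic at run time
	return nil
}

func main() {
	fmt.Println(writeToNilMap()) // recovered: assignment to entry in nil map
}
```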

How to declare a map in go?

Now we will see how to declare a map in Go.

package main

import (
  "fmt"
)

func main() {
  var names map[int]string  // names has int keys and string values
  fmt.Println(names == nil) // true: a declared but uninitialized map is nil
}

In the above example, the key is of type int, while the values are of type string.

Initializing a map: let's see how we can initialize a map with values. 1. Using the make() function
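Initializing with make() turns the nil names map from the declaration example into a usable one. A minimal sketch; the function name initNames and the sample values are illustrative.

```go
package main

import "fmt"

// initNames initializes a map[int]string with make() and fills it.
func initNames() map[int]string {
	// make() allocates an empty, writable map; the size argument is an
	// optional capacity hint, not a limit.
	names := make(map[int]string, 3)
	names[1] = "Ann"
	names[2] = "Ben"
	names[3] = "Cal"
	return names
}

func main() {
	names := initNames()
	fmt.Println(names != nil, len(names)) // true 3
	fmt.Println(names)                    // map[1:Ann 2:Ben 3:Cal]
}
```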


1 Answer

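The GitHub project https://github.com/vistarmedia/gossamr is a good starting point for running a Go job on Hadoop. It still runs through Hadoop Streaming (with typedbytes I/O, as the command below shows), but it lets you write the mapper and reducer as ordinary typed Go methods: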

Gist:

package main

import (
  "log"
  "strings"

  "github.com/vistarmedia/gossamr"
)

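// WordCount holds no state; gossamr calls its Map and Reduce methods.
type WordCount struct{}

// Map is called for each input line and emits (word, 1) for every word.
func (wc *WordCount) Map(p int64, line string, c gossamr.Collector) error {
  for _, word := range strings.Fields(line) {
    c.Collect(strings.ToLower(word), int64(1))
  }
  return nil
}

// Reduce receives all counts collected for one word and emits (sum, word).
func (wc *WordCount) Reduce(word string, counts chan int64, c gossamr.Collector) error {
  var sum int64
  for v := range counts {
    sum += v
  }
  c.Collect(sum, word)
  return nil
}

func main() {
  wordcount := gossamr.NewTask(&WordCount{})

  // Run the task; the -task and -phase arguments in the Hadoop command
  // select which task and phase execute.
  err := gossamr.Run(wordcount)
  if err != nil {
    log.Fatal(err)
  }
}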

Kicking off the script:

./bin/hadoop jar ./contrib/streaming/hadoop-streaming-1.2.1.jar \
  -input /mytext.txt \
  -output /output.15 \
  -mapper "gossamr -task 0 -phase map" \
  -reducer "gossamr -task 0 -phase reduce" \
  -io typedbytes \
  -file ./wordcount \
  -numReduceTasks 6
answered Oct 16 '22 04:10 by Simon Kesteloot
