I would like to write Hadoop Map/Reduce jobs in Go (and not with the Streaming API!).
I tried to get a grasp of hortonworks/gohadoop and colinmarc/hdfs, but I still don't see how to write real jobs with them. I searched GitHub for code importing these modules, but apparently there is nothing relevant.
Is there a WordCount.go somewhere?
A MapReduce program executes in three stages: the map stage, the shuffle stage, and the reduce stage. Map stage: the mapper's job is to process the input data. Generally the input data is a file or directory stored in the Hadoop file system (HDFS).
Every job consists of two key components: mapping task and reducing task. The map task plays the role of splitting jobs into job-parts and mapping intermediate data. The reduce task plays the role of shuffling and reducing intermediate data into smaller units. The job tracker acts as a master.
In MapReduce, synchronization is accomplished by a barrier between the map and reduce phases of processing. Intermediate key-value pairs must be grouped by key, which is accomplished by a large distributed sort involving all the nodes that executed map tasks and all the nodes that will execute reduce tasks.
The reduce job takes the output from a map as input and combines those data tuples into a smaller set of tuples. As the sequence of the name MapReduce implies, the reduce job is always performed after the map job.
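The three stages described above can be sketched in memory with plain Go. This is only an illustration of the data flow for a word count, not Hadoop code: the names pair, mapStage, shuffleStage and reduceStage are made up for the example, and a real cluster performs the shuffle as a distributed sort rather than with a local map.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// pair is one intermediate key-value record emitted by the map stage.
type pair struct {
	key   string
	value int
}

// mapStage emits (word, 1) for every word of every input line.
func mapStage(lines []string) []pair {
	var out []pair
	for _, line := range lines {
		for _, w := range strings.Fields(line) {
			out = append(out, pair{strings.ToLower(w), 1})
		}
	}
	return out
}

// shuffleStage groups intermediate values by key. On a cluster this is the
// barrier between map and reduce; here it is a local grouping.
func shuffleStage(pairs []pair) map[string][]int {
	groups := make(map[string][]int)
	for _, p := range pairs {
		groups[p.key] = append(groups[p.key], p.value)
	}
	return groups
}

// reduceStage sums the grouped values for each key.
func reduceStage(groups map[string][]int) map[string]int {
	counts := make(map[string]int, len(groups))
	for k, vs := range groups {
		for _, v := range vs {
			counts[k] += v
		}
	}
	return counts
}

func main() {
	lines := []string{"the quick brown fox", "The lazy dog"}
	counts := reduceStage(shuffleStage(mapStage(lines)))

	// Sort the keys so the printed order is deterministic.
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s\t%d\n", k, counts[k])
	}
}
```

Every reduce call sees all values for one key, which is why the grouping step has to finish before any reducing starts.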
Working with maps in Go: we can insert, delete, and retrieve keys in a map. Let's see how to do that. 1. Inserting elements in a map: you can insert keys in two ways, either when initializing the map or afterwards with index syntax.
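A minimal sketch of those operations, using a hypothetical ages map chosen only for illustration:

```go
package main

import "fmt"

// buildAges shows both ways of inserting keys, followed by a delete.
func buildAges() map[string]int {
	ages := map[string]int{"alice": 30} // insert while initializing
	ages["bob"] = 25                    // insert with index syntax
	ages["carol"] = 41
	delete(ages, "carol") // remove a key; deleting a missing key is a no-op
	return ages
}

func main() {
	ages := buildAges()

	// The two-value form reports whether the key is present.
	age, ok := ages["bob"]
	fmt.Println(age, ok) // prints "25 true"

	_, ok = ages["carol"]
	fmt.Println(ok) // prints "false"
}
```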
Glow aims to be a simple and scalable map reduce system, written in pure Go. Not only is the system setup simple and scalable, but so is writing and running the map reduce code. Glow also provides Map()/Filter()/Reduce() functions, which work well in standalone mode. It's totally fine to run only in standalone mode.
In Go, maps can be created and initialized in two different ways. Creating a map: you can simply create a map using the given syntax. The zero value of a map is nil, and a nil map doesn't contain any keys. If you try to add a key-value pair to a nil map, the program compiles but panics at run time.
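The nil-map behaviour can be checked with a short program. This is a sketch: the helper writeToNil is a name invented for the example, and it uses recover only so that the demonstration can continue after the panic.

```go
package main

import "fmt"

// writeToNil attempts to write to a nil map and reports whether it panicked.
func writeToNil() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	var m map[string]int // zero value: nil
	m["a"] = 1           // run-time panic: assignment to entry in nil map
	return false
}

func main() {
	var m map[string]int
	// Reading from a nil map is safe and yields zero values.
	fmt.Println(m == nil, len(m), m["missing"]) // prints "true 0 0"

	fmt.Println(writeToNil()) // prints "true"

	// Allocate the map with make before writing to it.
	m = make(map[string]int)
	m["a"] = 1
	fmt.Println(m["a"]) // prints "1"
}
```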
Now we will see how to declare a map in Go.
package main
import "fmt"
func main() {
	var names map[int]string // names has int keys and string values
	fmt.Println(names == nil) // a declared but uninitialized map is nil
}
In the above example, the keys are of type int while the values are of type string.
Initializing a map: let's see how we can initialize a map with values.
1. Using the make() function
This GitHub repository, https://github.com/vistarmedia/gossamr, is a good starting point for running a Go job on Hadoop:
Gist:
package main

import (
	"log"
	"strings"

	"github.com/vistarmedia/gossamr"
)

// WordCount carries the Map and Reduce methods of the job.
type WordCount struct{}

// Map emits (lowercased word, 1) for every word on the input line.
func (wc *WordCount) Map(p int64, line string, c gossamr.Collector) error {
	for _, word := range strings.Fields(line) {
		c.Collect(strings.ToLower(word), int64(1))
	}
	return nil
}

// Reduce sums the counts received for one word and emits (sum, word).
func (wc *WordCount) Reduce(word string, counts chan int64, c gossamr.Collector) error {
	var sum int64
	for v := range counts {
		sum += v
	}
	c.Collect(sum, word)
	return nil
}

func main() {
	wordcount := gossamr.NewTask(&WordCount{})

	err := gossamr.Run(wordcount)
	if err != nil {
		log.Fatal(err)
	}
}
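To try the word-count logic without a cluster, the same Map and Reduce bodies can be run against an in-memory collector. This is only a sketch: the collector interface and the memCollector type below are hypothetical stand-ins written for this example, not gossamr's own types, and mapLine and reduceWord are plain functions that copy the method bodies above.

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// collector is a hypothetical stand-in for the collector a job receives.
type collector interface {
	Collect(k, v interface{}) error
}

// memCollector records collected pairs in memory.
type memCollector struct {
	keys, values []interface{}
}

func (m *memCollector) Collect(k, v interface{}) error {
	m.keys = append(m.keys, k)
	m.values = append(m.values, v)
	return nil
}

// mapLine mirrors the Map method: one (lowercased word, 1) pair per word.
func mapLine(line string, c collector) error {
	for _, word := range strings.Fields(line) {
		if err := c.Collect(strings.ToLower(word), int64(1)); err != nil {
			return err
		}
	}
	return nil
}

// reduceWord mirrors the Reduce method: drain the channel, emit (sum, word).
func reduceWord(word string, counts chan int64, c collector) error {
	var sum int64
	for v := range counts {
		sum += v
	}
	return c.Collect(sum, word)
}

func main() {
	mapped := &memCollector{}
	if err := mapLine("Go go Hadoop", mapped); err != nil {
		log.Fatal(err)
	}
	fmt.Println(mapped.keys) // prints "[go go hadoop]"

	counts := make(chan int64, 2)
	counts <- 1
	counts <- 1
	close(counts) // the range loop in reduceWord ends only once the channel is closed

	reduced := &memCollector{}
	if err := reduceWord("go", counts, reduced); err != nil {
		log.Fatal(err)
	}
	fmt.Println(reduced.keys[0], reduced.values[0]) // prints "2 go"
}
```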
Kicking off the job (note that it is launched through the Hadoop Streaming jar, with typed bytes I/O):
./bin/hadoop jar ./contrib/streaming/hadoop-streaming-1.2.1.jar \
  -input /mytext.txt \
  -output /output.15 \
  -mapper "gossamr -task 0 -phase map" \
  -reducer "gossamr -task 0 -phase reduce" \
  -io typedbytes \
  -file ./wordcount \
  -numReduceTasks 6