How to write Map/Reduce tasks in Golang?

Tags:

hadoop

I would like to write Hadoop Map/Reduce jobs in Go (and not through the Streaming API!).

I tried to get a grasp of hortonworks/gohadoop and colinmarc/hdfs, but I still don't see how to write real jobs with them. I searched GitHub for code importing these modules, but apparently there is nothing relevant.

Is there a WordCount.go somewhere?

asked Aug 05 '15 12:08

frigo americain

1 Answer

This GitHub repository, https://github.com/vistarmedia/gossamr, is a good starting point for running a Go job on Hadoop:

Gist:

package main

import (
  "log"
  "strings"

  "github.com/vistarmedia/gossamr"
)

type WordCount struct{}

// Map emits (word, 1) for every whitespace-separated word in the line.
func (wc *WordCount) Map(p int64, line string, c gossamr.Collector) error {
  for _, word := range strings.Fields(line) {
    c.Collect(strings.ToLower(word), int64(1))
  }
  return nil
}

// Reduce sums the counts received for a single word.
func (wc *WordCount) Reduce(word string, counts chan int64, c gossamr.Collector) error {
  var sum int64
  for v := range counts {
    sum += v
  }
  // The total is passed first and the word second.
  c.Collect(sum, word)
  return nil
}

func main() {
  wordcount := gossamr.NewTask(&WordCount{})

  err := gossamr.Run(wordcount)
  if err != nil {
    log.Fatal(err)
  }
}

Kicking off the job:

./bin/hadoop jar ./contrib/streaming/hadoop-streaming-1.2.1.jar \
  -input /mytext.txt \
  -output /output.15 \
  -mapper "gossamr -task 0 -phase map" \
  -reducer "gossamr -task 0 -phase reduce" \
  -io typedbytes \
  -file ./wordcount \
  -numReduceTasks 6
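
To check the Map and Reduce logic before submitting anything to a cluster, the same flow can be simulated in-process. The following is only a rough sketch using the Go standard library: it does not use gossamr or Hadoop, and the names pair, mapLine, shuffle, reduceCounts and wordCount are hypothetical helpers invented for this illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// pair is a hypothetical stand-in for one key/value record emitted by a mapper.
type pair struct {
	key   string
	value int64
}

// mapLine mirrors the Map step above: one (lowercased word, 1) pair per word.
func mapLine(line string) []pair {
	var out []pair
	for _, word := range strings.Fields(line) {
		out = append(out, pair{strings.ToLower(word), 1})
	}
	return out
}

// shuffle groups values by key, standing in for the grouping that
// happens between the map and reduce phases.
func shuffle(pairs []pair) map[string][]int64 {
	grouped := make(map[string][]int64)
	for _, p := range pairs {
		grouped[p.key] = append(grouped[p.key], p.value)
	}
	return grouped
}

// reduceCounts mirrors the Reduce step above: sum all counts for one word.
func reduceCounts(counts []int64) int64 {
	var sum int64
	for _, v := range counts {
		sum += v
	}
	return sum
}

// wordCount runs map, shuffle and reduce in-process over the given lines.
func wordCount(lines []string) map[string]int64 {
	var pairs []pair
	for _, line := range lines {
		pairs = append(pairs, mapLine(line)...)
	}
	result := make(map[string]int64)
	for word, counts := range shuffle(pairs) {
		result[word] = reduceCounts(counts)
	}
	return result
}

func main() {
	result := wordCount([]string{"Hello world", "hello Go"})

	// Sort the words so the printed order is deterministic.
	words := make([]string, 0, len(result))
	for w := range result {
		words = append(words, w)
	}
	sort.Strings(words)
	for _, w := range words {
		fmt.Printf("%d\t%s\n", result[w], w)
	}
}
```

Running this with go run prints one count and word per line. On a real cluster the grouping is done by Hadoop rather than by a map in memory, so this only validates the per-record logic.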

answered Oct 16 '22 04:10

Simon Kesteloot
