I have been stuck for a few days trying to create a custom MapReduce program based on my Hive query. I found few examples after googling, and I'm still confused about the rules.
What are the rules for creating a custom MapReduce program, and what should the mapper and reducer classes look like?
Can anyone provide a solution?
I want to develop this program in Java, but I'm still stuck. Also, when formatting output in the collector, how do I format the result in the mapper and reducer classes?
Could anybody give me an example and explanation of this kind of thing?
They are sequenced one after the other. The Map function takes input from disk as <key, value> pairs, processes them, and produces another set of intermediate <key, value> pairs as output. The Reduce function also takes its input as <key, value> pairs and produces <key, value> pairs as output.
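The flow above can be sketched in a few lines of plain Python (a purely illustrative word count, with a `sorted` call standing in for the framework's shuffle step):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit an intermediate (word, 1) pair for every word."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce: after sorting (the 'shuffle'), sum the counts per key."""
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (key, sum(count for _, count in group))

counts = dict(reduce_phase(map_phase(["a b a", "b a"])))
# counts is {"a": 3, "b": 2}
```

The real framework does the sorting and grouping between the two phases for you; your code only supplies the two functions.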
Inputs and Outputs: The MapReduce model operates on <key, value> pairs. It views the input to a job as a set of <key, value> pairs and produces a different set of <key, value> pairs as the job's output. Data input is handled by two classes in the framework, namely InputFormat and RecordReader.
Hive Architecture: An SQL query gets converted into a MapReduce job through the following process: the Hive client or UI submits a query to the driver. The driver then submits the query to the Hive compiler, which generates a query plan and converts the SQL into MapReduce tasks.
The key and value classes have to be serializable by the framework and hence need to implement the Writable interface. Additionally, the key classes have to implement the WritableComparable interface to facilitate sorting by the framework.
There are basically two ways to add custom mappers/reducers to Hive queries.
transform
SELECT TRANSFORM(stuff1, stuff2) FROM table1 USING 'script' AS thing1, thing2
where stuff1 and stuff2 are fields in table1, and script is any executable that accepts the format I describe later. thing1 and thing2 are the outputs of the script.
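The script can be written in any language that reads stdin and writes stdout. As a sketch, a hypothetical Python version of 'script' (assuming table1's two columns arrive tab-separated, one row per line) might look like this:

```python
#!/usr/bin/env python
# Hypothetical 'script' for the TRANSFORM example above: reads the
# tab-separated columns (stuff1, stuff2) from stdin and writes the
# transformed columns (thing1, thing2) back to stdout, tab-separated.
import sys

def transform(line):
    stuff1, stuff2 = line.rstrip("\n").split("\t")
    # Illustrative transformation: upper-case one field, pass the other through.
    return "%s\t%s" % (stuff1.upper(), stuff2)

if __name__ == "__main__":
    for line in sys.stdin:
        print(transform(line))
```

You would register it with ADD FILE before running the query so Hive ships it to the cluster nodes.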
map/reduce
FROM (
  FROM table
  MAP table.f1, table.f2
  USING 'map_script'
  AS mp1, mp2
  CLUSTER BY mp1
) map_output
INSERT OVERWRITE TABLE someothertable
REDUCE map_output.mp1, map_output.mp2
USING 'reduce_script'
AS reducef1, reducef2;
This is slightly more complicated but gives more control. There are two parts to this. In the first part, the mapper script receives data from table and maps it to fields mp1 and mp2. These are then passed on to reduce_script, which receives output sorted on the key we specified in CLUSTER BY mp1. Mind you, more than one key may be handled by a single reducer. The output of the reduce script goes to the table someothertable.
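Because the input to the reduce script is sorted by the key, it can aggregate consecutive lines with the same key in a single streaming pass. A hypothetical 'reduce_script' in Python (assuming mp2 is a number to be summed per key) might be:

```python
#!/usr/bin/env python
# Hypothetical 'reduce_script': Hive delivers lines sorted by the
# CLUSTER BY key (mp1), so rows with the same key arrive consecutively.
# Note that one reducer may see several distinct keys in a row.
import sys

def reduce_lines(lines):
    """Sum the second column per key; input must be sorted by key."""
    current_key, total = None, 0
    for line in lines:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                yield "%s\t%d" % (current_key, total)
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        yield "%s\t%d" % (current_key, total)

if __name__ == "__main__":
    for out in reduce_lines(sys.stdin):
        print(out)
```

The key-change check is the standard streaming-reducer idiom: it works only because the framework guarantees the sort order, which is exactly what CLUSTER BY provides here.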
Now, all these scripts follow a simple pattern: they read line by line from stdin, with fields separated by '\t', and they write back to stdout in the same manner (fields separated by '\t').
Check out this blog; it has some nice examples:
http://dev.bizo.com/2009/07/custom-map-scripts-and-hive.html
http://dev.bizo.com/2009/10/reduce-scripts-in-hive.html