Sorting by value in Hadoop from a file

I have a file containing a String, then a space and then a number on every line.

Example:

Line1: Word 2
Line2: Word1 8
Line3: Word2 1

I need to sort the lines by their number in descending order and write the result to a file, assigning a rank to each line. So my output should be a file in the following format:

Line1: Word1 8 1
Line2: Word 2 2
Line3: Word2 1 3

Does anyone have an idea how I can do this in Hadoop? I am using Java with Hadoop.

asked Nov 27 '11 22:11

2 Answers

You could organize your map/reduce computation like this:

Map input: default

Map output: "key: number, value: word"

(sorting phase, by key)

Here you will need to override the default sorter to sort in decreasing order.

Reduce: 1 reducer, so that a single task sees every key in sorted order

Reduce input: "key: number, value: word"

Reduce output: "key: word, value: (number, rank)"

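Keep a counter in the reducer, starting at 1. For each key-value pair, emit the current counter value as the rank and then increment the counter. Because there is only one reducer and the keys arrive already sorted, the counter is the global rank. Words that share the same number arrive as the values of one key, so iterate over all of them.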
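The plan above can be tried locally before writing any Hadoop code. The following is only a minimal plain-Java sketch of the same three steps (map to number/word pairs, sort by number descending, rank with a counter); it does not use the Hadoop API, and the class and method names (RankByValueSketch, rank) are made up for illustration. It assumes every line is well formed, with exactly one word and one integer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RankByValueSketch {

    // Simulates the job locally: "map" each line to a (number, word) pair,
    // sort the pairs by number in descending order, then let a single
    // "reducer" append the rank by incrementing a counter.
    static List<String> rank(List<String> lines) {
        List<String[]> pairs = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+"); // parts[0] = word, parts[1] = number
            pairs.add(new String[] { parts[1], parts[0] }); // key: number, value: word
        }

        // Stands in for the shuffle/sort phase with a descending comparator.
        pairs.sort((a, b) -> Integer.compare(Integer.parseInt(b[0]), Integer.parseInt(a[0])));

        List<String> output = new ArrayList<>();
        int counter = 1; // the reducer's rank counter
        for (String[] pair : pairs) {
            output.add(pair[1] + " " + pair[0] + " " + counter);
            counter++;
        }
        return output;
    }

    public static void main(String[] args) {
        // Uses the three example lines from the question.
        for (String line : rank(Arrays.asList("Word 2", "Word1 8", "Word2 1"))) {
            System.out.println(line);
        }
    }
}
```

In a real job the sort is done by the framework between map and reduce, so only the parsing and the counter logic would end up in your own mapper and reducer.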

Edit: Here is a code snippet of a custom descending sorter:

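// Imports needed at the top of the enclosing file:
import java.nio.ByteBuffer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparator;

// Sorts IntWritable keys in descending order by comparing their serialized bytes.
public static class IntComparator extends WritableComparator {

    public IntComparator() {
        super(IntWritable.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1,
            byte[] b2, int s2, int l2) {

        // An IntWritable is serialized as 4 big-endian bytes, which getInt() decodes.
        int v1 = ByteBuffer.wrap(b1, s1, l1).getInt();
        int v2 = ByteBuffer.wrap(b2, s2, l2).getInt();

        // Swapping the arguments reverses the natural (ascending) order.
        return Integer.compare(v2, v1);
    }
}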

Don't forget to actually set it as the comparator for your job:

job.setSortComparatorClass(IntComparator.class);
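If you want to convince yourself that comparing the raw bytes this way really gives a descending order, the comparison can be exercised without a cluster. This is a small sketch in plain Java: serialize is a hypothetical helper that writes an int as 4 big-endian bytes (the same layout DataOutput.writeInt produces, which is what IntWritable uses), and compareDescending repeats the body of the compare method above.

```java
import java.nio.ByteBuffer;

class RawIntCompareSketch {

    // Hypothetical helper: encodes an int as 4 big-endian bytes.
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Same logic as IntComparator.compare: decode both ints from their
    // byte ranges and compare them in reverse order.
    static int compareDescending(byte[] b1, int s1, int l1,
            byte[] b2, int s2, int l2) {
        int v1 = ByteBuffer.wrap(b1, s1, l1).getInt();
        int v2 = ByteBuffer.wrap(b2, s2, l2).getInt();
        return Integer.compare(v2, v1);
    }

    public static void main(String[] args) {
        byte[] eight = serialize(8);
        byte[] two = serialize(2);
        // A negative result means the first argument sorts first, so 8 comes before 2.
        System.out.println(compareDescending(eight, 0, 4, two, 0, 4) < 0);
    }
}
```

The offset and length arguments matter because Hadoop hands the comparator a shared buffer in which a key rarely starts at index 0.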

answered Sep 21 '22 00:09

Tudor

Hadoop Streaming - Hadoop 1.0.x

According to this, after the

bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.*.jar

you add a comparator

-D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator

and you specify the kind of sorting you want

-D mapred.text.key.comparator.options=-[options]

where the [options] are similar to those of Unix sort. Here are some examples:

Reverse order

-D mapred.text.key.comparator.options=-r

Sort on numeric values

-D mapred.text.key.comparator.options=-n

Sort on the value, or on any other field

-D mapred.text.key.comparator.options=-kx,y

With the -k flag you specify the sort key. The x and y parameters define this key as the fields from position x through position y. So, if a line has more than one token, you can choose which token, or which combination of tokens, is used as the sort key. See the references for more details and examples.
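To make the meaning of these options concrete, here is a rough plain-Java illustration of what choosing a key field (-k), numeric comparison (-n) and reverse order (-r) do to a list of lines. It only imitates the sort semantics locally and is not how KeyFieldBasedComparator is implemented. The names (KeyFieldSortSketch, sortByField) are hypothetical, it handles a single field rather than a range, and it splits fields on whitespace as an assumption to match the question's file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class KeyFieldSortSketch {

    // Returns the 1-based field of a whitespace-separated line.
    static String fieldOf(String line, int field) {
        return line.trim().split("\\s+")[field - 1];
    }

    // field   : which field is the sort key (like -k field,field)
    // numeric : compare the key as a number instead of as text (like -n)
    // reverse : invert the resulting order (like -r)
    static List<String> sortByField(List<String> lines, int field,
            boolean numeric, boolean reverse) {
        Comparator<String> cmp;
        if (numeric) {
            cmp = Comparator.comparingLong((String line) -> Long.parseLong(fieldOf(line, field)));
        } else {
            cmp = Comparator.comparing((String line) -> fieldOf(line, field));
        }
        if (reverse) {
            cmp = cmp.reversed();
        }
        List<String> sorted = new ArrayList<>(lines); // leave the input untouched
        sorted.sort(cmp);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Word 2", "Word1 8", "Word2 1");
        // Field 2, numeric, reversed: the ordering the question asks for.
        System.out.println(sortByField(lines, 2, true, true));
    }
}
```

The numeric flag is the one that is easy to forget: compared as text, "10" sorts before "9", which is rarely what you want for counts.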

answered Sep 20 '22 00:09

vpap

Tags: java, hadoop, hadoop-streaming

Deepika Sethi
