My problem is sorting the values in a file. Keys and values are both integers, and I need to keep each value's key attached after sorting by value.
key value
1 24
3 4
4 12
5 23
Expected output:
1 24
5 23
4 12
3 4
I am working with massive data and must run the code on a cluster of Hadoop machines. How can I do this with MapReduce?
Sorting in Hadoop helps the reducer easily distinguish when a new reduce call should start, which saves the reducer time: the reducer starts a new reduce call when the next key in the sorted input differs from the previous one. Each reduce call takes key-value pairs as input and generates key-value pairs as output.
Sorting is one of the basic MapReduce operations for processing and analyzing data. The framework automatically sorts the key-value pairs emitted by the mappers by key before they reach the reducers.
Shuffling in MapReduce is the process by which the system performs this sort and then transfers the map output to the reducers as input. This is why the shuffle phase is necessary for the reducers; otherwise, they would have no input (or unsorted input from every mapper).
You can probably do this (I'm assuming you are using Java here).
From your mappers, emit the pairs with value and key swapped:
context.write(24, 1);
context.write(4, 3);
context.write(12, 4);
context.write(23, 5);
So, all the values that need to be sorted should be the keys in your MapReduce job. Hadoop sorts by ascending key by default.
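To see what this swap buys you end-to-end, here is a small plain-Java simulation of the swap, sort, swap-back pipeline on the example data. It runs without Hadoop; the class and method names are mine for illustration, not part of any Hadoop API:

```java
import java.util.ArrayList;
import java.util.List;

public class SwapSortDemo {
    // Simulates the MapReduce flow locally: the "map" step swaps each
    // (key, value) pair to (value, key), the framework's shuffle sort is
    // replaced by an in-memory descending sort on the new key, and the
    // "reduce" step swaps each pair back so output reads "key value" again.
    public static List<int[]> sortByValueDescending(int[][] pairs) {
        List<int[]> swapped = new ArrayList<>();
        for (int[] p : pairs) {
            swapped.add(new int[]{p[1], p[0]}); // map: emit (value, key)
        }
        swapped.sort((a, b) -> Integer.compare(b[0], a[0])); // sort descending
        List<int[]> out = new ArrayList<>();
        for (int[] p : swapped) {
            out.add(new int[]{p[1], p[0]}); // reduce: emit (key, value)
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] input = {{1, 24}, {3, 4}, {4, 12}, {5, 23}};
        for (int[] p : sortByValueDescending(input)) {
            System.out.println(p[0] + " " + p[1]);
        }
        // 1 24
        // 5 23
        // 4 12
        // 3 4
    }
}
```

In the real job the descending sort in the middle is what the comparator below provides; everything else is just the key/value swap on each side.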
Hence, to sort in descending order, you can either use the built-in decreasing comparator:
job.setSortComparatorClass(LongWritable.DecreasingComparator.class);
Or set a custom descending sort comparator on your job, which goes something like this:
public static class DescendingKeyComparator extends WritableComparator {
    protected DescendingKeyComparator() {
        super(LongWritable.class, true); // register for LongWritable keys
    }

    @SuppressWarnings("rawtypes")
    @Override
    public int compare(WritableComparable w1, WritableComparable w2) {
        LongWritable key1 = (LongWritable) w1;
        LongWritable key2 = (LongWritable) w2;
        return -1 * key1.compareTo(key2);
    }
}
The shuffle and sort phase in Hadoop will then take care of sorting your keys in descending order: 24, 23, 12, 4.
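The negation trick inside DescendingKeyComparator is not specific to Hadoop; negating an ascending comparator's result reverses the order for any type. A plain-Java sketch of the same idea (class name is mine, for illustration):

```java
import java.util.Arrays;
import java.util.Comparator;

public class DescendingDemo {
    // Negating the result of an ascending compareTo reverses the order,
    // exactly as DescendingKeyComparator does for LongWritable keys.
    static final Comparator<Long> DESCENDING = (a, b) -> -1 * a.compareTo(b);

    public static Long[] sortDescending(Long[] keys) {
        Long[] sorted = keys.clone();
        Arrays.sort(sorted, DESCENDING);
        return sorted;
    }

    public static void main(String[] args) {
        // The swapped keys from the example: 24, 4, 12, 23
        System.out.println(Arrays.toString(sortDescending(new Long[]{24L, 4L, 12L, 23L})));
        // [24, 23, 12, 4]
    }
}
```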
Edit (after the comment):
If you require a descending IntWritable comparator, you can create one and use it like this:
job.setSortComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);
In case you are using the old JobConf API, set it with:
jobConfObject.setOutputKeyComparatorClass(DescendingIntWritableComparable.DecreasingComparator.class);
Your driver's main() function would look something like this:
public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new YourDriver(), args);
    System.exit(exitCode);
}
// This class is defined outside of main(), not inside it.
public static class DescendingIntWritableComparable extends IntWritable {
    /** A decreasing Comparator optimized for IntWritable. */
    public static class DecreasingComparator extends IntWritable.Comparator {
        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            return -super.compare(a, b);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return -super.compare(b1, s1, l1, b2, s2, l2);
        }
    }
}