
When to use Partitioner class?

Tags: c#, .net, .net-4.0

Can anyone suggest typical scenarios where the Partitioner class introduced in .NET 4.0 can or should be used?

Andrew Bezzub asked Oct 27 '10 09:10




1 Answer

The Partitioner class is used to make parallel execution coarser-grained. If you have a lot of very small tasks to run in parallel, the overhead of invoking a delegate for each one may be prohibitive. By using Partitioner, you can regroup the workload into chunks so that each parallel invocation works on a somewhat larger set. The class abstracts this feature and is able to partition based on the size of the dataset and the number of available cores.

Example: Imagine you want to run a simple calculation like this in parallel:

    Parallel.ForEach(Input, (value, loopState, index) =>
    {
        Result[index] = value * Math.PI;
    });

That invokes the delegate once for each entry in Input, which adds a bit of overhead per entry. By using Partitioner we can do something like this:

    Parallel.ForEach(Partitioner.Create(0, Input.Length), range =>
    {
        // range is a Tuple<int, int>: Item1 is inclusive, Item2 is exclusive
        for (var index = range.Item1; index < range.Item2; index++)
        {
            Result[index] = Input[index] * Math.PI;
        }
    });

This reduces the number of delegate invocations, since each invocation works on a larger set. In my experience this can boost performance significantly when parallelizing very simple operations.
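For anyone who wants to try the two variants side by side, here is a minimal self-contained sketch. The class name PartitionerSketch, the method names, the array size and the range size of 10,000 are all assumptions made for illustration and are not part of the snippets above. It uses the Partitioner.Create overload that takes an explicit rangeSize; the two-argument overload lets the library pick one. Timings depend heavily on the machine, so measure rather than assume a speedup.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper class, named here only for the sketch.
public static class PartitionerSketch
{
    // Per-element version: the delegate is invoked once per entry.
    public static double[] ScalePerElement(double[] input)
    {
        var result = new double[input.Length];
        Parallel.ForEach(input, (value, loopState, index) =>
        {
            result[index] = value * Math.PI;
        });
        return result;
    }

    // Chunked version: the delegate is invoked once per index range.
    public static double[] ScaleChunked(double[] input, int rangeSize)
    {
        var result = new double[input.Length];

        // Partitioner.Create rejects an empty range (toExclusive <= fromInclusive).
        if (input.Length == 0) return result;

        Parallel.ForEach(Partitioner.Create(0, input.Length, rangeSize), range =>
        {
            // range.Item1 is inclusive, range.Item2 is exclusive.
            for (var i = range.Item1; i < range.Item2; i++)
            {
                result[i] = input[i] * Math.PI;
            }
        });
        return result;
    }

    public static void Main()
    {
        // Assumed test data: one million doubles.
        var input = Enumerable.Range(0, 1000000).Select(i => (double)i).ToArray();

        var sw = Stopwatch.StartNew();
        var perElement = ScalePerElement(input);
        Console.WriteLine("Per element: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        var chunked = ScaleChunked(input, 10000);
        Console.WriteLine("Chunked:     {0} ms", sw.ElapsedMilliseconds);

        // Both variants should produce identical results.
        Console.WriteLine("Same results: {0}", perElement.SequenceEqual(chunked));
    }
}
```

A single run like this is only a rough indication; for a fair comparison, repeat each variant several times and discard the first run, which includes JIT and thread pool warm-up.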

Brian Rasmussen answered Sep 23 '22 04:09