I want to use a RangePartitioner in my Java Spark application, but I have no idea how to set the two Scala parameters scala.math.Ordering<K> evidence$1 and scala.reflect.ClassTag<K> evidence$2. Can someone give me an example?
Here is the link to the JavaDoc of RangePartitioner (it was no help to me because I'm new to Spark and Scala...):
My code currently looks like this:
JavaPairRDD<Integer, String> partitionedRDD = rdd.partitionBy(new RangePartitioner<Integer, String>(10, rdd, true, evidence$1, evidence$2));
Range partitioning in Apache Spark: with this method, tuples whose keys fall within the same range end up on the same machine. A range partitioner assigns keys to partitions based on an ordering of the keys and on a set of sorted key ranges.
repartition() can be used to increase or decrease the number of partitions of a Spark DataFrame; however, repartition() involves a shuffle, which is a costly operation.
A Partitioner that partitions sortable records by range into roughly equal ranges. The ranges are determined by sampling the content of the RDD passed in.
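As a conceptual illustration (this is not Spark's actual implementation), a range partitioner can be pictured as a binary search over an array of sorted upper bounds; Spark derives those bounds by sampling the RDD's keys. The bounds and key values below are made up for the example:

```java
import java.util.Arrays;

public class RangeBucketSketch {
    // Hypothetical sorted upper bounds, as a range partitioner might derive
    // them from sampled keys: partition 0 holds keys <= 10, partition 1
    // holds keys in (10, 20], and so on; keys above the last bound go to
    // the final partition.
    static final int[] BOUNDS = {10, 20, 30};

    // Mimics the idea behind Partitioner.getPartition(key): binary-search
    // the bounds to find which range the key falls into.
    static int getPartition(int key) {
        int idx = Arrays.binarySearch(BOUNDS, key);
        // binarySearch returns (-(insertionPoint) - 1) when the key is absent
        return idx >= 0 ? idx : -idx - 1;
    }

    public static void main(String[] args) {
        System.out.println(getPartition(5));   // first range
        System.out.println(getPartition(10));  // boundary value, still partition 0
        System.out.println(getPartition(25));  // third range
        System.out.println(getPartition(99));  // beyond the last bound -> last partition
    }
}
```

This is only meant to make the "roughly equal ranges" idea concrete; the real RangePartitioner also handles sampling, descending order, and generic key types.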
You can create both the Ordering and the ClassTag by calling methods on their companion objects. A Scala companion object is referenced from Java like this: ClassName$.MODULE$.methodName()
One further wrinkle is that the constructor requires a Scala RDD, not a Java one. You can get the underlying Scala RDD from a Java pair RDD by calling rdd.rdd().
import java.util.Comparator;

import org.apache.spark.RangePartitioner;
import scala.math.Ordering;
import scala.math.Ordering$;
import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;

final Ordering<Integer> ordering = Ordering$.MODULE$.comparatorToOrdering(Comparator.<Integer>naturalOrder());
final ClassTag<Integer> classTag = ClassTag$.MODULE$.apply(Integer.class);
final RangePartitioner<Integer, String> partitioner = new RangePartitioner<>(
        10,         // number of partitions
        rdd.rdd(),  // note the call to rdd.rdd() here: this gets the Scala RDD backing the Java one
        true,       // ascending
        ordering,
        classTag);
final JavaPairRDD<Integer, String> partitioned = rdd.partitionBy(partitioner);
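To sanity-check the result, you can count how many records landed in each partition. This is only a sketch: it assumes the partitioned variable from the snippet above, a running SparkContext, and spark-core on the classpath.

```java
import java.util.Collections;
import java.util.List;

// Count the records in each partition; with a range partitioner the
// counts should be roughly balanced (10 entries here, one per partition).
List<Integer> sizes = partitioned
        .mapPartitions(it -> {
            int n = 0;
            while (it.hasNext()) { it.next(); n++; }
            return Collections.singletonList(n).iterator();
        })
        .collect();
System.out.println(sizes);
```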