Spark: how can I evenly distribute my records across all partitions?

Tags:

apache-spark

I have an RDD with 30 records (key/value pairs: the key is a timestamp and the value is a JPEG byte array), and I am running 30 executors. I want to repartition this RDD into 30 partitions so that every partition gets exactly one record and is assigned to one executor.

When I use rdd.repartition(30), it repartitions my RDD into 30 partitions, but some partitions get 2 records, some get 1 record, and some get none.
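
Here is roughly what I am doing (a minimal sketch; variable names are illustrative, and rdd is the pair RDD of 30 (timestamp, jpegBytes) records described above):

    // rdd: RDD[(Long, Array[Byte])] with 30 records
    val repartitioned = rdd.repartition(30)

    // Count the records in each partition: I see a mix of 0, 1 and 2
    // instead of exactly one record in each of the 30 partitions.
    println(repartitioned.glom().map(_.length).collect().mkString(", "))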

Is there any way in Spark to evenly distribute my records across all partitions?

asked Oct 30 '22 by prateek arora


1 Answer

The salting technique can be used: add a new "fake" key (a salt) and use it alongside the existing key so that records are spread more evenly across partitions.
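
A minimal sketch of the idea, assuming the records are in rdd: RDD[(Long, Array[Byte])] of (timestamp, jpegBytes) pairs and 30 partitions are wanted; the names and the salt range are illustrative:

    import org.apache.spark.HashPartitioner
    import scala.util.Random

    val numPartitions = 30

    // Attach a random salt to each key; the salted (key, salt) pairs hash
    // more evenly across partitions than the original keys alone.
    val salted = rdd.map { case (timestamp, jpegBytes) =>
      ((timestamp, Random.nextInt(numPartitions)), jpegBytes)
    }

    // Partition on the salted key.
    val spread = salted.partitionBy(new HashPartitioner(numPartitions))

    // Drop the salt once the records are laid out.
    val result = spread.map { case ((timestamp, _), jpegBytes) => (timestamp, jpegBytes) }

Note that with random salts the spread is only statistically even, so this improves the distribution rather than guaranteeing exactly one record per partition.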

answered Nov 08 '22 by devesh