I have an RDD with 30 records (key/value pairs: the key is a timestamp and the value is a JPEG byte array),
and I am running 30 executors. I want to repartition this RDD into 30 partitions so that every partition gets exactly one record and is assigned to one executor.
When I use rdd.repartition(30),
it repartitions my RDD into 30 partitions, but some partitions get 2 records, some get 1 record, and some get none.
Is there any way in Spark to distribute my records evenly across all partitions?
The salting technique can be used here: add a new "fake" key and use it alongside the current key so the data distributes more evenly across partitions.
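A minimal sketch of the salting idea in plain Python, so the effect is visible without a Spark cluster. The record contents and names here are illustrative assumptions (timestamp strings paired with dummy JPEG bytes): each record gets a sequential salt, and partitioning on the salt instead of the original key guarantees one record per partition, since `salt % 30` cannot collide for salts 0..29, while `hash(key) % 30` can.

```python
from collections import Counter

NUM_PARTITIONS = 30

def salted_partition(salt: int, num_partitions: int) -> int:
    # With a sequential salt 0..29, this maps every record to its own partition.
    return salt % num_partitions

# Hypothetical records: (timestamp, jpeg_bytes) pairs; values are placeholders.
records = [(f"2024-01-01T00:00:{i:02d}", b"\xff\xd8") for i in range(30)]

# Attach a sequential salt to each key, then partition on the salt
# instead of the original timestamp key.
salted = [((i, key), value) for i, (key, value) in enumerate(records)]
assignments = [salted_partition(salt, NUM_PARTITIONS)
               for ((salt, _key), _value) in salted]

counts = Counter(assignments)
print(sorted(counts.values()))  # every value is 1: one record per partition
```

In Spark the same idea can be sketched (in PySpark) as `rdd.zipWithIndex().map(lambda kv: (kv[1], kv[0])).partitionBy(30, lambda salt: salt % 30)` — `zipWithIndex` and the pair-RDD `partitionBy(numPartitions, partitionFunc)` are standard RDD methods, and the sequential index serves as the salt.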