I am running Word2Vec in Spark, and when it reaches fit(), only one task is observed in the UI, as shown in the image.
Per the configuration, num-executors = 1000 and executor-cores = 2, and the RDD is coalesced to 2000 partitions. Even so, mapPartitionsWithIndex takes quite a long time. Can this step be distributed across multiple executors or tasks?
Calling setNumPartitions(numPartitions: Int) on the Word2Vec instance solved my problem. I had not checked the default value, which the documentation describes as:
Sets number of partitions (default: 1).
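
For anyone hitting the same thing, here is a minimal sketch of what the fix might look like with the RDD-based API (org.apache.spark.mllib.feature.Word2Vec). The object name, the input path taken from args(0), the whitespace tokenization, and the value 200 are all assumptions for illustration, not taken from the original job; tune the partition count for your own cluster.

```scala
import org.apache.spark.mllib.feature.Word2Vec
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical driver object; the name is illustrative only.
object Word2VecPartitions {
  def main(args: Array[String]): Unit = {
    require(args.nonEmpty, "usage: Word2VecPartitions <corpus-path>")

    val spark = SparkSession.builder().appName("word2vec-partitions").getOrCreate()
    val sc = spark.sparkContext

    // Assumption: one sentence per line, tokens separated by whitespace.
    val sentences: RDD[Seq[String]] =
      sc.textFile(args(0)).map(_.trim.split("\\s+").toSeq)

    // Word2Vec repartitions its input to numPartitions internally, and the
    // default is 1, so training runs as a single task regardless of how many
    // partitions the input RDD has. Raising it spreads training over more tasks.
    val model = new Word2Vec()
      .setNumPartitions(200) // assumed value, not a recommendation
      .fit(sentences)

    println(s"Vocabulary size: ${model.getVectors.size}")
    spark.stop()
  }
}
```

This would explain why coalescing the input RDD to 2000 partitions had no visible effect: the partition count used for training comes from the Word2Vec parameter, not from the input. As far as I know, the Spark documentation also suggests using a small number of partitions for accuracy, so a larger value trades some model quality for speed.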