
Why does Word2Vec run only one task for mapPartitionsWithIndex at Word2Vec.scala:323?

I am running Word2Vec in Spark, and when it reaches fit(), the Spark UI shows only one task for the stage.

The job is configured with num-executors = 1000 and executor-cores = 2, and the input RDD is coalesced to 2000 partitions. Even so, the mapPartitionsWithIndex stage takes a very long time. Can it be distributed across multiple executors or tasks?

Addison asked Mar 08 '23 20:03


1 Answer

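Calling setNumPartitions(numPartitions: Int) solved my problem. I had not checked the default value. The documentation for that setter reads: "Sets number of partitions (default: 1)". With the default of 1, Word2Vec repartitions the sentences into a single partition before training, so the mapPartitionsWithIndex stage runs as a single task no matter how the input RDD was partitioned.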
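For illustration, here is a minimal sketch of the fix, assuming the RDD-based API in org.apache.spark.mllib.feature (the one Word2Vec.scala:323 appears to point to). The input path, the vector size, and the partition count of 200 are placeholders and not values from the original job. The Spark documentation suggests keeping the number of partitions small for accuracy, so the right value is a trade-off to tune rather than something to match to the executor count.

```scala
import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object Word2VecPartitions {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("word2vec-partitions").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input: one whitespace-tokenized sentence per line.
    val sentences: RDD[Seq[String]] =
      sc.textFile("hdfs:///path/to/corpus.txt").map(_.split(" ").toSeq)

    val word2vec = new Word2Vec()
      .setVectorSize(100)     // illustrative value
      .setNumPartitions(200)  // default is 1, which yields a single training task

    // fit() repartitions the sentences to numPartitions before training,
    // so the partitioning of the input RDD does not control the task count.
    val model: Word2VecModel = word2vec.fit(sentences)

    println(s"Vocabulary size: ${model.getVectors.size}")
    spark.stop()
  }
}
```

The DataFrame-based org.apache.spark.ml.feature.Word2Vec exposes a setNumPartitions setter as well, also defaulting to 1.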

Addison answered Mar 12 '23 06:03