Why does word2vec only take one task for mapPartitionsWithIndex at Word2Vec.scala:323

Question

I am running word2vec in Spark, and when it reaches fit(), the UI shows only one task for the stage (screenshot not preserved).

The job is configured with num-executors = 1000 and executor-cores = 2, and the input RDD is coalesced to 2000 partitions. Even so, the mapPartitionsWithIndex stage takes a long time. Can it be distributed across multiple executors or tasks?

Addison · Accepted Answer

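Calling setNumPartitions(numPartitions: Int) on the Word2Vec instance solved my problem. I had not checked the default value: the documentation says it "Sets number of partitions (default: 1)". With that default, fit() repartitions the sentences to a single partition before training, so mapPartitionsWithIndex runs as one task no matter how the input RDD was partitioned.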
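A minimal sketch of the fix, assuming the RDD-based org.apache.spark.mllib.feature.Word2Vec that the stack trace points to. The object name, the train helper, the toy corpus, the local[4] master, and the partition count of 4 are all illustrative assumptions, not values from the question:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}

object Word2VecPartitionsSketch {

  // Hypothetical helper: trains on a small repeated toy corpus using the
  // requested number of training partitions.
  def train(sc: SparkContext, numPartitions: Int): Word2VecModel = {
    val lines = Seq.fill(200)(Seq(
      "spark trains word vectors in parallel",
      "each partition becomes one training task"
    )).flatten

    // Word2Vec.fit expects an RDD of tokenized sentences.
    val sentences = sc.parallelize(lines).map(_.split(" ").toSeq)

    new Word2Vec()
      .setVectorSize(16)               // small vectors keep the toy run fast
      .setMinCount(1)                  // keep every word of the toy corpus
      .setNumPartitions(numPartitions) // default is 1, i.e. a single task
      .fit(sentences)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("word2vec-partitions-sketch")
      .setMaster("local[4]") // assumption: local run for illustration
    val sc = new SparkContext(conf)
    try {
      val model = train(sc, numPartitions = 4)
      println(s"Vocabulary size: ${model.getVectors.size}")
    } finally {
      sc.stop()
    }
  }
}
```

On a cluster, a value closer to the number of available cores would be the natural starting point, but the same documentation entry also advises using a small number for accuracy, so the setting is a trade-off between training speed and model quality.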

Tags: scala, apache-spark, word2vec, apache-spark-mllib