
In Spark, what does the "minPartitions" parameter do in SparkContext.textFile(path, minPartitions)?

Tags:

apache-spark

In Spark, both SparkContext and JavaSparkContext accept a minPartitions parameter when you call sc.textFile. What does this parameter do?

EdwinGuo asked Jul 21 '14 17:07


1 Answer

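minPartitions is passed to Hadoop's InputFormat.getSplits as the requested number of splits. The parameter is only a hint, so you may get more or fewer partitions, depending on the Hadoop InputFormat implementation. If you omit it, Spark uses sc.defaultMinPartitions, which is min(defaultParallelism, 2).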
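To make the "hint" behaviour concrete, here is a rough sketch in plain Python (no Spark needed) of how the old-API Hadoop FileInputFormat sizes splits for a single file. The function name estimate_splits, the 128 MiB block size, and the minimum split size of 1 byte are assumptions chosen for illustration; other InputFormat implementations may behave differently, so treat this as an approximation rather than what Spark will always do.

```python
MIB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def estimate_splits(file_size, min_partitions, block_size=128 * MIB,
                    min_split_size=1, splittable=True):
    """Approximate the split count for one non-empty file.

    Hypothetical helper modelled on the old-API FileInputFormat.getSplits;
    block_size and min_split_size are assumed defaults.
    """
    if not splittable:
        # e.g. a gzip file: the whole file becomes one split, hint ignored
        return 1
    # minPartitions only sets a *goal* size per split ...
    goal_size = file_size // max(min_partitions, 1)
    # ... which is then capped by the block size
    split_size = max(min_split_size, min(goal_size, block_size))
    splits = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1
    return splits


print(estimate_splits(1024 * MIB, 2))    # 8: more than requested, capped by block size
print(estimate_splits(1024 * MIB, 16))   # 16: the hint is honoured here
print(estimate_splits(1024 * MIB, 16, splittable=False))  # 1: fewer than requested
```

So a 1 GiB file with minPartitions=2 still yields 8 partitions under these assumptions, while a non-splittable file yields a single partition no matter what you ask for. To check the real number in your job, call getNumPartitions on the RDD returned by sc.textFile.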

Daniel Darabos answered Oct 30 '22 06:10