I'm just getting started with Apache Spark (in Scala, but the language is irrelevant). I'm using standalone mode, and I want to process a text file from a local file system (so nothing distributed like HDFS).
According to the documentation of the textFile method from SparkContext, it will:

Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and return it as an RDD of Strings.
What is unclear to me is whether the whole text file can simply be copied to all the nodes, or whether the input data should already be partitioned, e.g. with 4 nodes and a CSV file of 1000 lines, 250 lines on each node.
I suspect each node should have the whole file but I'm not sure.
Each node should have access to the whole file. In that case, with respect to this file, the local file system is logically indistinguishable from HDFS: Spark itself splits the file into partitions, so you do not need to pre-partition the data across nodes.
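A minimal sketch of what this looks like in practice, assuming standalone/local mode and a hypothetical path /path/to/data.csv (replace with your own file). The "file://" scheme forces the local file system; on a real multi-node cluster the file must exist at the same path on every worker:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalTextFileExample {
  def main(args: Array[String]): Unit = {
    // Standalone local mode: "local[4]" runs 4 worker threads, no HDFS needed.
    val conf = new SparkConf()
      .setAppName("LocalTextFileExample")
      .setMaster("local[4]")
    val sc = new SparkContext(conf)

    // "file://" explicitly selects the local file system.
    // The path here is a placeholder, not a real file.
    val lines = sc.textFile("file:///path/to/data.csv")

    // minPartitions is only a hint: Spark splits the single file into
    // partitions itself, so a 1000-line CSV does NOT need to be
    // pre-split into 250-line chunks per node.
    val partitioned = sc.textFile("file:///path/to/data.csv", minPartitions = 4)

    println(s"${lines.count()} lines in ${partitioned.getNumPartitions} partitions")
    sc.stop()
  }
}
```

Note that when you later move to a cluster, only the URI needs to change (e.g. to an hdfs:// path); the rest of the code stays the same.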