I am trying to read a simple text file into a Spark RDD, and I see that there are two ways of doing so:
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()
sc = spark.sparkContext
textRDD1 = sc.textFile("hobbit.txt")
textRDD2 = spark.read.text("hobbit.txt").rdd
Then, when I look at the data, I see that the two RDDs are structured differently:
textRDD1.take(5)
['The king beneath the mountain',
'The king of carven stone',
'The lord of silver fountain',
'Shall come unto his own',
'His throne shall be upholden']
textRDD2.take(5)
[Row(value='The king beneath the mountain'),
Row(value='The king of carven stone'),
Row(value='The lord of silver fountain'),
Row(value='Shall come unto his own'),
Row(value='His throne shall be upholden')]
Based on this, all subsequent processing has to be changed to reflect the presence of the 'value' field.
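For example, a transformation that works directly on textRDD1 needs an extra unwrapping step on textRDD2 (a minimal sketch; the lambdas are my own illustration):

# textRDD1 holds plain strings, so transformations apply directly
upper1 = textRDD1.map(lambda line: line.upper())

# textRDD2 holds Row objects, so the string must first be pulled
# out of the 'value' field
upper2 = textRDD2.map(lambda row: row.value.upper())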
My questions are: (a) why are the two RDDs structured differently, and (b) which method should I use?
To answer (a), sc.textFile(...) returns an RDD[String]. From the SparkContext docs:

textFile(String path, int minPartitions)
Read a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI, and return it as an RDD of Strings.
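In PySpark the second argument is optional and can be passed as a keyword to request a minimum level of parallelism (a quick sketch; the value 4 is arbitrary):

textRDD1 = sc.textFile("hobbit.txt", minPartitions=4)
print(textRDD1.getNumPartitions())  # typically at least 4 for a splittable file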
spark.read.text(...) returns a Dataset[Row], i.e. a DataFrame. From the DataFrameReader docs:

text(String path)
Loads text files and returns a DataFrame whose schema starts with a string column named "value", followed by partitioned columns if there are any.
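Note that if you keep the DataFrame instead of dropping down to an RDD, you can work with the "value" column directly through the DataFrame API (a minimal sketch):

from pyspark.sql.functions import length

df = spark.read.text("hobbit.txt")
df.filter(df.value.contains("king")).show(truncate=False)  # lines mentioning "king"
df.select(length(df.value).alias("line_length")).show(5)   # length of each line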
For (b), it really depends on your use case. Since you are trying to create an RDD here, you should go with sc.textFile. You can always convert a DataFrame to an RDD and vice versa.
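For instance (a sketch; toDF expects rows of tuples, hence the extra map):

# DataFrame -> RDD of plain strings
plainRDD = spark.read.text("hobbit.txt").rdd.map(lambda row: row.value)

# RDD of strings -> DataFrame: wrap each string in a one-element tuple first
df = plainRDD.map(lambda s: (s,)).toDF(["value"])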