I am working with Spark 2.0 in Scala. I am able to convert an RDD to a DataFrame using the toDF() method.
val rdd = sc.textFile("/pathtologfile/logfile.txt")
val df = rdd.toDF()
But for the life of me I cannot find where this is in the API docs. It is not under RDD, but it is under Dataset (link 1). However, I have an RDD, not a Dataset.
Also I can't see it under implicits (link 2).
So please help me understand why toDF() can be called for my RDD. Where is this method being inherited from?
The toDF() method provides a very concise way to create a DataFrame. It can be applied to a sequence of objects. To access toDF(), we have to import the session's implicits (spark.implicits._).
A Spark RDD can be converted to a DataFrame using toDF(), using createDataFrame(), or by transforming an RDD[Row] into a DataFrame with an explicit schema.
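To make those three routes concrete, here is a minimal sketch. The sample data, column names, and app name are made up for illustration, and the local SparkSession is only there so the snippet stands alone (in spark-shell, `spark` already exists).

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Local session for illustration only; in spark-shell, `spark` is predefined.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("rdd-to-df-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._ // brings toDF() into scope for RDDs and Seqs

// Hypothetical sample data
val rdd = spark.sparkContext.parallelize(Seq(("alice", 30), ("bob", 25)))

// 1. toDF(), optionally naming the columns
val df1 = rdd.toDF("name", "age")

// 2. createDataFrame() on an RDD of tuples; columns default to _1, _2
val df2 = spark.createDataFrame(rdd)

// 3. RDD[Row] plus an explicit schema
val schema = StructType(Seq(
  StructField("name", StringType, nullable = false),
  StructField("age", IntegerType, nullable = false)))
val rowRdd = rdd.map { case (name, age) => Row(name, age) }
val df3 = spark.createDataFrame(rowRdd, schema)
```

Only the first route depends on the implicits import; the other two are ordinary methods on SparkSession.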
Converting a typed Dataset to an untyped DataFrame (the toDF basic action): toDF converts a Dataset into a DataFrame. Internally, the no-argument toDF creates a Dataset[Row] using the Dataset's SparkSession and QueryExecution, with RowEncoder as the encoder.
It's coming from here:
Spark 2 API
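Explanation: if you import sqlContext.implicits._ (in Spark 2.x, spark.implicits._ on the SparkSession works the same way), you have an implicit method that converts an RDD to a DatasetHolder (rddToDatasetHolder). You then call toDF on the DatasetHolder, which is why the method does not appear in the RDD docs.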
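The implicit-conversion mechanism described above can be sketched in plain Scala without Spark. Every name here (MiniRDD, MiniHolder, MiniDataFrame, MiniImplicits, rddToHolder) is a hypothetical stand-in and not a Spark class; the point is only to show how a method can appear to exist on a type that never defines it.

```scala
import scala.language.implicitConversions

// Hypothetical stand-ins, not Spark classes
final case class MiniRDD[T](data: Seq[T])          // defines no toDF method
final case class MiniDataFrame(rows: Seq[String])

// Plays the role of the holder type: this is where toDF() actually lives
final case class MiniHolder[T](rdd: MiniRDD[T]) {
  def toDF(): MiniDataFrame = MiniDataFrame(rdd.data.map(_.toString))
}

object MiniImplicits {
  // Plays the role of the implicit RDD-to-holder conversion
  implicit def rddToHolder[T](rdd: MiniRDD[T]): MiniHolder[T] = MiniHolder(rdd)
}

import MiniImplicits._

val rdd = MiniRDD(Seq(1, 2, 3))
// MiniRDD has no toDF, so the compiler rewrites this to rddToHolder(rdd).toDF()
val df = rdd.toDF()
```

Without the `import MiniImplicits._` line, the call to `rdd.toDF()` would not compile, which mirrors what happens in Spark when the implicits are not imported.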