
Spark 2.0 Scala - RDD.toDF()

I am working with Spark 2.0 Scala. I am able to convert an RDD to a DataFrame using the toDF() method.

val rdd = sc.textFile("/pathtologfile/logfile.txt")
val df = rdd.toDF()

But for the life of me I cannot find where this is in the API docs. It is not under RDD, but it is under Dataset (link 1). However, I have an RDD, not a Dataset.

Also I can't see it under implicits (link 2).

So please help me understand why toDF() can be called on my RDD. Where does this method come from?

Carl asked Aug 16 '16 06:08



1 Answer

It's coming from here:

Spark 2 API

Explanation: if you import sqlContext.implicits._ (or spark.implicits._ on a SparkSession), you have an implicit method in scope that converts an RDD to a DatasetHolder (rddToDatasetHolder). toDF is then called on the DatasetHolder, not on the RDD itself.
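To make the implicit conversion visible, here is a minimal sketch that does the same conversion twice: once the usual way, and once by calling rddToDatasetHolder by hand, which is roughly what the compiler resolves for you. The object name ToDFDemo, the method name convert, the local master setting, and the sample data are assumptions made for illustration; they are not from the question.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, DatasetHolder, SparkSession}

// Hypothetical demo object; the names here are illustrative only.
object ToDFDemo {

  // Returns the same DataFrame built two ways: implicitly and explicitly.
  def convert(spark: SparkSession): (DataFrame, DataFrame) = {
    // Brings rddToDatasetHolder and the String encoder into implicit scope.
    import spark.implicits._

    // Stand-in for sc.textFile(...): an RDD[String] with sample lines.
    val rdd: RDD[String] =
      spark.sparkContext.parallelize(Seq("line one", "line two"))

    // What you write: RDD has no toDF, so the compiler looks for an
    // implicit conversion to a type that does.
    val viaImplicit: DataFrame = rdd.toDF()

    // Roughly what the compiler resolves: wrap the RDD in a DatasetHolder,
    // then call toDF on the holder.
    val holder: DatasetHolder[String] = spark.implicits.rddToDatasetHolder(rdd)
    val viaExplicit: DataFrame = holder.toDF()

    (viaImplicit, viaExplicit)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: run locally for the demo
      .appName("toDF-demo")
      .getOrCreate()

    val (a, b) = convert(spark)
    a.printSchema()
    b.printSchema()

    spark.stop()
  }
}
```

In spark-shell you typically do not need to write the import yourself, because the shell imports the implicits at startup. That would explain why toDF() works on the RDD even though the method appears neither under RDD nor as a toDF entry under implicits: the docs list the conversion (rddToDatasetHolder), and toDF itself is documented on DatasetHolder.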

Raphael Roth answered Oct 13 '22 22:10