
Creating data frame out of sequence using toDF method in Apache Spark

I am using Spark 2.4.4 and trying to create a data frame with the code below.

val spark =  SparkSession
            .builder
            .master("local[*]")
            .appName("App")
            .getOrCreate 

import spark.sqlContext.implicits._  
import spark.implicits._

val justNow = spark.sparkContext.parallelize( 
        Seq(Row("1", "One")
           ,Row("2", "Tow")
        )
).toDF

The code above is defined inside the main method, but I get an error saying that toDF is not a member of RDD. Following other posts on Stack Overflow, I added the implicits imports to get rid of the error, but I am still getting it.

error: value toDF is not a member of org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]
possible cause: maybe a semicolon is missing before `value toDF'?
Error occurred in an application involving default arguments. 

Can someone please help? Thanks!

asked Oct 16 '25 01:10 by user3103957


1 Answer

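You can use the createDataFrame method instead. toDF does not work on an RDD of Rows: it needs an implicit Encoder for the element type, and spark.implicits._ does not provide one for Row, because a Row carries no static schema. With createDataFrame you pass the schema explicitly.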

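import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

// Row carries no schema of its own, so describe the columns explicitly.
val schema = StructType(Seq(StructField("col1", StringType), StructField("col2", StringType)))
// Use spark.sparkContext here: `sc` is predefined only in spark-shell.
val rows = spark.sparkContext.parallelize(Seq(Row("1", "One"), Row("2", "Tow")))
val df = spark.createDataFrame(rows, schema)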

df.show
+----+----+
|col1|col2|
+----+----+
|   1| One|
|   2| Tow|
+----+----+
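
If you would rather keep toDF, one alternative is to build the RDD from tuples instead of Rows. Tuples are Products, so spark.implicits._ can supply the Encoder that toDF needs. The following is a minimal sketch under that assumption, not a definitive implementation; the object name ToDfSketch and the helper build are made up for illustration, and the column names are the ones used above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical object and helper names, used only for this sketch.
object ToDfSketch {

  // Builds the two-row data frame from an RDD of tuples rather than Rows.
  def build(spark: SparkSession): DataFrame = {
    // The implicits come from the SparkSession instance, so import them
    // only after the session exists.
    import spark.implicits._

    spark.sparkContext
      .parallelize(Seq(("1", "One"), ("2", "Tow")))
      .toDF("col1", "col2") // column names are passed to toDF directly
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("App")
      .getOrCreate()

    build(spark).show()
    spark.stop()
  }
}
```

Calling toDF without arguments also compiles for tuples, but then the columns get the default names _1 and _2.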
answered Oct 17 '25 21:10 by mck


