I am new to Scala. I am trying to convert a Scala list (which holds the results of some calculated data on a source DataFrame) to a DataFrame or Dataset. I have not found any direct method for that, so I tried the following approaches to convert my list to a Dataset, but they do not seem to work. The three situations are given below.
Can someone please give me some ray of hope on how to do this conversion? Thanks.
import org.apache.spark.sql.{DataFrame, Row, SQLContext, DataFrameReader}
import java.sql.{Connection, DriverManager, ResultSet, Timestamp}
import scala.collection._
case class TestPerson(name: String, age: Long, salary: Double)
var tom = new TestPerson("Tom Hanks",37,35.5)
var sam = new TestPerson("Sam Smith",40,40.5)
val PersonList = mutable.MutableList[TestPerson]()
//Adding data in list
PersonList += tom
PersonList += sam
//Situation 1: Trying to create dataset from List of objects:- Result:Error
//Throwing error
var personDS = Seq(PersonList).toDS()
/*
ERROR:
error: Unable to find encoder for type stored in a Dataset. Primitive types
(Int, String, etc) and Product types (case classes) are supported by
importing sqlContext.implicits._ Support for serializing other types will
be added in future releases.
var personDS = Seq(PersonList).toDS()
*/
//Situation 2: Trying to add data one by one :- Result: not working as desired;
//the last record overwrites any existing data in the DS
var personDS = Seq(tom).toDS()
personDS = Seq(sam).toDS()
personDS += sam //not working. throwing error
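Note: a Dataset is immutable, so += cannot work. One way to append records is to build a new Dataset, e.g. with union. A rough sketch, assuming Spark 2.x, an active SparkSession named spark, and the tom/sam values from above:
import spark.implicits._

var personDS = Seq(tom).toDS()
// union returns a new Dataset containing the rows of both inputs
personDS = personDS.union(Seq(sam).toDS())
personDS.show()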
//Situation 3: Working. However, I have consolidated data in the list that I
//want to convert to a DS; if I loop over the list and pass its values here in
//comma-separated form it will work, but that creates an extra loop in the
//code, which I want to avoid.
var personDS = Seq(tom,sam).toDS()
scala> personDS.show()
+---------+---+------+
| name|age|salary|
+---------+---+------+
|Tom Hanks| 37| 35.5|
|Sam Smith| 40| 40.5|
+---------+---+------+
Convert using the createDataFrame method: the SparkSession object has a utility method for creating a DataFrame, createDataFrame. This method can take an RDD and create a DataFrame from it. createDataFrame is an overloaded method, and we can call it by passing the RDD alone or together with a schema.
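As a rough sketch of both overloads (assuming an active SparkSession named spark and the TestPerson case class defined above):
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, LongType, StringType, StructField, StructType}

// Overload 1: an RDD of case-class (Product) instances; the schema is inferred from TestPerson.
val personRDD = spark.sparkContext.parallelize(Seq(
  TestPerson("Tom Hanks", 37, 35.5),
  TestPerson("Sam Smith", 40, 40.5)))
val dfInferred = spark.createDataFrame(personRDD)

// Overload 2: an RDD of Rows plus an explicit schema.
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", LongType, nullable = false),
  StructField("salary", DoubleType, nullable = false)))
val rowRDD = personRDD.map(p => Row(p.name, p.age, p.salary))
val dfWithSchema = spark.createDataFrame(rowRDD, schema)
dfWithSchema.show()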
toDF(): the toDF() method provides a very concise way to create a DataFrame. It can be applied to a sequence of objects. To access toDF(), we have to import spark.implicits._.
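A small toDF() sketch along the same lines, again assuming spark.implicits._ has been imported from an active SparkSession; the tuple version shows that toDF(colNames: String*) can also supply the column names:
import spark.implicits._

// Column names come from the case class fields.
val df1 = Seq(TestPerson("Tom Hanks", 37, 35.5), TestPerson("Sam Smith", 40, 40.5)).toDF()

// With tuples, toDF("name", "age", "salary") supplies the column names explicitly.
val df2 = Seq(("Tom Hanks", 37L, 35.5), ("Sam Smith", 40L, 40.5)).toDF("name", "age", "salary")
df2.show()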
Try without Seq:
// Assumes an active SparkSession named `spark` (e.g. in spark-shell)
import scala.collection.mutable
import spark.implicits._

case class TestPerson(name: String, age: Long, salary: Double)
val tom = TestPerson("Tom Hanks",37,35.5)
val sam = TestPerson("Sam Smith",40,40.5)
val PersonList = mutable.MutableList[TestPerson]()
PersonList += tom
PersonList += sam
val personDS = PersonList.toDS()
println(personDS.getClass)
personDS.show()
val personDF = PersonList.toDF()
println(personDF.getClass)
personDF.show()
personDF.select("name", "age").show()
Output:
class org.apache.spark.sql.Dataset
+---------+---+------+
| name|age|salary|
+---------+---+------+
|Tom Hanks| 37| 35.5|
|Sam Smith| 40| 40.5|
+---------+---+------+
class org.apache.spark.sql.DataFrame
+---------+---+------+
| name|age|salary|
+---------+---+------+
|Tom Hanks| 37| 35.5|
|Sam Smith| 40| 40.5|
+---------+---+------+
+---------+---+
| name|age|
+---------+---+
|Tom Hanks| 37|
|Sam Smith| 40|
+---------+---+
Also, make sure to move the declaration of the case class TestPerson
outside the scope of your object.
import org.apache.spark.sql.SparkSession

case class TestPerson(name: String, age: Long, salary: Double)

val spark = SparkSession.builder().appName("List to Dataset").master("local[*]").getOrCreate()

val tom = TestPerson("Tom Hanks", 37, 35.5)
val sam = TestPerson("Sam Smith", 40, 40.5)

// mutable.MutableList[TestPerson]() is not required; I used the approach below, which is cleaner
val PersonList = List(tom, sam)
import spark.implicits._
PersonList.toDS().show
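To make the earlier note about the case class placement concrete, here is a minimal sketch of that layout for a standalone application; the object name and main method are illustrative, not part of the original answers:
import org.apache.spark.sql.SparkSession

// The case class lives at the top level of the file, outside the object that
// uses it, so Spark can derive an implicit Encoder for TestPerson.
case class TestPerson(name: String, age: Long, salary: Double)

object ListToDatasetApp { // hypothetical object name, for illustration only
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("List to Dataset").master("local[*]").getOrCreate()
    import spark.implicits._

    val personList = List(TestPerson("Tom Hanks", 37, 35.5), TestPerson("Sam Smith", 40, 40.5))
    personList.toDS().show()

    spark.stop()
  }
}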