
How to convert Row of a Scala DataFrame into case class most efficiently?

Once I have obtained some Row in Spark, either from a DataFrame or from Catalyst, I want to convert it to a case class in my code. This can be done by pattern matching:

someRow match { case Row(a: Long, b: String, c: Double) => myCaseClass(a, b, c) }

But it becomes ugly when the row has a huge number of columns, say a dozen Doubles, some Booleans, and even the occasional null.

I would just like to be able to (sorry) cast the Row to myCaseClass. Is that possible, or do I already have the most economical syntax?

asked Jan 27 '15 by arivero


People also ask

How do I convert a Row to a DataFrame in Spark Scala?

In the case of one row, you can run: val dfFromArray = sparkContext.parallelize(Seq(row)).map(row => (row. …
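The snippet above is cut off mid-expression. A minimal, self-contained sketch of the same idea, assuming a SparkSession named spark and an illustrative two-column row (names and types are made up for the example):

import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder.appName("row-to-df").master("local[*]").getOrCreate()
import spark.implicits._

// One Row with a known layout; the column names and types are illustrative.
val row = Row("alice", 42)

val dfFromArray = spark.sparkContext
  .parallelize(Seq(row))
  .map(r => (r.getString(0), r.getInt(1))) // pull typed values out of the generic Row
  .toDF("name", "age")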

How do we convert a DataFrame to a Dataset?

You can call df.as[SomeCaseClass] to convert the DataFrame to a Dataset. You can also work with tuples when converting a DataFrame to a Dataset, without using a case class.
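A short sketch of both variants; the case class, data, and column names are assumptions for the example, and the DataFrame's column names and types must line up with the target type:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("df-to-ds").master("local[*]").getOrCreate()
import spark.implicits._ // supplies Encoders for case classes and tuples

case class Person(name: String, age: Long)

val df = Seq(("alice", 30L), ("bob", 25L)).toDF("name", "age")

val people = df.as[Person]          // Dataset[Person]; columns must match the fields
val pairs  = df.as[(String, Long)]  // the same conversion using a tuple instead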

What is Row RDD in spark?

Spark's createDataFrame() has another signature that takes an RDD[Row] and a schema for the column names as arguments. To use it, we first need to convert our rdd object from RDD[T] to RDD[Row]. To define a schema, we use StructType, which takes an array of StructField.
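A sketch of that signature in use, with an illustrative two-column schema (the names, types, and nullability are assumptions):

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder.appName("rdd-to-df").master("local[*]").getOrCreate()

// Each Row must line up with the schema declared below.
val rowRdd = spark.sparkContext.parallelize(Seq(Row("alice", 30), Row("bob", 25)))

val schema = StructType(Array(
  StructField("name", StringType, nullable = true),
  StructField("age",  IntegerType, nullable = true)
))

val df = spark.createDataFrame(rowRdd, schema)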

What is DataFrame DSL?

DataFrame is a data abstraction and a domain-specific language (DSL) for working with structured and semi-structured data, i.e. datasets for which you can specify a schema. A DataFrame is a collection of rows with a schema that is the result of executing a structured query.
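For illustration, here is a filter you might otherwise write as a SQL string, expressed in the DataFrame DSL (the data and column names are made up):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.appName("dsl-example").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("alice", 30), ("bob", 12)).toDF("name", "age")

// A structured query built from column expressions rather than a SQL string.
val adults = df.filter(col("age") >= 18).select(col("name"))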


2 Answers

DataFrame is simply a type alias for Dataset[Row]. Operations on it are also referred to as “untyped transformations”, in contrast to the “typed transformations” that come with strongly typed Scala/Java Datasets.

The conversion from Dataset[Row] to Dataset[Person] is very simple in Spark:

val DFtoProcess = sqlContext.sql("SELECT * FROM peoples WHERE name='test'")

At this point, Spark converts your data into a DataFrame = Dataset[Row], a collection of generic Row objects, since it does not know the exact type.

// Create an encoder for the Java class (in this example, Person is a Java bean class).
// For a Scala case class you can call .as[Person] without the encoder argument,
// provided spark.implicits._ is in scope.
val personEncoder = Encoders.bean(classOf[Person])

val DStoProcess = DFtoProcess.as[Person](personEncoder)

Now, Spark converts the Dataset[Row] into a Dataset[Person], holding type-specific Scala/Java JVM objects as dictated by the class Person.
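For the Scala case class route the comment above mentions, the encoder usually comes in implicitly, so no explicit Encoders call is needed. A sketch, assuming a SparkSession named spark and that DFtoProcess's columns match the case class fields (Person here is an illustrative case class, not the Java bean above):

import spark.implicits._ // implicit Encoder[Person] is derived for the case class

case class Person(name: String, age: Long)

// Column names and types must line up with Person's fields.
val DStoProcess = DFtoProcess.as[Person]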

Please refer to the link below, provided by Databricks, for further details:

https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html

answered Sep 28 '22 by Rahul


As far as I know you cannot cast a Row to a case class, but I sometimes choose to access the row fields directly, like

map(row => myCaseClass(row.getLong(0), row.getString(1), row.getDouble(2)))

I find this to be easier, especially if the case class constructor only needs some of the fields from the row.
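A variant of the same approach that looks fields up by name instead of position, which survives column reordering; the case class and column names here are illustrative. Mapping a DataFrame with this function then needs an encoder in scope, e.g. import spark.implicits._.

import org.apache.spark.sql.Row

case class MyCaseClass(a: Long, b: String, c: Double)

// getAs[T]("col") resolves the field by name; it still throws if the
// name is missing or the runtime type does not match.
val toMyCaseClass = (row: Row) => MyCaseClass(
  row.getAs[Long]("a"),
  row.getAs[String]("b"),
  row.getAs[Double]("c")
)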

answered Sep 28 '22 by Glennie Helles Sindholt