For example, suppose I have the DataFrame:
val myDF = sc.parallelize(Seq(("one", 1), ("two", 2), ("three", 3))).toDF("a", "b")
I can convert it to an RDD[(String, Int)] with a map:
val myRDD = myDF.map(r => (r(0).asInstanceOf[String], r(1).asInstanceOf[Int]))
Is there a better way to do this, maybe using the DF schema?
In PySpark, dataFrameObject.rdd converts a DataFrame to an RDD. Several transformations that exist on RDDs are not available on DataFrames, so you often need to convert a PySpark DataFrame to an RDD.
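The Scala API exposes the same accessor. A minimal sketch, assuming the myDF defined above and a spark-shell session (so sc and the implicits are already in scope):
import org.apache.spark.sql.Row
import org.apache.spark.rdd.RDD
// .rdd returns the DataFrame's underlying data as an untyped RDD[Row]
val rowRDD: RDD[Row] = myDF.rdd
rowRDD.map(r => r.getString(0)).collect()  // fields are read positionally, e.g. Array(one, two, three)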
If you have semi-structured data, you can also go the other way and create a DataFrame from an existing RDD by programmatically specifying the schema. The SparkSession object has a utility method for this: createDataFrame. It can take an RDD and build a DataFrame from it; createDataFrame is overloaded, so you can pass the RDD alone or together with a schema.
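For the reverse direction, a minimal sketch of the schema-based overload, assuming a Spark 2.x SparkSession named spark:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
val rowRDD = spark.sparkContext.parallelize(Seq(Row("one", 1), Row("two", 2), Row("three", 3)))
val schema = StructType(Seq(
  StructField("a", StringType, nullable = false),
  StructField("b", IntegerType, nullable = false)))
// createDataFrame(rdd, schema) pairs the untyped rows with an explicit schema
val df = spark.createDataFrame(rowRDD, schema)  // columns a: string, b: int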
Using pattern matching over Row:
import org.apache.spark.sql.Row
myDF.map { case Row(a: String, b: Int) => (a, b) }  // yields RDD[(String, Int)] in Spark 1.x
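Note that the match is unchecked at compile time: any Row whose runtime values don't fit the pattern (a null, or a Long where an Int is expected) throws a scala.MatchError.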
In Spark 1.6+ you can use Dataset as follows:
// requires the session implicits in scope (sqlContext.implicits._ on 1.6, spark.implicits._ on 2.x)
myDF.as[(String, Int)].rdd