How can I convert RDD to DataFrame in Spark Streaming, not just Spark?
I saw this example, but it requires a SparkContext:
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
rdd.toDF()
In my case I have a StreamingContext. Should I then create a SparkContext inside foreach? It looks too crazy... So, how do I deal with this issue? My final goal (in case it's useful) is to save the DataFrame to Amazon S3 using rdd.toDF.write.format("json").saveAsTextFile("s3://iiiii/ttttt.json"), which (as far as I know) is not possible for an RDD without converting it to a DataFrame first. Here is what creating the context inside foreachRDD would look like:
myDstream.foreachRDD { rdd =>
val conf = new SparkConf().setMaster("local").setAppName("My App")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
rdd.toDF()
}
(Background: a Spark RDD can be converted to a DataFrame in three ways: with toDF() after importing the implicits, with createDataFrame() on the SQLContext (or, from Spark 2.x, the SparkSession), or by mapping it to an RDD[Row] and applying an explicit schema. PySpark offers the same toDF() method on its RDDs.)
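A minimal sketch of all three approaches, assuming an existing SparkContext sc and a hypothetical Person case class:
case class Person(name: String, age: Int)

import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val people = sc.parallelize(Seq(Person("Ann", 30), Person("Bob", 25)))

// 1. toDF() on an RDD of case classes (requires the implicits import)
val df1 = people.toDF()

// 2. createDataFrame() straight from the same RDD
val df2 = sqlContext.createDataFrame(people)

// 3. map to RDD[Row] and attach an explicit schema
val rows = people.map(p => Row(p.name, p.age))
val schema = StructType(Seq(
  StructField("name", StringType, nullable = false),
  StructField("age", IntegerType, nullable = false)))
val df3 = sqlContext.createDataFrame(rows, schema)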
Create the sqlContext outside foreachRDD. Once you have converted the RDD to a DataFrame using that sqlContext, you can write it to S3.
For example:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

// local[2]: a local streaming app needs at least two threads
val conf = new SparkConf().setMaster("local[2]").setAppName("My App")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
myDstream.foreachRDD { rdd =>
  val df = rdd.toDF()
  // DataFrameWriter has no saveAsTextFile; use save(). Append mode keeps
  // later micro-batches from failing because the path already exists.
  df.write.mode(SaveMode.Append).format("json").save("s3://iiiii/ttttt.json")
}
Update:
You can even create the sqlContext inside foreachRDD, since the body of foreachRDD executes on the driver.
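If you do go that route, the Spark Streaming programming guide suggests a lazily instantiated singleton; SQLContext.getOrCreate (available since Spark 1.5) does exactly that, so you don't construct a fresh context per batch. A sketch, reusing the stream and S3 path from above:
import org.apache.spark.sql.{SQLContext, SaveMode}

myDstream.foreachRDD { rdd =>
  // getOrCreate returns a singleton SQLContext instead of building
  // a new one for every micro-batch
  val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
  import sqlContext.implicits._
  rdd.toDF().write.mode(SaveMode.Append).format("json").save("s3://iiiii/ttttt.json")
}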