I would like to convert a DStream into an array, list, etc. so I can then translate it to JSON and serve it on an endpoint. I'm using Apache Spark and ingesting Twitter data. How do I perform this operation on the DStream statuses? I can't seem to get anything to work other than print().
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._
import TutorialHelper._

object Tutorial {
  def main(args: Array[String]) {
    // Location of the Spark directory
    val sparkHome = "/opt/spark"
    // URL of the Spark cluster
    val sparkUrl = "local[8]"
    // Location of the required JAR files
    val jarFile = "target/scala-2.10/tutorial_2.10-0.1-SNAPSHOT.jar"
    // HDFS directory for checkpointing
    val checkpointDir = "/tmp"

    // Configure Twitter credentials using twitter.txt
    TutorialHelper.configureTwitterCredentials()

    val ssc = new StreamingContext(sparkUrl, "Tutorial", Seconds(1), sparkHome, Seq(jarFile))
    val filters = Array("#americasgottalent", "iamawesome")
    val tweets = TwitterUtils.createStream(ssc, None, filters)
    val statuses = tweets.map(status => status.getText())

    val arr = Array("firstval")
    statuses.foreachRDD {
      // My failed attempt: Array's :+ returns a NEW array, and the
      // result is discarded here, so arr never changes.
      arr :+ _.collect()
    }

    ssc.checkpoint(checkpointDir)
    ssc.start()
    ssc.awaitTermination()
  }
}
Some background first. A Discretized Stream (DStream), the basic abstraction in Spark Streaming, is a continuous sequence of RDDs (of the same type) representing a continuous stream of data (see org.apache.spark.rdd.RDD for more details on RDDs). DStreams support the familiar RDD-style transformations, for example: 1. map(func) — returns a new DStream by passing each element of the source DStream through a function func. 2. flatMap(func) — similar to map, but each input item can be mapped to 0 or more output items. Note the conversion only goes one way: each batch of a DStream is handed to you as an RDD, but as far as I know a standalone RDD cannot be converted into a DStream, because an RDD is a fixed collection of data while a DStream represents continuously arriving data. If you want streaming input (for example for StreamingKMeans), feed the data in through a streaming source such as KafkaUtils instead of building it up as an RDD.
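As a minimal sketch of those two transformations (reusing the tweets stream from the question's code, so the names are carried over from there, not new API):

val statuses = tweets.map(status => status.getText())    // DStream[String]: one text per tweet
val words    = statuses.flatMap(text => text.split(" ")) // DStream[String]: zero or more words per text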
If your DStream is statuses, you can do:
import scala.collection.mutable.ArrayBuffer

val arr = new ArrayBuffer[String]()
statuses.foreachRDD { rdd =>
  // Collect this batch to the driver and append it; you can now
  // put it in an array or do whatever you want with it.
  arr ++= rdd.collect()
}
Keep in mind this could end up being far more data than you want in your driver, since a DStream can be huge.
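If that worries you, one option (a sketch of my own, not something the question requires; maxKept is a made-up cap) is to keep only the most recent statuses so the driver-side buffer stays bounded:

import scala.collection.mutable.ArrayBuffer

val maxKept = 1000 // hypothetical cap; tune to taste
val arr = new ArrayBuffer[String]()
statuses.foreachRDD { rdd =>
  arr ++= rdd.collect()
  // Drop the oldest entries once the buffer exceeds the cap.
  if (arr.length > maxKept) arr.remove(0, arr.length - maxKept)
}

Also note that foreachRDD runs on the driver as each batch arrives, so if another thread (say, your endpoint handler) reads arr concurrently, you'll want to synchronize access to it.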
Turns out you were close, but what I ended up looking for is:
statuses.foreachRDD(rdd => {
  // collect() already returns an Array, so no toArray call is needed
  for (item <- rdd.collect()) {
    println(item)
  }
})
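And to close the loop on the original goal (serving the statuses as JSON from an endpoint), here is a minimal sketch; the LatestBatch holder and the hand-rolled escaping are my own assumptions, not part of the answers above:

object LatestBatch {
  // Written by the streaming job, read by the HTTP endpoint handler.
  @volatile var json: String = "[]"
}

statuses.foreachRDD { rdd =>
  val batch = rdd.collect()
  // Naive hand-rolled JSON array of strings; a real app would use a JSON library.
  LatestBatch.json = batch
    .map(s => "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
    .mkString("[", ",", "]")
}

Your endpoint can then simply return LatestBatch.json on each request.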