
Structured Streaming - Foreach Sink

I am basically reading from a Kafka source and dumping each message through to my foreach processor (thanks to Jacek's page for the simple example).

If this works, I will then perform some actual business logic in the process method here. However, it doesn't work. I believe the println doesn't show anything because it runs on the executors and there is no way to get those logs back to the driver. Still, the insert into a temp table should at least work and show me that the messages are actually being consumed and processed through to the sink.

What am I missing here?

Really looking for a second set of eyes to check my effort here:

    val stream = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", Util.getProperty("kafka10.broker"))
      .option("subscribe", src_topic)
      .load()

    val rec = stream.selectExpr("CAST(value AS STRING) as txnJson").as[String]  // unused below

    val df = stream.selectExpr("cast (value as string) as json")

    val writer = new ForeachWriter[Row] {
      val scon = new SConConnection
      override def open(partitionId: Long, version: Long) = {
        true
      }
      override def process(value: Row) = {
        println("++++++++++++++++++++++++++++++++++++" + value.get(0))
        scon.executeUpdate("insert into rs_kafka10(miscCol) values(" + value.get(0) + ")")
      }
      override def close(errorOrNull: Throwable) = {
        scon.closeConnection
      }
    }

    val yy = df.writeStream
      .queryName("ForEachQuery")
      .foreach(writer)
      .outputMode("append")
      .start()

    yy.awaitTermination()
asked May 26 '17 by Raghav

People also ask

What is sink in Spark structured streaming?

Spark Structured Streaming defines the Sink trait, representing the interface for external storage systems that can collect the results of a streaming query. The Sink trait defines only one method, addBatch, which takes the batchId and the DataFrame representing the batch data as arguments.
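For a feel of that interface, here is a minimal sketch. Note that Sink lives in the internal org.apache.spark.sql.execution.streaming package in Spark 2.x, so this is illustrative rather than a supported extension point:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.execution.streaming.Sink

    // Illustrative only: Sink is an internal Spark API, so treat this
    // as a sketch of the shape rather than a stable extension point.
    class LoggingSink extends Sink {
      override def addBatch(batchId: Long, data: DataFrame): Unit = {
        // batchId identifies the micro-batch; data holds its rows
        println(s"batch $batchId: ${data.count()} rows")
      }
    }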

What is a sink in streaming?

sink - the property that takes input into the Stream. stream - the property that gives output out of the Stream.

What is difference between DStream and structured streaming?

Internally, a DStream is a sequence of RDDs. Spark receives real-time data and divides it into smaller batches for the execution engine. In contrast, Structured Streaming is built on the SparkSQL API for data stream processing.
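To make the contrast concrete, here is a sketch reading the same socket source with both APIs (host and port are placeholders):

    import org.apache.spark.SparkContext
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.sql.SparkSession

    // DStream API: an explicit batch interval, yielding a sequence of RDDs
    val sc = SparkContext.getOrCreate()
    val ssc = new StreamingContext(sc, Seconds(10))
    val lines = ssc.socketTextStream("localhost", 9999)  // DStream[String]

    // Structured Streaming: an unbounded DataFrame on the SparkSQL API
    val spark = SparkSession.builder.getOrCreate()
    val df = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()                                            // streaming DataFrame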

Is structured streaming exactly once?

Finally, the system ensures end-to-end exactly-once fault-tolerance guarantees through checkpointing and Write-Ahead Logs. In short, Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing without the user having to reason about streaming.
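In practice that guarantee is anchored by the checkpoint location; applied to a query like the one in the question, it is just one extra option (the path below is a placeholder):

    // Offsets and state are persisted under checkpointLocation, so a
    // restarted query resumes exactly where it stopped.
    val query = df.writeStream
      .queryName("ForEachQuery")
      .option("checkpointLocation", "/tmp/chk/foreach-query")
      .foreach(writer)
      .outputMode("append")
      .start()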


1 Answer

Thanks to comments from Harald and others, I found out a couple of things which led me to normal processing behaviour:

  1. Test the code in local mode; YARN isn't the biggest help in debugging.
  2. For some reason, the process method of the foreach sink doesn't allow calling other methods. When I put my business logic directly in there, it works (see the sketch after this list).
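For concreteness, roughly the shape that worked for me. This is a sketch, not verified line by line: SConConnection is my own helper from the question, and creating it in open() (on the executor, once per partition) is an assumption here:

    import org.apache.spark.sql.{ForeachWriter, Row, SparkSession}

    // Finding 1: a local master, so executor-side printlns and stack
    // traces all land in one console while debugging.
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("ForEachDebug")
      .getOrCreate()

    val writer = new ForeachWriter[Row] {
      var scon: SConConnection = _   // assumption: opened per partition in open()

      override def open(partitionId: Long, version: Long): Boolean = {
        scon = new SConConnection
        true
      }

      override def process(value: Row): Unit = {
        // Finding 2: business logic inline, not delegated to another method
        scon.executeUpdate("insert into rs_kafka10(miscCol) values(" + value.get(0) + ")")
      }

      override def close(errorOrNull: Throwable): Unit = {
        if (scon != null) scon.closeConnection
      }
    }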

Hope it helps others.

answered Oct 13 '22 by Raghav