
Spark doesn't print output on the console within the map function

I have a simple Spark application running in cluster mode.

val funcGSSNFilterHeader = (x: String) => {
    println(!x.contains("servedMSISDN"))
    !x.contains("servedMSISDN")
}

val ssc = new StreamingContext(sc, Seconds(batchIntervalSeconds))
val ggsnFileLines = ssc.fileStream[LongWritable, Text, TextInputFormat]("C:\\Users\\Mbazarganigilani\\Documents\\RA\\GGSN\\Files1", filterF, false)
val ggsnArrays = ggsnFileLines
    .map(x => x._2.toString()).filter(x => funcGSSNFilterHeader(x))

ggsnArrays.foreachRDD(s => {println(x.toString()})

I need to print the value of !x.contains("servedMSISDN") inside the map function for debugging purposes, but nothing appears on the console.

Mahdi asked Feb 06 '23 05:02


1 Answer

A Spark application consists of a driver (the main/master process) and executors (which run on the cluster nodes in cluster mode).

Functions passed to "map" run on the executors.

So in cluster mode, a print inside the map function goes to the console of the node running that executor, which you won't see on your own console.
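
To make the driver/executor split concrete, here is a minimal sketch on a plain RDD rather than the streaming job from the question. The object name, the `keep` helper and the sample lines are made up for illustration, and the master URL is assumed to be supplied by spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WhereDoesItRun {
  // Same test as in the question, without the side effect.
  def keep(x: String): Boolean = !x.contains("servedMSISDN")

  def main(args: Array[String]): Unit = {
    // No setMaster here: the master is assumed to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("where-does-it-run"))

    // Runs in the driver process, so it goes to the driver's stdout.
    println("driver: building the job")

    val flags = sc.parallelize(Seq("servedMSISDN,a", "1,b")).map { x =>
      // Runs inside an executor, so it goes to that executor's stdout.
      println(s"executor: saw $x")
      keep(x)
    }

    // collect() ships the results back; this println runs on the driver again.
    flags.collect().foreach(b => println(s"driver: keep=$b"))

    sc.stop()
  }
}
```

On a cluster, the "executor:" lines should end up in each executor's stdout log (usually reachable from the Executors tab of the Spark web UI) rather than on the console that launched the job.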

In order to debug a program, you can:

  1. Run the code in "local" mode. The prints inside the map function will then appear on the console of your "master/main node", because the executors run on the same machine.

  2. Replace "print to console" with save to file / save to Elastic / etc.
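
Both options can be sketched together for a streaming job like the one in the question. This is only an illustration under assumptions: the directories /tmp/ggsn-in and /tmp/ggsn-debug, the 10-second batch interval, the object name and the `tag` helper are hypothetical, and `textFileStream` stands in for the `fileStream` call to keep the example short.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object GgsnLocalDebug {
  // Pair each line with the value the question wants to inspect.
  def tag(x: String): (Boolean, String) = (!x.contains("servedMSISDN"), x)

  def main(args: Array[String]): Unit = {
    // Option 1: "local[2]" runs the driver and the executors in this one JVM,
    // so a println inside map/filter shows up on this console.
    val conf = new SparkConf().setAppName("ggsn-local-debug").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10)) // arbitrary batch interval

    // Hypothetical input directory; new files dropped here become batches.
    val lines = ssc.textFileStream("/tmp/ggsn-in")

    val tagged = lines.map { x =>
      val t = tag(x)
      println(s"keep=${t._1} line=$x") // visible locally, hidden on a cluster
      t
    }

    // Option 2: write the debug value out instead of printing it.
    // One output directory is created per batch under this hypothetical prefix.
    tagged.saveAsTextFiles("/tmp/ggsn-debug/batch")

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Once the output looks right, removing `setMaster` and the println leaves a job that can go back to the cluster, with the saved files still available for inspection.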


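Note that, in addition to the local vs. cluster mode issue, there seems to be a typo in your code:

ggsnArrays.foreachRDD(s => {println(x.toString()})

The closing parenthesis of println is missing, and the lambda parameter is named s, so x is undefined. It should be:

ggsnArrays.foreachRDD(s => {println(s.toString)})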
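
One caveat about the foreachRDD call: calling toString on an RDD gives a description of the RDD, not its elements. If the goal is to see a few lines on the driver console, a sketch along these lines may help. The `printSample` helper and its default of 10 are hypothetical names chosen for illustration.

```scala
import org.apache.spark.streaming.dstream.DStream

object DriverSidePrint {
  // Hypothetical helper: print up to n elements of every batch on the driver.
  def printSample(stream: DStream[String], n: Int = 10): Unit =
    stream.foreachRDD { rdd =>
      // The body of foreachRDD runs on the driver. take(n) brings at most
      // n elements back from the executors, so this println is driver-side.
      rdd.take(n).foreach(println)
    }
}
```

With the names from the question this would be called as DriverSidePrint.printSample(ggsnArrays). The built-in ggsnArrays.print() does something similar for the first ten elements of each batch.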

Yaron answered Feb 09 '23 00:02