'Connection Refused' error while running Spark Streaming on local machine

I know there are already many threads on 'Spark streaming connection refused' issues, but most of them are on Linux or at least point to HDFS. I am running this on my local laptop with Windows.

I am running a very simple, basic standalone Spark streaming application, just to see how streaming works. Nothing complex here:

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.SparkConf

object MyStream {
  def main(args: Array[String]): Unit = {
    // 10-second batch interval
    val sc = new StreamingContext(new SparkConf(), Seconds(10))
    // Connect (as a client) to a server on localhost:7777
    val mystreamRDD = sc.socketTextStream("localhost", 7777)
    mystreamRDD.print()
    sc.start()
    sc.awaitTermination()
  }
}

I am getting the following error:

2015-07-25 18:13:07 INFO  ReceiverSupervisorImpl:59 - Starting receiver
2015-07-25 18:13:07 INFO  ReceiverSupervisorImpl:59 - Called receiver onStart
2015-07-25 18:13:07 INFO  SocketReceiver:59 - Connecting to localhost:7777
2015-07-25 18:13:07 INFO  ReceiverTracker:59 - Registered receiver for stream 0 from 192.168.19.1:11300
2015-07-25 18:13:08 WARN  ReceiverSupervisorImpl:92 - Restarting receiver with delay 2000 ms: Error connecting to localhost:7777
java.net.ConnectException: Connection refused

I have tried using different port numbers, but it doesn't help: the receiver keeps retrying in a loop and gets the same error. Does anyone have an idea?

asked Jul 26 '15 by Dhiraj

2 Answers

Make sure a server is already listening on the port before you run the program, and use the same port your code connects to, e.g. with netcat:

nc -lk 7777

answered Sep 17 '22 by JayaChandra S Reddy

Within the code for socketTextStream, Spark creates an instance of SocketInputDStream, which uses java.net.Socket: https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/SocketInputDStream.scala#L73

java.net.Socket is a client socket, which means it expects a server to already be running at the address and port you specify. Unless some service is running a server on port 7777 of your local machine, the error you are seeing is expected.

To see what I mean, try the following (you may not need to set master or appName in your environment).

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.SparkConf

object MyStream {
  def main(args: Array[String]): Unit = {
    // Use at least two local threads: a socket receiver occupies one,
    // leaving the other free to process batches.
    val conf = new SparkConf().setMaster("local[2]").setAppName("socketstream")
    val sc = new StreamingContext(conf, Seconds(10))
    val mystreamRDD = sc.socketTextStream("bbc.co.uk", 80)
    mystreamRDD.print()
    sc.start()
    sc.awaitTermination()
  }
}

This doesn't print any content, because the app doesn't speak HTTP to the BBC website, but it does not get a connection refused exception either.
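Conversely, the refused connection is easy to reproduce with a bare java.net.Socket, outside Spark entirely. A minimal sketch (the bind-to-port-0 trick is just a way to find a port that has no listener):

```scala
import java.net.{ConnectException, ServerSocket, Socket}

// Find a port that nothing is listening on: bind to port 0 to get a
// free ephemeral port, then close the listener again.
val probe = new ServerSocket(0)
val freePort = probe.getLocalPort
probe.close()

// Connecting a client socket to that port now fails exactly like
// Spark's socket receiver does when no server is running.
try {
  new Socket("localhost", freePort)
  println("unexpectedly connected")
} catch {
  case e: ConnectException => println(s"Connection refused: ${e.getMessage}")
}
```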

To run a local server on Linux, I would use netcat with a simple command such as

cat data.txt | ncat -l -p 7777

I'm not sure what the best approach is on Windows. You could write another application which listens as a server on that port and sends some data.
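On Windows (or any platform with a JVM), one option is a few lines of plain Scala that play the server role. This is a minimal stand-in for `nc -lk`, not part of Spark; the object name LineServer is made up for illustration:

```scala
import java.io.PrintWriter
import java.net.ServerSocket

// Listen on a port and push a few lines to the first client that
// connects (e.g. Spark's socket receiver), one record per line.
object LineServer {
  def serve(port: Int, lines: Seq[String]): Thread = {
    val server = new ServerSocket(port)   // bind synchronously, before any client connects
    val t = new Thread(() => {
      val client = server.accept()        // block until the receiver connects
      val out = new PrintWriter(client.getOutputStream, true)
      lines.foreach(out.println)
      out.close(); client.close(); server.close()
    })
    t.setDaemon(true)
    t.start()
    t
  }
}
```

Start it on port 7777 before launching the streaming app, and socketTextStream("localhost", 7777) should connect and print the lines in its next batch.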

answered Sep 18 '22 by mattinbits