
Spark Job submitted - Waiting (TaskSchedulerImpl : Initial job not accepted)

I submit the job through an API call. The response states that it is running.

On Cluster UI -

Worker (slave) - worker-20160712083825-172.31.17.189-59433 is Alive

Cores: 1 out of 2 used

Memory: 1 GB out of 6 used

Running Application

app-20160713130056-0020 - Waiting for 5 hours

Cores - unlimited

Job Description of the Application

Active Stage

reduceByKey at /root/wordcount.py:23

Pending Stage

takeOrdered at /root/wordcount.py:26

Running Driver -

stderr log page for driver-20160713130051-0025 

WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

According to "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources", this happens when the slaves haven't been started, so the job has no resources.

However, in my case slave 1 is working.

According to "Unable to Execute More than a spark Job 'Initial job has not accepted any resources'", the deploy mode matters. I am using deploy-mode = cluster (not client), since I have 1 master and 1 slave and the submit API is called via Postman or from anywhere.

The cluster also has available cores and memory, yet the job still throws the error shown in the UI.

According to "TaskSchedulerImpl: Initial job has not accepted any resources", I set the following Spark environment variables in ~/spark-1.5.0/conf/spark-env.sh:

SPARK_WORKER_INSTANCES=1
SPARK_WORKER_MEMORY=1000m
SPARK_WORKER_CORES=2

I then replicated those settings across the slaves:

sudo /root/spark-ec2/copy-dir /root/spark/conf/spark-env.sh
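
A rough way to sanity-check these numbers: in standalone mode an executor can only be launched on a worker that has at least the requested memory and cores free. The sketch below is a simplified model of that check, not Spark's scheduler code. The helper names `to_mb` and `executor_fits` are hypothetical, and the `1g` executor request is assumed because it is the documented default for `spark.executor.memory` and the script does not set it.

```python
import re

# MiB per unit suffix; only the suffixes used in this thread are handled.
_UNITS_MB = {"m": 1, "g": 1024}

def to_mb(size):
    """Convert a Spark-style size string such as '1000m' or '1g' to MiB."""
    match = re.fullmatch(r"(\d+)([mg])b?", size.strip().lower())
    if match is None:
        raise ValueError("unrecognised size: %r" % size)
    return int(match.group(1)) * _UNITS_MB[match.group(2)]

def executor_fits(worker_free_mem, worker_free_cores,
                  executor_mem="1g", executor_cores=1):
    """Simplified model: True if one executor fits in the worker's free resources."""
    return (to_mb(worker_free_mem) >= to_mb(executor_mem)
            and worker_free_cores >= executor_cores)

print(executor_fits("1000m", 2))                       # False: 1000 MiB < 1024 MiB
print(executor_fits("1000m", 2, executor_mem="512m"))  # True
```

Under this model a worker limited to 1000m cannot host a default-sized executor. Whether that applies here is uncertain, since the cluster UI reports a different memory total for the worker, and in cluster deploy mode the driver also takes a core and memory on a worker.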

All the cases covered in the answers to the above question were applicable, yet none of them solved the problem. Since I am working with the submission API and Apache Spark, some other kind of assistance may be required.

Edited July 18, 2016

wordcount.py, my PySpark application code:

from pyspark import SparkContext, SparkConf

logFile = "/user/root/In/a.txt"

conf = (SparkConf().set("num-executors", "1"))

sc = SparkContext(master = "spark://ec2-54-209-108-127.compute-1.amazonaws.com:7077", appName = "MyApp", conf = conf)
print("in here")
lines = sc.textFile(logFile)
print("text read")
c = lines.count()
print("lines counted")
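
One thing that may be worth checking in the script above: Spark configuration properties are normally namespaced with `spark.`, while `num-executors` is a `spark-submit` command-line flag (`--num-executors`, documented for YARN), so setting it on `SparkConf` probably has no effect on a standalone cluster. A minimal sketch of an explicit resource request follows. The values `512m` and `1` are placeholders, `resource_settings` and `make_context` are hypothetical helper names, and `pyspark` is imported lazily so the first helper can be inspected without a Spark installation.

```python
def resource_settings(executor_memory="512m", max_cores=1):
    """Return (key, value) pairs using standard Spark property names."""
    return [
        ("spark.executor.memory", executor_memory),  # memory requested per executor
        ("spark.cores.max", str(max_cores)),         # cap on cores the application claims
    ]

def make_context(master, app_name="MyApp"):
    """Create a SparkContext with the settings above (requires pyspark)."""
    from pyspark import SparkConf, SparkContext
    conf = SparkConf().setAll(resource_settings())
    return SparkContext(master=master, appName=app_name, conf=conf)

print(resource_settings())
```

Keeping the requested executor memory below what a worker advertises is the point of the sketch; the right values depend on the cluster.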

Error

Starting job: count at /root/wordcount.py:11
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Got job 0 (count at /root/wordcount.py:11) with 2 output partitions
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (count at /root/wordcount.py:11)
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Missing parents: List()
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] at count at /root/wordcount.py:11), which has no missing parents
16/07/18 07:46:39 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.6 KB, free 56.2 KB)
16/07/18 07:46:39 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.4 KB, free 59.7 KB)
16/07/18 07:46:39 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.31.17.189:43684 (size: 3.4 KB, free: 511.5 MB)
16/07/18 07:46:39 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/07/18 07:46:39 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (PythonRDD[2] at count at /root/wordcount.py:11)
16/07/18 07:46:39 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/07/18 07:46:54 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
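
The gap between the task set being added (07:46:39) and the warning (07:46:54) is 15 seconds, which would be consistent with the scheduler's starvation timeout (`spark.starvation.timeout`, believed to default to 15s). A small sketch for measuring that gap from a driver log follows. The helper name `starvation_delay` is hypothetical, the timestamp format is assumed to match the lines above, and the sample lines are abbreviated copies of the log.

```python
from datetime import datetime

WARNING = "Initial job has not accepted any resources"

def _timestamp(line):
    # The first 17 characters look like '16/07/18 07:46:39'.
    return datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")

def starvation_delay(log_lines):
    """Seconds between the first 'Adding task set' line and the first warning, or None."""
    submitted = None
    for line in log_lines:
        if submitted is None and "Adding task set" in line:
            submitted = _timestamp(line)
        elif submitted is not None and WARNING in line:
            return (_timestamp(line) - submitted).total_seconds()
    return None

sample = [
    "16/07/18 07:46:39 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks",
    "16/07/18 07:46:54 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources",
]
print(starvation_delay(sample))  # 15.0
```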

According to "Spark UI showing 0 cores even when setting cores in App":

"Spark WebUI states zero cores used and indefinite wait no tasks running. The application is also using NO MEMORY whatsoever during run time or cores and immediately hits a status of waiting when starting"

Environment: Spark version 1.6.1, Ubuntu, Amazon EC2

asked Jul 13 '16 19:07 by Chaitanya Bapat


1 Answer

I have the same issue. Below are my observations from when it occurs.

1:17:46 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

I noticed that it only occurs during the first query from the Scala shell, when I run something that fetches data from HDFS.

When the problem occurs, the web UI shows no running applications.

URL: spark://spark1:7077
REST URL: spark://spark1:6066 (cluster mode)
Alive Workers: 4
Cores in use: 26 Total, 26 Used
Memory in use: 52.7 GB Total, 4.0 GB Used
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed 
Status: ALIVE

It seems that something fails to start, but I can't tell exactly what.

However, restarting the cluster a second time sets the Applications value to 1 and everything works well.

URL: spark://spark1:7077
REST URL: spark://spark1:6066 (cluster mode)
Alive Workers: 4
Cores in use: 26 Total, 26 Used
Memory in use: 52.7 GB Total, 4.0 GB Used
Applications: 1 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE
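
The two summaries differ only in the Applications line, and both report 26 of 26 cores in use. A short sketch for pulling the free core count out of such a summary follows, assuming the text layout shown above. `free_cores` is a hypothetical helper, not part of Spark.

```python
import re

def free_cores(status_text):
    """Return total minus used cores from a master status summary, or None if absent."""
    match = re.search(r"Cores in use:\s*(\d+)\s*Total,\s*(\d+)\s*Used", status_text)
    if match is None:
        return None
    total, used = int(match.group(1)), int(match.group(2))
    return total - used

status = """Alive Workers: 4
Cores in use: 26 Total, 26 Used
Memory in use: 52.7 GB Total, 4.0 GB Used
Applications: 1 Running, 0 Completed"""

print(free_cores(status))  # 0
```

A result of 0 would suggest that a newly submitted application has no cores left to claim until a running one releases them.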

I'm still investigating. In the meantime, this quick workaround can save time until a final solution is found.

answered Nov 15 '22 07:11 by hajaK