Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Luigi fails to finish all tasks listed in the require method

Tags:

luigi

Say I have a task with the following dependency structure

class ParentTask(luigi.Task):
    def requires(self):
        return [ChildTask(classLevel=x) for x in self.class_level_list]
    def run(self):
        yadayda

The child task runs fine on it own. The parent correctly checks all the children tasks for finish status. Yet when the first child task finishes, the scheduler mark the parent task as finished. with the following message:

   Scheduled 15 tasks of which:
* 3 ran successfully:
    - 1 CleanRecord(...)
    - 1 EstimateQuestionParameter(classLevel=6, qdt=2016-04-19, subject=english)
    - 1 GetLog(classLevel=6, qdt=2016-04-19, subject=english)
* 12 were left pending, among these:
    * 12 were left pending because of unknown reason:
        - 5 EstimateQuestionParameter(classLevel=1...5, qdt=2016-04-19, subject=english)
        - 5 GetLog(pool=None, classLevel=1...5, qdt=2016-04-19, subject=english)
        - 1 UpdateQuestionParameter(qdt=2016-04-19, lastQdt=2016-03-23, subject=english, isInit=False)
        - 1 UpdateQuestionParameterBuffer(qdt=2016-04-19, subject=english, src_table=edw.edw_behavior_question_record_exam_new)

This progress looks :) because there were no failed tasks or missing external dependencies
like image 647
Junchen Avatar asked Apr 21 '16 06:04

Junchen


1 Answers

I think this happens because your worker gets disconnected from the scheduler. The worker's heartbeats don't reach scheduler because of network partition or, more likely, because they're never sent due to this issue.

You have two options to work-around the problem:

  • Increase worker-disconnect-delay setting ([scheduler] section in config, default 60s)
  • Use more than one worker for your job, e.g. --workers 2 (if it's the latter reason)
like image 126
Jakub Kukul Avatar answered Oct 25 '22 03:10

Jakub Kukul