What is a crashloop?

Question

I'm reading Google's Site Reliability Engineering book and ran across the word "crashloop", which I've never heard before and have not been able to find a definition for.

"If a task tries to use more resources than it requested, Borg kills the task and restarts it (as a slowly crashlooping task is usually preferable to a task that hasn’t been restar‐ ted at all)."

What is a crashloop, and how does it compare to an infinite loop, if at all?

GManNickG · Accepted Answer

A crashloop is when a process crashes and is restarted by a watchdog daemon, indefinitely.

That is, the history is:

Process starts at time T.
Process crashes at time T+1.
Watchdog daemon restarts process.
Process starts at time T+2.
Process crashes at time T+3.
Watchdog daemon restarts process.
Process starts, and so on.
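
The history above can be sketched as a small watchdog in Python. This is only an illustration and not how Borg is implemented: `watchdog`, `max_restarts`, `delay_seconds`, and `crashing_task` are names made up for this example, and a real watchdog would keep looping indefinitely instead of stopping after a fixed number of restarts.

```python
import subprocess
import sys
import time


def watchdog(cmd, max_restarts=3, delay_seconds=0.1):
    """Run cmd and restart it every time it exits with a nonzero status.

    Returns the exit codes seen, in order. max_restarts is an assumption
    added so the sketch terminates; a real watchdog has no such limit.
    """
    exit_codes = []
    restarts = 0
    while True:
        result = subprocess.run(cmd)  # process starts and runs until it exits
        exit_codes.append(result.returncode)
        if result.returncode == 0:
            return exit_codes  # clean exit, nothing to restart
        if restarts == max_restarts:
            return exit_codes  # give up (illustration only)
        restarts += 1
        time.sleep(delay_seconds)  # wait before the next restart


# Hypothetical task that crashes immediately on every start.
crashing_task = [sys.executable, "-c", "import sys; sys.exit(1)"]
history = watchdog(crashing_task, max_restarts=2, delay_seconds=0.0)
# history == [1, 1, 1]: one initial run plus two restarts, all crashing
```

Every entry in `history` is one trip around the crashloop: start, crash, restart.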

Here, the watchdog daemon is Borg, and the process is encapsulated in a task.

In general, in distributed computing, if you want something to eventually succeed, you have to write down your intent for it to be completed, and you need a worker that loops continually to act on that intent. This is "at-least-once delivery" of a work item.
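
As a rough sketch of that idea in Python: the intent is recorded in a `pending` list, and a worker keeps looping over it, removing an item only after its handler succeeds. All names here (`process_until_done`, `flaky_handler`, `max_passes`) are hypothetical, the in-memory list stands in for durable storage, and `max_passes` exists only so the example terminates.

```python
def process_until_done(pending, handler, max_passes=10):
    """Keep retrying every recorded work item until its handler succeeds.

    An item is removed from pending only after a successful call, so each
    item is attempted at least once and possibly many times.
    """
    attempts = {item: 0 for item in pending}
    for _ in range(max_passes):
        if not pending:
            break  # every recorded intent has been fulfilled
        for item in list(pending):  # iterate over a copy while removing
            attempts[item] += 1
            try:
                handler(item)
            except Exception:
                continue  # intent stays written down; retry on the next pass
            pending.remove(item)
    return attempts


calls = []


def flaky_handler(item):
    """Hypothetical handler that fails on its first two calls."""
    calls.append(item)
    if len(calls) < 3:
        raise RuntimeError("transient failure")


pending = ["run task A"]
attempts = process_until_done(pending, flaky_handler)
# attempts == {"run task A": 3} and pending == []
```

Because the handler can run more than once for the same item, it generally needs to tolerate being repeated.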

Here, the intent is that the task runs (written down in Borg), and Borg itself runs the loop that constantly tries to make sure the task is running. This is why a task is restarted when it crashes. When a task crashes repeatedly, the repeated restarts form a crashloop.
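
A toy model of that loop, with the caveat that it says nothing about Borg's real internals: `Task`, `crashes_on_start`, and `control_loop` are invented for this sketch, and the loop runs a fixed number of passes only so the example finishes.

```python
class Task:
    """Hypothetical task that either stays up or crashes right after starting."""

    def __init__(self, name, crashes_on_start):
        self.name = name
        self.crashes_on_start = crashes_on_start
        self.running = False
        self.starts = 0

    def start(self):
        self.starts += 1
        # A crashing task is already dead again by the next check.
        self.running = not self.crashes_on_start


def control_loop(intent, passes):
    """On each pass, start every task that should be running but is not."""
    for _ in range(passes):
        for task in intent:
            if not task.running:
                task.start()


healthy = Task("healthy", crashes_on_start=False)
broken = Task("broken", crashes_on_start=True)
control_loop([healthy, broken], passes=5)
# healthy.starts == 1, broken.starts == 5
```

The same loop handles both tasks identically. The healthy one is started once and left alone, while the broken one is restarted on every pass, which is the crashloop.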

Tags: sysadmin, crash, distributed-system, reliability

Taylor Clark
