Long running Azure Webjob - Keeps stopping

I have been working with Azure webjobs for a while now, but I am still struggling to figure out why one extremely long-running webjob keeps failing.

I have a webjob which is built using the Azure SDK and gets triggered by an incoming queue message. The job pulls a blob of XML from Azure Blob storage containing information about 110,000 items, then loops through them and uses HttpClient to call REST WebApi2 endpoints that create the various entities in both our table storage and DocumentDB. The process is slow, which is something I'm working on, and it runs for days. That's fine as there is no urgency, except that it keeps randomly stopping, sometimes after two days. The last time, the only message was "Thread was being aborted". I'm regularly writing log output and making HTTP calls, so it's not as if the job is sitting there doing nothing.

UPDATE: I should also state that I have upgraded the whole app service plan to S1 and set the web app hosting the webjob to Always On.

I have also looked at "WEBJOBS_RESTART_TIME", but it is not relevant: it is about restarting after a stop, which I assumed a continuous job with no errors shouldn't do in the first place!

dreadeddev asked Mar 21 '16 10:03


2 Answers

There is one thing with long running webjobs that I found out. For the thread to continue working for really long periods of time you either have to:

  • Write to the output with Console.Write every now and then
  • Add an App Setting called WEBJOBS_IDLE_TIMEOUT (source), which defines the amount of time that the environment will wait for an idle webjob (one producing no Console output) before shutting it down.

I'd do both: add the Console.Write as a "heartbeat" and add the App Setting.
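As a rough illustration of the "heartbeat" idea, here is a minimal sketch that writes a line to the console on a timer while the real work runs. The `Heartbeat` class and the 30-second interval are hypothetical, not part of the WebJobs SDK; the only requirement is that the interval stays well below whatever you set in WEBJOBS_IDLE_TIMEOUT (which, as far as I know, is given in seconds).

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper (not part of the WebJobs SDK): writes a line to the
// given writer at a fixed interval until it is disposed.
public sealed class Heartbeat : IDisposable
{
    private readonly Timer _timer;

    public Heartbeat(TextWriter output, TimeSpan interval)
    {
        // The first tick fires after one interval, then repeats every interval.
        _timer = new Timer(
            _ => output.WriteLine("heartbeat {0:O}", DateTime.UtcNow),
            null, interval, interval);
    }

    // Stops scheduling further ticks.
    public void Dispose() => _timer.Dispose();
}

public static class Program
{
    public static void Main()
    {
        // Assumed interval; pick something well below the idle timeout.
        using (new Heartbeat(Console.Out, TimeSpan.FromSeconds(30)))
        {
            // Stand-in for the real long-running loop over the items.
            for (int i = 0; i < 3; i++)
            {
                Thread.Sleep(TimeSpan.FromSeconds(1));
            }
        }
    }
}
```

Running the heartbeat on a timer rather than inside the processing loop means output keeps flowing even when a single item takes a long time, for example during a slow HTTP call.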

Matias Quaranta answered Oct 05 '22 01:10


We have had many problems with long-running webjobs and ended up buying paid support because the jobs failed so often and we just couldn't figure out why. This has been going on for 2+ months and there is still no resolution. Support did recommend using the local_cache setting, which stopped the reboots for a while, but eventually the reboots started again.

We had moved these jobs off a VM where they had run flawlessly for years. In my opinion, webjobs just aren't suitable for long-running jobs, and you should move to a VM. We have a number of short-running jobs and they do just fine, but for anything long-running I think webjobs are not ready for prime time. We have spent a lot of time with support on these issues to no avail, and frankly we feel that we're just wasting our time at this point. Save yourself the pain: go to a VM and revisit this in 6 months.

JonnyBravoJr answered Oct 04 '22 23:10