Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Azure function visibilityTimeout

When i read the documentation about visibilityTimeout: https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-queue#host-json it says "The time interval between retries when processing of a message fails.". How I understands this is that if the timeout is set to 30 seconds and my function runs for 1 minute but doesn't fail in that 1 minute period, the message doesn't get visible to others in the queue. But when I read up on it by others sources (stackoverflow fx) it tells me the opposite, that when the execution time of the function exceeds the timeout, the message becomes visible EVEN though the function is still processing the message.

What is the truth? Is the timeout only relevant when the function isn't running more (and maybe have failed) or can it happen that the message gets visible again even though the function is still running?

What doesn't makes sense either, if we assume that the message gets visible when timeout is reached, is that the default timeout is 00:00:00 which implies that the message is visible at the same moment it is dequeued. This contradicts what 3. party sources is saying.

I am a bit confused by this.

like image 735
mslot Avatar asked Apr 19 '19 06:04

mslot


2 Answers

It appears there are actually two different visibility timeout values used here. Both are set by the Azure WebJobs SDK but only one is configurable.

When the function fails

The queues.visibilityTimeout configuration option would be more aptly named retryDelay.

When the function throws an exception or fails for some other kind of error, the message gets returned to the queue to be retried. The message is returned with the configured visibilityTimeout (see here), which delays when the function will next attempt to run.

This allows your application to cope with transient errors. For example, if an email API or other external service is temporarily down. By delaying the retry, there is a chance that the service may be back online for the next function attempt.

Retry is limited to maxDequeueCount attempts (5 default) before a message is moved to the Poison Queue.

While the function is running

When the QueueTrigger binding runs the function, it dequeues the message with a visibility timeout of 10mins (hard-coded here). It then sets a timer to extend the visibility window when it reaches half-time as long as the function is running (see the timer and visibility update in the source).

Ordinarily you don't need to worry about this as long as your functions use CancellationTokens correctly. This 10min timeout only matters if the Azure Function/WebJob host doesn't get to shut down gracefully. For example:

  • someone "pulls the plug" on the web host
  • if the function doesn't respond to the CancellationToken in time during scale-in or other Azure shutdown events

So, as long as the function is still running, the message will remain hidden from the queue.

Verification

I did a similar experiment to check:

[FunctionName("SlowJob")]
public async Task Run(
    [QueueTrigger("slow-job-queue")] CloudQueueMessage message,
    ILogger log)
{
    for (var i = 0; i < 20; i++)
    {
        log.LogInformation($"Next visible {i}: {message.NextVisibleTime}");
        await Task.Delay(60000);
    }
}

Output:

Next visible 0: 5/11/2020 7:49:24 +00:00
Next visible 1: 5/11/2020 7:49:24 +00:00
Next visible 2: 5/11/2020 7:49:24 +00:00
Next visible 3: 5/11/2020 7:49:24 +00:00
Next visible 4: 5/11/2020 7:49:24 +00:00
Next visible 5: 5/11/2020 7:54:24 +00:00
Next visible 6: 5/11/2020 7:54:24 +00:00
Next visible 7: 5/11/2020 7:54:24 +00:00
Next visible 8: 5/11/2020 7:54:24 +00:00
Next visible 9: 5/11/2020 7:54:24 +00:00
Next visible 10: 5/11/2020 7:59:24 +00:00
Next visible 11: 5/11/2020 7:59:24 +00:00
...
like image 50
Samuel Jaeschke Avatar answered Oct 14 '22 05:10

Samuel Jaeschke


I have tested this with

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;
using Microsoft.Extensions.Logging;
using Microsoft.WindowsAzure.Storage.Queue;

namespace WorkerFunctions
{
    public static class WorkerFunctions
    {
        [FunctionName("WorkerFunction1")]
        public static async Task Function1(
            [QueueTrigger("outputQueue")]
            CloudQueueMessage item,
            [Queue("outputQueue")]
            CloudQueue outputQueue,
            DateTimeOffset nextVisibleTime,
            DateTimeOffset expirationTime,
            DateTimeOffset insertionTime,
            ILogger log)
        {
            log.LogInformation("########## Function 1 ###############");
            log.LogInformation($"NextVisibleTime: {nextVisibleTime}");
            log.LogInformation($"NextVisibleTime: {(nextVisibleTime-insertionTime).TotalSeconds}");
            log.LogInformation($"C# Queue trigger function processed: {item.AsString}");

            Thread.Sleep(TimeSpan.FromMinutes(20));
        }

        [FunctionName("WorkerFunction2")]
        public static async Task Function2(
            [QueueTrigger("outputQueue")]
            CloudQueueMessage item,
            [Queue("outputQueue")]
            CloudQueue outputQueue,
            DateTimeOffset nextVisibleTime,
            DateTimeOffset expirationTime,
            DateTimeOffset insertionTime,
            ILogger log)
        {
            log.LogInformation("########## Function 2 ###############");
            log.LogInformation($"NextVisibleTime: {nextVisibleTime}");
            log.LogInformation($"NextVisibleTime: {(nextVisibleTime - insertionTime).TotalSeconds}");
            log.LogInformation($"C# Queue trigger function processed: {item.AsString}");

            Thread.Sleep(TimeSpan.FromMinutes(20));
        }
    }
}

With this host file

{
  "version": "2.0",
  "extensions": {
    "queues": {
      "maxPollingInterval": "00:00:02",
      "visibilityTimeout": "00:00:10",
      "batchSize": 16,
      "maxDequeueCount": 5,
      "newBatchThreshold": 8
    }
  }
}

And when i put a simple message on the queue and let it run, I see the following:

  1. the function that grabs it, doesn't release it before the sleep is over
  2. i can't see it in logs that the lease is renewed, but it seems like it happens under the hood

What this tells me:

  1. if the function doesn't fail, OR the host doesn't fail, well then the lease is autorenewed according to: https://stackoverflow.com/a/31883806/21199
  2. when the visibility timeout is reached, and the function is running, the message doesn't get "readded" to the queue
  3. that the documentation about the visibilityTimeout is true: "The time interval between retries when processing of a message fails." (from https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-queue#hostjson-settings)

I haven't saved any links to 3. party that contradicted this (sorry I haven't saved these), but they exists. I wish someone will answer this, so I can get clarification.

like image 24
mslot Avatar answered Oct 14 '22 04:10

mslot