Cosmos DB 408 response in Azure Function

I have an Azure Function (v2) that accesses Cosmos DB, but not through a binding (we need to use custom serialization settings). I've followed the example here for setting up an object that should then be available to all instances of the activity function. Mine is a little different because our custom CosmosDb object requires an await for setup.

public static class AnalyzeActivityTrigger
{
    private static readonly Lazy<Task<CosmosDb>> LazyCosmosDb = new Lazy<Task<CosmosDb>>(InitializeDocumentClient);
    private static Task<CosmosDb> CosmosDb => LazyCosmosDb.Value;

    private static Task<CosmosDb> InitializeDocumentClient()
    {
        return StorageFramework.CosmosDb.GetCosmosDb(DesignUtilities.Storage.CosmosDbContainerDefinitions, DesignUtilities.Storage.CosmosDbMigrations);
    }

    [FunctionName(nameof(AnalyzeActivityTrigger))]
    public static async Task<Guid> Run(
        [ActivityTrigger]DurableActivityContext context,
        ILogger log)
    {
        var analyzeActivityRequestString = context.GetInput<string>();
        var analyzeActivityRequest = StorageFramework.Storage.Deserialize<AnalyzeActivityRequest>(analyzeActivityRequestString);
        var componentDesign = StorageFramework.Storage.Deserialize<ComponentDesign>(analyzeActivityRequest.ComponentDesignString);

        var (analysisSet, _, _) = await AnalysisUtilities.AnalyzeComponentDesignAndUploadArtifacts(componentDesign,
            LogVariables.Off, new AnalysisLog(), Stopwatch.StartNew(), analyzeActivityRequest.CommitName, await CosmosDb);

        return analysisSet.AnalysisReport.Guid;
    }
}
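
One caveat with the Lazy<Task<CosmosDb>> pattern above: if GetCosmosDb fails once (for example, a transient error during setup), the Lazy keeps the faulted result, and every later invocation on that host instance gets the same failure until the host restarts. Below is a minimal sketch of a resettable alternative. It assumes you want a failed initialization to be retried on the next call. ResettableAsyncLazy is a hypothetical helper, not a .NET type.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of .NET). It behaves like Lazy<Task<T>>,
// but a faulted or canceled initialization is discarded, so the next caller
// starts a fresh attempt instead of awaiting the same failed task forever.
public sealed class ResettableAsyncLazy<T>
{
    private readonly Func<Task<T>> factory;
    private readonly object gate = new object();
    private Task<T> current;

    public ResettableAsyncLazy(Func<Task<T>> factory)
    {
        this.factory = factory ?? throw new ArgumentNullException(nameof(factory));
    }

    public Task<T> GetValueAsync()
    {
        lock (gate)
        {
            // Start initialization if nothing has run yet or the last attempt failed.
            // Concurrent callers share whichever task is currently pending or succeeded.
            if (current == null || current.IsFaulted || current.IsCanceled)
            {
                // Task.Run keeps the factory's work off the lock and turns a
                // synchronous throw into a faulted task that the next call replaces.
                current = Task.Run(factory);
            }
            return current;
        }
    }
}
```

In the question's class, the Lazy field could be replaced by a ResettableAsyncLazy<CosmosDb> built from InitializeDocumentClient, and the call site would await LazyCosmosDb.GetValueAsync() instead of the CosmosDb property.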

We fan out, calling this activity function in parallel. Our documents are fairly large, so updating them is expensive in RUs, and this code performs those updates.

I sometimes get this error when container.ReplaceItemAsync is called:

Response status code does not indicate success: 408 Substatus: 0 Reason: (Message: Request timed out. ...

The obvious fix seems to be increasing the timeout, but could this indicate some other problem? Increasing the timeout seems like treating the symptom rather than the cause. We also have code that scales up our RUs before any of this runs. I'm wondering whether the fan-out in Azure Functions is putting too much load on it, so I've also experimented with the durableTask settings in host.json, such as maxConcurrentActivityFunctions and maxConcurrentOrchestratorFunctions, but so far to no avail.
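
If the load comes from the fan-out itself, another option besides the host.json throttles is to cap how many activity calls the orchestrator has in flight at once. Below is a minimal sketch. It assumes the work can be split into fixed-size batches. FanOut.InBatchesAsync is a hypothetical helper, and the batch size is something you would tune.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FanOut
{
    // Hypothetical helper: calls `start` for every input, but never has more
    // than `batchSize` calls in flight. Each batch is started together and
    // awaited with Task.WhenAll before the next batch begins.
    // It deliberately avoids ConfigureAwait(false), which must not be used
    // inside Durable Functions orchestrator code.
    public static async Task<IReadOnlyList<TResult>> InBatchesAsync<TInput, TResult>(
        IReadOnlyList<TInput> inputs, int batchSize, Func<TInput, Task<TResult>> start)
    {
        if (inputs == null) throw new ArgumentNullException(nameof(inputs));
        if (start == null) throw new ArgumentNullException(nameof(start));
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var results = new List<TResult>(inputs.Count);
        for (int offset = 0; offset < inputs.Count; offset += batchSize)
        {
            // Start this batch's calls, then wait for all of them (results keep input order).
            List<Task<TResult>> batch = inputs.Skip(offset).Take(batchSize).Select(start).ToList();
            results.AddRange(await Task.WhenAll(batch));
        }
        return results;
    }
}
```

In a Durable Functions 1.x orchestrator (matching the DurableActivityContext used above), start could be r => context.CallActivityAsync<Guid>(nameof(AnalyzeActivityTrigger), r), where context is the DurableOrchestrationContext and r is one of the serialized request strings. maxConcurrentActivityFunctions applies per host instance, so its effective total grows as the app scales out. The batch size instead caps in-flight calls for the whole orchestration. Batching is simpler than a sliding window, but it leaves slots idle while the slowest call in each batch finishes.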

How should I approach this 408 error? What steps can I consider to mitigate it other than increasing the request timeout?

Update 1: I increased the default request timeout to 5 minutes and now I'm getting 503 responses.

Update 2: Pointing to a clone published to an Azure Function on the Premium plan seems to work after multiple tests.

Update 3: We weren't testing it hard enough. The problem also occurs on the Premium plan. A GitHub issue is forthcoming.

Update 4: We seem to have solved this by a combination of using Gateway mode in connecting to Cosmos and increasing RUs.
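
The fix in Update 4 is made in the client options. Below is a sketch of what that configuration might look like. It assumes the Microsoft.Azure.Cosmos v3 SDK, which the call to container.ReplaceItemAsync suggests. CosmosClientFactory is a hypothetical name, and the values are illustrative rather than recommendations. If the custom CosmosDb wrapper creates its own client, the options would need to be passed through that wrapper.

```csharp
using System;
using Microsoft.Azure.Cosmos;

// Hypothetical factory; all numeric values below are illustrative only.
public static class CosmosClientFactory
{
    public static CosmosClientOptions CreateOptions() => new CosmosClientOptions
    {
        // Send requests over HTTPS through the account's gateway instead of
        // opening TCP connections directly to backend replicas (Update 4).
        ConnectionMode = ConnectionMode.Gateway,
        // Upper bound on concurrent HTTP connections the client opens in Gateway mode.
        GatewayModeMaxConnectionLimit = 50,
        // Per-request network timeout (the setting raised in Update 1).
        RequestTimeout = TimeSpan.FromSeconds(30),
        // Controls the SDK's built-in retries on 429 (rate-limited) responses.
        MaxRetryAttemptsOnRateLimitedRequests = 9,
        MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(30),
    };

    // Create one client per host instance and reuse it across invocations.
    public static CosmosClient Create(string connectionString) =>
        new CosmosClient(connectionString, CreateOptions());
}
```

According to Update 1, raising the request timeout alone only turned the 408s into 503s. The timeout is better treated as a tuning knob than as the fix. What helped here was Gateway mode combined with more RUs.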

asked Nov 08 '19 at 00:11 by Scotty H



1 Answer

A timeout can indeed signal a resource problem on the client instance, not only on the Cosmos DB side. Reference: https://learn.microsoft.com/azure/cosmos-db/troubleshoot-dot-net-sdk#request-timeouts

If you are running on Functions, check how many connections your instances are opening, and also verify their CPU usage. High CPU can increase request latency until requests start timing out.

In Functions, you can use dependency injection (DI) instead of the whole Lazy declaration; see this sample: https://github.com/Azure/azure-cosmos-dotnet-v3/tree/master/Microsoft.Azure.Cosmos.Samples/Usage/AzureFunctions

Create a Startup.cs file with:

using System;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.Functions.Extensions.DependencyInjection;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

[assembly: FunctionsStartup(typeof(YourNameSpace.Startup))]

namespace YourNameSpace
{
    public class Startup : FunctionsStartup
    {
        public override void Configure(IFunctionsHostBuilder builder)
        {
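            // Register one CosmosClient for the whole host instance; the SDK is designed to reuse a single client.
            builder.Services.AddSingleton((s) => {
                // Read the connection string from app settings (the setting name is an assumption).
                IConfiguration configuration = s.GetRequiredService<IConfiguration>();
                CosmosClient cosmosClient = new CosmosClient(configuration["CosmosDbConnectionString"]);

                return cosmosClient;
            });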
        }
    }
}

Then make your function class non-static and inject the client through its constructor:

public class AnalyzeActivityTrigger
{
    private readonly CosmosClient cosmosClient;
    public AnalyzeActivityTrigger(CosmosClient cosmosClient)
    {
        this.cosmosClient = cosmosClient;
    }

    [FunctionName(nameof(AnalyzeActivityTrigger))]
    public async Task<Guid> Run(
        [ActivityTrigger]DurableActivityContext context,
        ILogger log)
    {
        var analyzeActivityRequestString = context.GetInput<string>();
        var analyzeActivityRequest = StorageFramework.Storage.Deserialize<AnalyzeActivityRequest>(analyzeActivityRequestString);
        var componentDesign = StorageFramework.Storage.Deserialize<ComponentDesign>(analyzeActivityRequest.ComponentDesignString);

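        // Assumes AnalyzeComponentDesignAndUploadArtifacts can accept a CosmosClient. The question's version
        // takes the custom CosmosDb wrapper, so you may need to register and inject that type instead.
        var (analysisSet, _, _) = await AnalysisUtilities.AnalyzeComponentDesignAndUploadArtifacts(componentDesign,
            LogVariables.Off, new AnalysisLog(), Stopwatch.StartNew(), analyzeActivityRequest.CommitName, this.cosmosClient);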

        return analysisSet.AnalysisReport.Guid;
    }
}
answered Oct 19 '22 at 16:10 by Matias Quaranta