Appropriate solution for long running computations in Azure App Service and .NET Core 3.1?

What is an appropriate solution for long-running computations in Azure App Service and .NET Core 3.1, in an application that needs no database and performs no IO to anything outside of the application? It is a pure computation task.

Specifically, the following is unreliable and needs a solution.

[Route("service")]
[HttpPost]
public Outbound Post(Inbound inbound)
{
    Debug.Assert(inbound.Message.Equals("Hello server."));
    Outbound outbound = new Outbound();
    long Billion = 1000000000;
    for (long i = 0; i < 33 * Billion; i++) // 230 seconds
        ;
    outbound.Message = String.Format("The server processed inbound object.");
    return outbound;
}

This sometimes returns a null object to the HttpClient (not shown). A smaller workload always succeeds; for example, 3 billion iterations always succeed. A bigger number would be nice. Specifically, 240 billion is a requirement.

I think that in 2020 a reasonable goal in Azure App Service with .NET Core might be to have a parent thread count to 240 billion with the help of 8 child threads, so that each child counts to 30 billion. The parent divides an 8 MB inbound object into smaller objects, one per child. Each child receives a 1 MB inbound object and returns a 1 MB outbound object to the parent, which reassembles the results into an 8 MB outbound object.

Obviously the elapsed time will be 12.5% (one-eighth) of the time a single-threaded implementation would need. The time to cut up and reassemble objects is small compared to the computation time. I am assuming the time to transmit the objects is also very small compared to the computation time, so the 12.5% expectation is roughly accurate.
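The split, compute, and reassemble scheme described above could be sketched as follows. This is only an illustration under stated assumptions: `ParallelSplit` and `ProcessChunk` are hypothetical names, and the byte-flipping loop merely stands in for the real computation. The one-eighth estimate holds only if the App Service plan actually provides eight cores; `Environment.ProcessorCount` reports how many the process can see.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelSplit
{
    // Hypothetical per-chunk computation; a placeholder for the real work.
    private static byte[] ProcessChunk(byte[] chunk)
    {
        var result = new byte[chunk.Length];
        for (int i = 0; i < chunk.Length; i++)
            result[i] = (byte)(chunk[i] ^ 0xFF);
        return result;
    }

    // Splits the inbound buffer into `workers` contiguous chunks, processes
    // them in parallel, and writes each result back at the same offset.
    public static byte[] Process(byte[] inbound, int workers)
    {
        if (workers <= 0) throw new ArgumentOutOfRangeException(nameof(workers));

        int chunkSize = (inbound.Length + workers - 1) / workers; // ceiling division
        var outbound = new byte[inbound.Length];
        var options = new ParallelOptions { MaxDegreeOfParallelism = workers };

        Parallel.For(0, workers, options, w =>
        {
            int start = w * chunkSize;
            int length = Math.Min(chunkSize, inbound.Length - start);
            if (length <= 0) return; // more workers than data

            var chunk = new byte[length];
            Buffer.BlockCopy(inbound, start, chunk, 0, length);
            byte[] part = ProcessChunk(chunk);

            // Each worker writes to a disjoint range, so no locking is needed.
            Buffer.BlockCopy(part, 0, outbound, start, length);
        });

        return outbound;
    }
}
```

A call such as `ParallelSplit.Process(inbound, 8)` mirrors the eight-child layout; passing `Environment.ProcessorCount` instead would adapt the worker count to whatever the host actually offers.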

If I can get 4 or 8 cores, that would be good. If I can get threads that each give me, say, 50% of the cycles of a core, then I would need maybe 8 or 16 threads. If each thread gives me 33% of the cycles of a core, then I would need 12 or 24 threads.

I am considering the BackgroundService class but I am looking for confirmation that this is the correct approach. Microsoft says...

BackgroundService is a base class for implementing a long running IHostedService.

Obviously, if something is long running, it would be better to make it finish sooner by using multiple cores via System.Threading, but this documentation seems to mention System.Threading only in the context of starting tasks via System.Threading.Timer. My example code shows there is no timer needed in my application. An HTTP POST will serve as the occasion to do work. Typically I would use System.Threading.Thread to instantiate multiple objects to use multiple cores. I find the absence of any mention of multiple cores to be a glaring omission in the context of a solution for work that takes a long time, but maybe there is some reason Azure App Service doesn't deal with this matter. Perhaps I am just not able to find it in tutorials and documentation.

The initiation of the task is the illustrated HTTP POST controller. Suppose the longest job takes 10 minutes. The HTTP client (not shown) sets the timeout limit to 1000 seconds which is much more than 10 minutes (600 seconds) in order for there to be a margin of safety. HttpClient.Timeout is the relevant property. For the moment I am presuming the HTTP timeout is a real limit; rather than some sort of non-binding (fake limit) such that some other constraint results in the user waiting 9 minutes and receiving an error message. A real binding limit is a limit for which I can say "but for this timeout it would have succeeded". If the HTTP timeout is not the real binding limit and there is something else constraining the system, I can adjust my HTTP controller to instead have three (3) POST methods. Thus POST1 would mean start a task with the inbound object. POST2 means tell me if it is finished. POST3 means give me the outbound object.

H2ONaCl asked Aug 12 '20 03:08



2 Answers

Prologue

A few years ago I ran into a pretty similar problem. We needed a service that could process large amounts of data. Sometimes the processing would take 10 seconds, other times it could take an hour.

At first we did it how your question illustrates: Send a request to the service, the service processes the data from the request and returns the response when finished.

Issues At Hand

This was fine when the job only took around a minute or less, but for anything longer the server would shut down the session and the caller would report an error.

Servers typically allow around 2 minutes by default to produce a response before giving up on the request. The server doesn't stop processing the request, but it does close the HTTP session. It doesn't matter what parameters you set on your HttpClient; the server is the one that dictates how long is too long.

Reasons For Issues

All this is for good reasons. Server sockets are extremely expensive. You have a finite amount to go around. The server is trying to protect your service by severing requests that are taking longer than a specified time in order to avoid socket starvation issues.

Typically you want your HTTP requests to take only a few milliseconds. If they are taking longer than this, you will eventually run into socket issues if your service has to fulfil other requests at a high rate.

Solution

We decided to go the route of IHostedService, specifically the BackgroundService. We use this service in conjunction with a queue. This way you can set up a queue of jobs and the BackgroundService will process them one at a time (in some instances we have the service processing multiple queue items at once; in others we scaled horizontally, producing two or more queues).

Why an ASP.NET Core service running a BackgroundService? I wanted to handle this without tightly-coupling to any Azure-specific constructs in case we needed to move out of Azure to some other cloud service (back in the day we were contemplating this for other reasons we had at the time.)

This has worked out quite well for us and we haven't seen any issues since.

The workflow goes like this:

  1. Caller sends a request to the service with some parameters
  2. Service generates a "job" object and returns an ID immediately via a 202 (Accepted) response
  3. Service places this job into a queue that is being maintained by a BackgroundService
  4. Caller can query the job status and get information about how much has been done and how much is left to go using this job ID
  5. Service finishes the job, puts the job into a "completed" state and goes back to waiting on the queue to produce more jobs
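From the caller's side, step 4 amounts to a polling loop. The sketch below is one possible shape, not part of the original solution: `JobPoller` is a hypothetical helper, and the status strings "Success" and "Errored" are assumed to match how the job status is serialized. The status lookup is passed in as a delegate because the JSON shape of the job model is not shown here.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class JobPoller
{
    // Repeatedly asks for the job status until it reaches a terminal state.
    // getStatusAsync is supplied by the caller; with HttpClient it would
    // issue a GET against the status endpoint and extract the status field.
    public static async Task<string> WaitForCompletionAsync(
        Func<CancellationToken, Task<string>> getStatusAsync,
        TimeSpan pollInterval,
        CancellationToken cancellationToken = default)
    {
        while (true)
        {
            string status = await getStatusAsync(cancellationToken).ConfigureAwait(false);

            // Assumed terminal states; adjust to the real status values.
            if (status == "Success" || status == "Errored")
                return status;

            // Each poll is a short request, so no single HTTP call stays open
            // for the duration of the job.
            await Task.Delay(pollInterval, cancellationToken).ConfigureAwait(false);
        }
    }
}
```

Because every request returns quickly, the client no longer needs a very large `HttpClient.Timeout`; the overall wait is bounded by the cancellation token instead.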

Keep in mind your service has the capability to scale horizontally where there would be more than one instance running. In this case I am using Redis Cache to store the state of the jobs so that all instances share the same state.

I also added in a "Memory Cache" option to test things locally if you don't have a Redis Cache available. You could run the "Memory Cache" service on a server, just know that if it scales then your data will be inconsistent.
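To make the "Memory Cache" option concrete, here is a minimal sketch of what a single-instance, in-memory job store might look like. All names (`InMemoryJobStore`, `JobRecord`, `JobState`) are hypothetical and simplified; they are not the types used in the full solution, and the `Pending` state is an assumption.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public enum JobState { Pending, Processing, Success, Errored }

public sealed class JobRecord
{
    public string JobId { get; set; }
    public JobState State { get; set; }
    public string Result { get; set; }
}

// Intended to be registered as a singleton. State lives in this process
// only, so it is lost on restart and is not shared between scaled-out instances.
public sealed class InMemoryJobStore
{
    private readonly ConcurrentDictionary<string, JobRecord> _jobs =
        new ConcurrentDictionary<string, JobRecord>();

    public Task<string> CreateJobAsync()
    {
        string id = Guid.NewGuid().ToString("N");
        _jobs[id] = new JobRecord { JobId = id, State = JobState.Pending };
        return Task.FromResult(id);
    }

    // Replaces the whole record so readers never observe a half-updated job.
    public Task StoreJobResultAsync(string jobId, string result, JobState state)
    {
        _jobs[jobId] = new JobRecord { JobId = jobId, State = state, Result = result };
        return Task.CompletedTask;
    }

    // Returns null when the job ID is unknown.
    public Task<JobRecord> GetJobAsync(string jobId)
    {
        _jobs.TryGetValue(jobId, out JobRecord job);
        return Task.FromResult(job);
    }

    // Returns a snapshot copy rather than the live dictionary.
    public Task<IReadOnlyDictionary<string, JobRecord>> GetAllJobsAsync()
    {
        IReadOnlyDictionary<string, JobRecord> snapshot =
            new Dictionary<string, JobRecord>(_jobs);
        return Task.FromResult(snapshot);
    }
}
```

The methods return tasks even though nothing here is asynchronous, so that a Redis-backed implementation with the same shape could be swapped in without changing callers.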

Example

Since I'm married with kids, I really don't do much on Friday nights after everyone goes to bed, so I spent some time putting together an example that you can try out. The full solution is also available for you to try out.

QueuedBackgroundService.cs

This class implementation serves two specific purposes: One is to read from the queue (the BackgroundService implementation), the other is to write to the queue (the IQueuedBackgroundService implementation).

    public interface IQueuedBackgroundService
    {
        Task<JobCreatedModel> PostWorkItemAsync(JobParametersModel jobParameters);
    }

    public sealed class QueuedBackgroundService : BackgroundService, IQueuedBackgroundService
    {
        private sealed class JobQueueItem
        {
            public string JobId { get; set; }
            public JobParametersModel JobParameters { get; set; }
        }

        private readonly IComputationWorkService _workService;
        private readonly IComputationJobStatusService _jobStatusService;

        // Shared between BackgroundService and IQueuedBackgroundService.
        // The queueing mechanism could be moved out to a singleton service. I am doing
        // it this way for simplicity's sake.
        private static readonly ConcurrentQueue<JobQueueItem> _queue =
            new ConcurrentQueue<JobQueueItem>();
        private static readonly SemaphoreSlim _signal = new SemaphoreSlim(0);

        public QueuedBackgroundService(IComputationWorkService workService,
            IComputationJobStatusService jobStatusService)
        {
            _workService = workService;
            _jobStatusService = jobStatusService;
        }

        /// <summary>
        /// Transient method via IQueuedBackgroundService
        /// </summary>
        public async Task<JobCreatedModel> PostWorkItemAsync(JobParametersModel jobParameters)
        {
            var jobId = await _jobStatusService.CreateJobAsync(jobParameters).ConfigureAwait(false);
            _queue.Enqueue(new JobQueueItem { JobId = jobId, JobParameters = jobParameters });
            _signal.Release(); // signal for background service to start working on the job
            return new JobCreatedModel { JobId = jobId, QueuePosition = _queue.Count };
        }

        /// <summary>
        /// Long running task via BackgroundService
        /// </summary>
        protected override async Task ExecuteAsync(CancellationToken stoppingToken)
        {
            while(!stoppingToken.IsCancellationRequested)
            {
                JobQueueItem jobQueueItem = null;
                try
                {
                    // wait for the queue to signal there is something that needs to be done
                    await _signal.WaitAsync(stoppingToken).ConfigureAwait(false);

                    // dequeue the item
                    jobQueueItem = _queue.TryDequeue(out var workItem) ? workItem : null;

                    if(jobQueueItem != null)
                    {
                        // put the job in to a "processing" state
                        await _jobStatusService.UpdateJobStatusAsync(
                            jobQueueItem.JobId, JobStatus.Processing).ConfigureAwait(false);

                        // the heavy lifting is done here...
                        var result = await _workService.DoWorkAsync(
                            jobQueueItem.JobId, jobQueueItem.JobParameters,
                            stoppingToken).ConfigureAwait(false);

                        // store the result of the work and set the status to "finished"
                        await _jobStatusService.StoreJobResultAsync(
                            jobQueueItem.JobId, result, JobStatus.Success).ConfigureAwait(false);
                    }
                }
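                catch(OperationCanceledException)
                {
                    // Shutdown was requested. OperationCanceledException is the base
                    // class of TaskCanceledException, so this covers both.
                    break;
                }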
                catch(Exception ex)
                {
                    try
                    {
                        // something went wrong. Put the job in to an errored state and continue on
                        await _jobStatusService.StoreJobResultAsync(jobQueueItem.JobId, new JobResultModel
                        {
                            Exception = new JobExceptionModel(ex)
                        }, JobStatus.Errored).ConfigureAwait(false);
                    }
                    catch(Exception)
                    {
                        // TODO: log this
                    }
                }
            }
        }
    }

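It is registered like so. Note that the two registrations create separate instances of the class; this works only because the queue and the semaphore are static fields shared by all instances.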

    services.AddHostedService<QueuedBackgroundService>();
    services.AddTransient<IQueuedBackgroundService, QueuedBackgroundService>();
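The comment in the class notes that the queueing mechanism could be moved out to a singleton service. One way to do that, sketched here as an assumption rather than as part of the original solution, is a small wrapper around `System.Threading.Channels`, which replaces the `ConcurrentQueue` plus `SemaphoreSlim` pair with a single type. `JobQueue<T>` is a hypothetical name.

```csharp
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Intended to be registered once as a singleton and injected into both the
// controller-facing service (writer) and the BackgroundService (reader).
public sealed class JobQueue<T>
{
    private readonly Channel<T> _channel = Channel.CreateUnbounded<T>(
        new UnboundedChannelOptions { SingleReader = true });

    // Never blocks; an unbounded channel accepts writes until it is completed.
    public bool Enqueue(T item) => _channel.Writer.TryWrite(item);

    // Completes when an item is available, or throws
    // OperationCanceledException when the token is cancelled.
    public ValueTask<T> DequeueAsync(CancellationToken cancellationToken) =>
        _channel.Reader.ReadAsync(cancellationToken);
}
```

With this in place the background loop would await `DequeueAsync(stoppingToken)` instead of waiting on the semaphore and then calling `TryDequeue`, and the static fields could be dropped. The queue item type would need to be accessible outside the service class for this to work.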

ComputationController.cs

The controller used to read/write jobs looks like this:

    [ApiController, Route("api/[controller]")]
    public class ComputationController : ControllerBase
    {
        private readonly IQueuedBackgroundService _queuedBackgroundService;
        private readonly IComputationJobStatusService _computationJobStatusService;

        public ComputationController(
            IQueuedBackgroundService queuedBackgroundService,
            IComputationJobStatusService computationJobStatusService)
        {
            _queuedBackgroundService = queuedBackgroundService;
            _computationJobStatusService = computationJobStatusService;
        }

        [HttpPost, Route("beginComputation")]
        [ProducesResponseType(StatusCodes.Status202Accepted, Type = typeof(JobCreatedModel))]
        public async Task<IActionResult> BeginComputation([FromBody] JobParametersModel obj)
        {
            return Accepted(
                await _queuedBackgroundService.PostWorkItemAsync(obj).ConfigureAwait(false));
        }

        [HttpGet, Route("computationStatus/{jobId}")]
        [ProducesResponseType(StatusCodes.Status200OK, Type = typeof(JobModel))]
        [ProducesResponseType(StatusCodes.Status404NotFound, Type = typeof(string))]
        public async Task<IActionResult> GetComputationResultAsync(string jobId)
        {
            var job = await _computationJobStatusService.GetJobAsync(jobId).ConfigureAwait(false);
            if(job != null)
            {
                return Ok(job);
            }
            return NotFound($"Job with ID `{jobId}` not found");
        }

        [HttpGet, Route("getAllJobs")]
        [ProducesResponseType(StatusCodes.Status200OK,
            Type = typeof(IReadOnlyDictionary<string, JobModel>))]
        public async Task<IActionResult> GetAllJobsAsync()
        {
            return Ok(await _computationJobStatusService.GetAllJobsAsync().ConfigureAwait(false));
        }

        [HttpDelete, Route("clearAllJobs")]
        [ProducesResponseType(StatusCodes.Status200OK)]
        [ProducesResponseType(StatusCodes.Status401Unauthorized)]
        public async Task<IActionResult> ClearAllJobsAsync([FromQuery] string permission)
        {
            if(permission == "this is flakey security so this can be run as a public demo")
            {
                await _computationJobStatusService.ClearAllJobsAsync().ConfigureAwait(false);
                return Ok();
            }
            return Unauthorized();
        }
    }

Working Example

For as long as this question is active, I will maintain a working example you can try out. For this specific example, you can specify how many iterations you would like to run. To simulate long-running work, each iteration is 1 second. So, if you set the iteration value to 60, it will run that job for 60 seconds.

While it's running, run the computationStatus/{jobId} or getAllJobs endpoint. You can watch all the jobs update in real time.

This example is far from a fully-functioning-covering-all-edge-cases-full-blown-ready-for-production example, but it's a good start.

Conclusion

After a few years of working in the back-end, I have seen a lot of issues arise by not knowing all the "rules" of the back-end. Hopefully this answer will shed some light on issues I had in the past and hopefully this saves you from having to deal with said problems.

Andy answered Sep 18 '22 00:09


One option could be to try out Azure Durable Functions, which are oriented toward long-running jobs that warrant checkpoints and state, rather than attempting to finish within the context of the triggering request. They also have the concept of fan-out/fan-in, in case what you're describing could be divided into smaller jobs with an aggregated result.

If just raw compute is the goal, Azure Batch might be a better option since it facilitates that scaling.

Noah Stahl answered Sep 22 '22 00:09