Use multiple tasks to retrieve all records from a large collection

I am working on an application that calls an external service and has to add every entry of the external collection to a local collection. The problem is that the external collection can exceed 1000 records, but each page of search results can contain at most twenty items.

For the sake of speed I figured using a collection of Tasks would be the way forward, so I came up with the code below:

int totalCount = returnedCol.total_count;
while (totalCount > myDict.Count)
{
    int numberOfTasks = // logic to calculate how many tasks to run

    List<Task> taskList = new List<Task>();

    for (int i = 1; i <= numberOfTasks; i++)
    {
        Interlocked.Add(ref pageNumber, pageSize);

        Task<SearchResponse> testTask = Task.Run(() =>
        {
            return ExternalCall.GetData(pageNumber, pageSize);
        });

        Thread.Sleep(100);

        taskList.Add(testTask);
        testTask.ContinueWith(o =>
        {
            foreach (ExternalDataRecord dataiwant in testTask.Result.dataiwant)
            {
                if (!myDict.ContainsKey(dataiwant.id))
                    myDict.GetOrAdd(dataiwant.id, dataiwant);
            }
        });
    }
    Task.WaitAll(taskList.ToArray());
}

However, this does not yield all results. The pageNumber variable is incrementing correctly each time, but it seems that not all task results are being analysed (the same logic on a single thread with a smaller data set returns all expected results). I have also tried declaring individual tasks in a chain (rather than in a loop), and then all of the test data is returned. It seems that the higher the value I pass into Thread.Sleep(), the more results are added to the local collection (but this isn't ideal, as it means the process takes longer!)

Currently in a sample of 600 records I'm only getting about 150-200 added to the myDict collection. Am I missing something obvious?
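A likely explanation for the Thread.Sleep() dependence, offered as a hedged sketch: the lambda passed to Task.Run captures the variable pageNumber itself, not its value at the moment Task.Run is called. A task that starts late therefore reads a page number the loop has already advanced, so some pages are requested twice and others never. Separately, Task.WaitAll only waits for the fetch tasks; the ContinueWith tasks that merge the results are never waited on, so the while condition can be checked before they finish. The class below is an illustration, not the asker's code: CaptureDemo, its methods, and the getData delegate (a hypothetical stand-in for ExternalCall.GetData that returns record ids) are invented for the example, and the first two methods use plain Func<int> delegates instead of tasks so the capture behaviour is deterministic.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class CaptureDemo
{
    // Every lambda captures the same pageNumber variable, so each one sees the
    // value the variable holds when the lambda runs, not when it was created.
    public static List<int> SharedVariable(int count, int pageSize)
    {
        int pageNumber = 0;
        var readers = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            pageNumber += pageSize;
            readers.Add(() => pageNumber);
        }
        // Invoke the lambdas only after the loop, like a task that starts late.
        return readers.Select(read => read()).ToList();
    }

    // Copying the value into a variable declared inside the loop body gives
    // each lambda its own copy.
    public static List<int> LocalCopy(int count, int pageSize)
    {
        int pageNumber = 0;
        var readers = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            pageNumber += pageSize;
            int requested = pageNumber;
            readers.Add(() => requested);
        }
        return readers.Select(read => read()).ToList();
    }

    // Both fixes with real tasks: a per-iteration copy of the page number, and
    // waiting on the ContinueWith tasks (not just the fetch tasks) so every
    // page has been merged before the method returns.
    // getData is a hypothetical stand-in for ExternalCall.GetData.
    public static ConcurrentDictionary<int, int> FetchAll(
        int count, int pageSize, Func<int, int, int[]> getData)
    {
        var results = new ConcurrentDictionary<int, int>();
        var continuations = new List<Task>();
        int pageNumber = 0;
        for (int i = 0; i < count; i++)
        {
            pageNumber += pageSize;
            int requested = pageNumber;
            Task<int[]> fetch = Task.Run(() => getData(requested, pageSize));
            continuations.Add(fetch.ContinueWith(t =>
            {
                foreach (int id in t.Result)
                    results.TryAdd(id, id);
            }));
        }
        Task.WaitAll(continuations.ToArray());
        return results;
    }
}
```

For example, SharedVariable(3, 20) returns [60, 60, 60], while LocalCopy(3, 20) returns [20, 40, 60]. Note that pageNumber in the question is declared outside the loop, so it is shared across iterations regardless of language version.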

Chris Wright, asked May 11 '16 15:05



1 Answer

I think if you take a more functional and less imperative approach to your code, you'll be a lot less likely to run into hard-to-understand issues. Something like this should have the same effect you're going for:

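int totalCount = returnedCol.total_count;
// Round up so a final, partially filled page is also requested.
int pageCount = (totalCount + pageSize - 1) / pageSize;
var tasks = Enumerable.Range(1, pageCount)
    .Select(async page => {
        // Page n waits n * 100 ms, spacing the requests about 100 ms apart.
        await Task.Delay(page * 100);
        return ExternalCall.GetData(page, pageSize);
    })
    .ToArray();
// WhenAll yields one SearchResponse per page; flatten them into records.
// Assumes myDict is a Dictionary keyed by id. GroupBy drops duplicate ids,
// which ToDictionary would otherwise reject with an ArgumentException.
myDict = (await Task.WhenAll(tasks))
    .SelectMany(response => response.dataiwant)
    .GroupBy(dataiwant => dataiwant.id)
    .ToDictionary(group => group.Key, group => group.First());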

The above code assumes you still want to space the requests about 100 ms apart for throttling purposes. If you only had that Thread.Sleep() there to try to fix the issues you were having, you could simplify it further:

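int totalCount = returnedCol.total_count;
int pageCount = (totalCount + pageSize - 1) / pageSize;
// `page` is a lambda parameter, so each task gets its own page number.
var tasks = Enumerable.Range(1, pageCount)
    .Select(page => Task.Run(() => ExternalCall.GetData(page, pageSize)))
    .ToArray();
myDict = (await Task.WhenAll(tasks))
    .SelectMany(response => response.dataiwant)
    .GroupBy(dataiwant => dataiwant.id)
    .ToDictionary(group => group.Key, group => group.First());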
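To try the simplified approach without the real service, here is a minimal self-contained sketch. SearchResponse and ExternalDataRecord are hypothetical records (C# 9 or later) that only mirror the property names used in the question, and getData is a hypothetical delegate standing in for ExternalCall.GetData; the real types and signature may differ. It also isolates the page-count arithmetic: with plain integer division, 45 / 20 is 2, which would skip the last five records, whereas rounding up gives 3 pages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-ins for the external service's types.
public record ExternalDataRecord(int id);
public record SearchResponse(List<ExternalDataRecord> dataiwant);

public static class PagedFetch
{
    // Pages needed to cover totalCount items, rounding up so a final,
    // partially filled page is not skipped.
    public static int PageCount(int totalCount, int pageSize) =>
        (totalCount + pageSize - 1) / pageSize;

    // getData(page, pageSize) is a hypothetical stand-in for ExternalCall.GetData.
    public static async Task<Dictionary<int, ExternalDataRecord>> FetchAllAsync(
        int totalCount, int pageSize, Func<int, int, SearchResponse> getData)
    {
        // One task per page; each lambda closes over its own `page` parameter.
        Task<SearchResponse>[] tasks = Enumerable.Range(1, PageCount(totalCount, pageSize))
            .Select(page => Task.Run(() => getData(page, pageSize)))
            .ToArray();

        SearchResponse[] responses = await Task.WhenAll(tasks);

        // Flatten the pages and keep the first record seen for each id.
        return responses
            .SelectMany(response => response.dataiwant)
            .GroupBy(item => item.id)
            .ToDictionary(group => group.Key, group => group.First());
    }
}
```

Grouping before ToDictionary is a design choice that mirrors the question's "add only if missing" behaviour; if the service guarantees unique ids, a plain ToDictionary(item => item.id) would do.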
StriplingWarrior, answered Nov 14 '22 20:11