I am working on an application which calls an external service and has to add all entries of the external collection into a local collection. The problem is that the external collection can exceed 1000 records, but each page of returned search results can include at most twenty items.
For the sake of speed I figured using a collection of Tasks would be the way forward, so I came up with the code below:
int totalCount = returnedCol.total_count;
while (totalCount > myDict.Count)
{
    int numberOfTasks = // logic to calculate how many tasks to run
    List<Task> taskList = new List<Task>();
    for (int i = 1; i <= numberOfTasks; i++)
    {
        Interlocked.Add(ref pageNumber, pageSize);
        Task<SearchResponse> testTask = Task.Run(() =>
        {
            return ExternalCall.GetData(pageNumber, pageSize);
        });
        Thread.Sleep(100);
        taskList.Add(testTask);
        testTask.ContinueWith(o =>
        {
            foreach (ExternalDataRecord dataiwant in testTask.Result.dataiwant)
            {
                if (!myDict.ContainsKey(dataiwant.id))
                    myDict.GetOrAdd(dataiwant.id, dataiwant);
            }
        });
    }
    Task.WaitAll(taskList.ToArray());
}
However, this does not yield all results. The pageNumber variable increments correctly each time, but it seems that not all task results are being processed (the same logic on a single thread with a smaller data set returns all expected results). I have also tried declaring the individual tasks in a chain (rather than in a loop), and then all the test data is returned. The higher the value I pass into Thread.Sleep(), the more results get added to the local collection (but this isn't ideal, as it makes the process take longer!). Currently, out of a sample of 600 records, only about 150-200 are added to the myDict collection. Am I missing something obvious?
I think if you take a more functional and less imperative approach to your code, you'll be a lot less likely to run into hard-to-understand issues. I think something like this would have the same effect you're going for:
int totalCount = returnedCol.total_count;
// Round up so a partial last page is still fetched
int pageCount = (totalCount + pageSize - 1) / pageSize;
var tasks = Enumerable.Range(1, pageCount)
    .Select(async page =>
    {
        await Task.Delay(page * 100);
        return ExternalCall.GetData(page, pageSize);
    })
    .ToArray();
myDict = (await Task.WhenAll(tasks))
    .SelectMany(response => response.dataiwant)
    .ToDictionary(dataiwant => dataiwant.id);
The above code assumes you still want to wait 100 ms between requests for throttling purposes. If you only had that Thread.Sleep() there to try to fix the issues you were having, you can simplify further:
int totalCount = returnedCol.total_count;
// Round up so a partial last page is still fetched
int pageCount = (totalCount + pageSize - 1) / pageSize;
var tasks = Enumerable.Range(1, pageCount)
    .Select(page => Task.Run(() => ExternalCall.GetData(page, pageSize)))
    .ToArray();
myDict = (await Task.WhenAll(tasks))
    .SelectMany(response => response.dataiwant)
    .ToDictionary(dataiwant => dataiwant.id);
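For what it's worth, the likely reason your original loop misses results is twofold: the lambda passed to Task.Run captures the pageNumber *variable* itself, not its value at that iteration, so several tasks can end up requesting the same page (the Thread.Sleep just made the race less likely); and the ContinueWith continuations that populate myDict are never added to taskList, so Task.WaitAll can return before they have run. Here is a minimal sketch of the capture problem (names are hypothetical; a gate is used to make the timing deterministic for illustration):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class CaptureDemo
{
    // Starts five tasks that each read the shared counter, but only after
    // the loop has finished all its increments; returns how many distinct
    // values the tasks observed.
    static int DistinctPagesSeen(bool copyToLocal)
    {
        int pageNumber = 0;
        const int pageSize = 20;
        var gate = new ManualResetEventSlim(false);
        var observed = new ConcurrentBag<int>();
        var tasks = new Task[5];

        for (int i = 0; i < 5; i++)
        {
            Interlocked.Add(ref pageNumber, pageSize);
            if (copyToLocal)
            {
                // Snapshot the current value: each lambda captures its own copy
                int thisPage = pageNumber;
                tasks[i] = Task.Run(() => { gate.Wait(); observed.Add(thisPage); });
            }
            else
            {
                // Captures the variable itself, as in the question's code
                tasks[i] = Task.Run(() => { gate.Wait(); observed.Add(pageNumber); });
            }
        }

        gate.Set();               // let every task read only after the loop is done
        Task.WaitAll(tasks);
        return observed.Distinct().Count();
    }

    static void Main()
    {
        // Shared capture: all five tasks see the final value, so only 1 distinct page
        Console.WriteLine(DistinctPagesSeen(copyToLocal: false));
        // Local copy: each task sees its own page, so 5 distinct pages
        Console.WriteLine(DistinctPagesSeen(copyToLocal: true));
    }
}
```

The functional version above avoids the problem entirely, because the `page` lambda parameter is a fresh value for each element of the range rather than a shared mutable counter.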