
Concurrently downloading JSON data from remote service(s)

I am pulling JSON data from several remote servers concurrently over HTTP, using a WCF service on both the client and server endpoints. I'm noticing that each successive request that starts asynchronously generally takes longer than the one before it, even when the amount of data is not increasing. In other words, if I start 12 thread pool threads (using Func<>.BeginInvoke), the timed requests show up in my logs like this:

  :HttpRequest invoked. Elapsed: 325ms
  :HttpRequest invoked. Elapsed: 27437ms
  :HttpRequest invoked. Elapsed: 28642ms
  :HttpRequest invoked. Elapsed: 28496ms
  :HttpRequest invoked. Elapsed: 32544ms
  :HttpRequest invoked. Elapsed: 38073ms
  :HttpRequest invoked. Elapsed: 41231ms
  :HttpRequest invoked. Elapsed: 47914ms
  :HttpRequest invoked. Elapsed: 45570ms
  :HttpRequest invoked. Elapsed: 61602ms
  :HttpRequest invoked. Elapsed: 53567ms
  :HttpRequest invoked. Elapsed: 79081ms

The process is pretty simple. I start each request in a loop and then call .WaitAll() on all of the operations before using the consolidated data.

It looks like the HTTP requests are taking far longer than they should, even with small amounts of data. In fact, the difference between small and large amounts of data appears minimal overall. Would this sort of bottleneck be due to concurrent HTTP requests having to share bandwidth, or is a threading / context-switching issue possible here? Just looking to be pointed in the right direction.

EDIT -- Just for clarity, I ran the same process synchronously and here are the results:

  :HttpRequest invoked. Elapsed: 20627ms
  :HttpRequest invoked. Elapsed: 16288ms
  :HttpRequest invoked. Elapsed: 2273ms
  :HttpRequest invoked. Elapsed: 4578ms
  :HttpRequest invoked. Elapsed: 1920ms
  :HttpRequest invoked. Elapsed: 564ms
  :HttpRequest invoked. Elapsed: 1210ms
  :HttpRequest invoked. Elapsed: 274ms
  :HttpRequest invoked. Elapsed: 145ms
  :HttpRequest invoked. Elapsed: 21447ms
  :HttpRequest invoked. Elapsed: 27001ms
  :HttpRequest invoked. Elapsed: 1957ms

The total time (because it's synchronous) went up; however, you can clearly see that each individual request is generally faster. Unfortunately I don't know of any way to isolate the problem -- but my guess is that it's a bandwidth sharing issue between the threads.

So some more straightforward questions I have are:

1) If I use a non-threadpool thread, would this improve the request times?

2) Should I group the operations into only a few threads, rather than each request having its own?

3) Is this a standard problem when trying to concurrently download data over HTTP?

asked Jul 27 '11 22:07 by Sean Thoman


2 Answers

As per this question there is a setting which controls how many simultaneous HTTP requests can be made. Also, you should be using the BeginGetResponse method on HttpWebRequest for concurrent downloading because it is less expensive than creating threads. Look here for examples.
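
As a rough sketch of both points, assuming the setting in question is ServicePointManager.DefaultConnectionLimit (which defaults to 2 connections per host for client applications on the .NET Framework, so extra requests to the same host queue behind the first two). The URLs and the class name below are hypothetical placeholders, and the limit of 12 is only chosen to match the 12 requests in the question:

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading;

class ConcurrentDownloadSketch
{
    static void Main()
    {
        // Assumption: this is the setting referred to above. Raise it before
        // the first request to a host is created.
        ServicePointManager.DefaultConnectionLimit = 12;

        // Hypothetical endpoints; replace with the real service URLs.
        string[] urls =
        {
            "http://example.com/a.json",
            "http://example.com/b.json"
        };

        using (var done = new CountdownEvent(urls.Length))
        {
            foreach (string url in urls)
            {
                // Local copies so each callback captures its own values.
                string current = url;
                var request = (HttpWebRequest)WebRequest.Create(current);

                // The wait for the response does not hold a thread; the
                // callback runs once the response headers have arrived.
                request.BeginGetResponse(ar =>
                {
                    try
                    {
                        using (var response = request.EndGetResponse(ar))
                        using (var reader = new StreamReader(response.GetResponseStream()))
                        {
                            // Reading the body here is synchronous, which
                            // keeps the sketch short.
                            string json = reader.ReadToEnd();
                            Console.WriteLine("{0}: {1} chars", current, json.Length);
                        }
                    }
                    catch (WebException ex)
                    {
                        Console.WriteLine("{0} failed: {1}", current, ex.Message);
                    }
                    finally
                    {
                        done.Signal();
                    }
                }, null);
            }

            // Block until every callback has signalled, like the WaitAll() in the question.
            done.Wait();
        }
    }
}
```

If the connection limit is the cause, timing each request with it raised should bring the concurrent figures much closer to the synchronous ones.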

answered Oct 03 '22 12:10 by eulerfx


Might be related to the concurrency mode of your service. Check http://msdn.microsoft.com/en-us/library/system.servicemodel.concurrencymode.aspx and make sure that the service is not single threaded.
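
A minimal sketch of what that check looks like, assuming a singleton service instance; the contract and class names are hypothetical. ConcurrencyMode.Single is the default, and combined with InstanceContextMode.Single it makes the service handle one call at a time, which would match request times that keep growing:

```csharp
using System.ServiceModel;

// Hypothetical contract, for illustration only.
[ServiceContract]
public interface IJsonDataService
{
    [OperationContract]
    string GetData(string key);
}

// ConcurrencyMode.Multiple lets one service instance process several calls
// at once; the implementation must then be thread-safe.
[ServiceBehavior(
    InstanceContextMode = InstanceContextMode.Single,
    ConcurrencyMode = ConcurrencyMode.Multiple)]
public class JsonDataService : IJsonDataService
{
    public string GetData(string key)
    {
        // Placeholder payload; a real service would build the JSON here.
        return "{}";
    }
}
```

With InstanceContextMode.PerCall each request gets its own instance, so the concurrency mode matters much less in that configuration.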

answered Oct 03 '22 10:10 by bkdc