
Strange behavior with Uri and WebClient classes in an SSIS package

I have a quite strange situation.

I have this very simple package:

  • The "get list" task retrieves a one-column data table from an assembly, containing the list of URLs to run, and stores it in an object variable.
  • The "foreach" loop iterates over the object variable and loads each URL into the URL string variable.
  • The "run" task calls the URL with this code (it's SSIS 2005, so I'm stuck with VB):

    Dim myURI As New Uri("http://" + Dts.Variables("URL").Value.ToString())
    Dim myWebClient As New System.Net.WebClient
    myWebClient.OpenReadAsync(myURI)
    

The URL being called is internal. It just reads the parameters and performs a series of operations that take some time, which is why I used OpenReadAsync.

My problem is this: if I have 4 URLs to run, the package runs only 2 of them. The loop iterates 4 times, the script is called 4 times (I can see this when I debug it), and the line myWebClient.OpenReadAsync(myURI) is executed 4 times with 4 different values, but only 2 calls to the URL are made.

If I run the package again, the other 2 URLs are called, which proves that there is nothing wrong with the URLs. If I call the 4 URLs manually in the browser (in 4 tabs, for example) one right after another, they all produce the expected result, which proves that there is nothing wrong with the code that parses the URL.

So I'm left with the VB code. It's the first time I'm using Uri and WebClient, so I wonder if I'm doing something wrong. I also tried adding a 5-second sleep between the calls, but no luck.

Any help would be appreciated. Thanks

Diego asked May 23 '12 15:05


2 Answers

The HTTP/1.1 specification (RFC 2616) recommended that clients, browsers included, keep no more than 2 simultaneous connections to a host, to avoid overloading it. .NET follows this rule and by default allows only 2 concurrent connections per host. You can change this limit either in the application's config file or through code.

  • To change the limit in the config file, set the maxconnection attribute of an add element inside the system.net/connectionManagement element.
  • To change the limit through code, set the static ServicePointManager.DefaultConnectionLimit property.
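
As a rough illustration of the config-file option, the fragment below raises the limit for all hosts. The value 10 and the wildcard address are illustrative assumptions, and which .config file the SSIS host process actually reads depends on how the package is run, so treat this as a sketch to adapt rather than a drop-in file.

```xml
<configuration>
  <system.net>
    <connectionManagement>
      <!-- address="*" applies to every host; 10 is an illustrative value -->
      <add address="*" maxconnection="10" />
    </connectionManagement>
  </system.net>
</configuration>
```

A specific host can be targeted by putting its address in the address attribute instead of the wildcard.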

The delay you added to the script didn't help because you never call Dispose on the WebClient instance. So that the response stream can be read, WebClient keeps its connection open until you dispose of it. Otherwise, you will not be able to connect to the same host again until the garbage collector collects the client.

Besides, OpenReadAsync opens a stream to the response and keeps it open until you close it or it gets collected. You should use one of the DownloadXXXAsync methods instead, to avoid opening a stream for no reason.

A better solution would be to call DownloadStringAsync and dispose of the client in the handler for the DownloadStringCompleted event.
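
A minimal sketch of that approach, written in VB so it matches the question's script task, might look like the following. The module and method names (UrlRunner, CallUrl, OnDownloadCompleted) are hypothetical, and Console.WriteLine stands in for whatever logging the package uses. WebClient exposes the completion event as DownloadStringCompleted. Whether the script task stays alive long enough for the callback to fire is not addressed here.

```vb
Imports System
Imports System.Net

Public Module UrlRunner

    ' Starts the request without blocking; the handler below cleans up.
    ' "address" is the host and path without the scheme, as in the question.
    Public Sub CallUrl(ByVal address As String)
        Dim client As New WebClient()
        AddHandler client.DownloadStringCompleted, AddressOf OnDownloadCompleted
        client.DownloadStringAsync(New Uri("http://" & address))
    End Sub

    ' Raised when the download completes, fails, or is cancelled.
    Private Sub OnDownloadCompleted(ByVal sender As Object, _
                                    ByVal e As DownloadStringCompletedEventArgs)
        ' The WebClient that started the request is passed as the sender.
        Dim client As WebClient = DirectCast(sender, WebClient)
        Try
            If e.Error IsNot Nothing Then
                ' Reading e.Result would rethrow here, so report the failure instead.
                Console.WriteLine("Request failed: " & e.Error.Message)
            End If
        Finally
            ' Disposing releases the connection so the next request can use it.
            RemoveHandler client.DownloadStringCompleted, AddressOf OnDownloadCompleted
            client.Dispose()
        End Try
    End Sub

End Module
```

Inside the script task, the three lines from the question would then reduce to a single call such as UrlRunner.CallUrl(Dts.Variables("URL").Value.ToString()).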

UPDATE:

ServicePointManager.DefaultConnectionLimit is stored in a static field, which means that its scope is the entire AppDomain. SSIS uses a single AppDomain for each package execution, so the value will affect the entire package.

If you want to modify the connection limit for a single host only, you can use FindServicePoint to get the ServicePoint for the host address and set the limit just for that address:

    var myTarget = ServicePointManager.FindServicePoint(new Uri("http://www.google.com"));
    myTarget.ConnectionLimit = 10;
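
Since the question's script task is limited to VB, here is a hedged VB sketch of both options, the AppDomain-wide default and the per-host limit. The module and method names are hypothetical, and the limit value is left to the caller.

```vb
Imports System
Imports System.Net

Public Module ConnectionLimits

    ' Raises the default limit for every host used in this AppDomain.
    Public Sub RaiseGlobalLimit(ByVal limit As Integer)
        ServicePointManager.DefaultConnectionLimit = limit
    End Sub

    ' Raises the limit only for the host of the given URI.
    Public Sub RaiseLimitForHost(ByVal target As Uri, ByVal limit As Integer)
        Dim point As ServicePoint = ServicePointManager.FindServicePoint(target)
        point.ConnectionLimit = limit
    End Sub

End Module
```

Note that DefaultConnectionLimit only affects ServicePoint objects created after it is set, so the global variant should run before the first request is made.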
Panagiotis Kanavos answered Nov 13 '22 09:11


  1. Try to extend your timeout for every task and subtask.

  2. I wasn't asked, but I would hard-code a task like this instead of using SSIS. SSIS is perfect for ETL but not much else!

Brandon Arnold answered Nov 13 '22 07:11