Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best practices to parallelize using async workflow

Lets say I wanted to scrape a webpage, and extract some data. I'd most likely write something like this:

let getAllHyperlinks(url:string) =
    async {  let req = WebRequest.Create(url)
             let! rsp = req.GetResponseAsync()
             use stream = rsp.GetResponseStream()             // depends on rsp
             use reader = new System.IO.StreamReader(stream)  // depends on stream
             let! data = reader.AsyncReadToEnd()              // depends on reader
             return extractAllUrls(data) }                    // depends on data

The let! tells F# to execute the code in another thread, then bind the result to a variable, and continue processing. The sample above uses two let statements: one to get the response, and one to read all the data, so it spawns at least two threads (please correct me if I'm wrong).

Although the workflow above spawns several threads, the order of execution is serial because each item in the workflow depends on the previous item. Its not really possible to evaluate any items further down the workflow until the other threads return.

Is there any benefit to having more than one let! in the code above?

If not, how would this code need to change to take advantage of multiple let! statements?

like image 875
Juliet Avatar asked Jan 30 '09 17:01

Juliet


2 Answers

The key is we are not spawning any new threads. During the whole course of the workflow, there are 1 or 0 active threads being consumed from the ThreadPool. (An exception, up until the first '!', the code runs on the user thread that did an Async.Run.) "let!" lets go of a thread while the Async operation is at sea, and then picks up a thread from the ThreadPool when the operation returns. The (performance) advantage is less pressure against the ThreadPool (and of course the major user advantage is the simple programming model - a million times better than all that BeginFoo/EndFoo/callback stuff you otherwise write).

See also http://cs.hubfs.net/forums/thread/8262.aspx

like image 137
Brian Avatar answered Sep 29 '22 12:09

Brian


I was writing an answer but Brian beat me to it. I fully agree with him.

I'd like to add that if you want to parallelize synchronous code, the right tool is PLINQ, not async workflows, as Don Syme explains.

like image 21
Mauricio Scheffer Avatar answered Sep 29 '22 12:09

Mauricio Scheffer