
Async.Parallel or Array.Parallel.Map?



I'm trying to implement a pattern I read about on Don Syme's blog, which suggests that there are opportunities for massive performance improvements from leveraging asynchronous I/O. I currently have a piece of code that "works" one way, using Array.Parallel.map, and I am trying to see whether I can achieve the same result using Async.Parallel. However, I really don't understand Async.Parallel and cannot get anything to work.

I have a piece of code (simplified below to illustrate the point) that successfully retrieves an array of data for one cusip (a price series, for example).

let getStockData cusip = 
    let D = DataProvider()
    let arr = D.GetPriceSeries(cusip)
    arr   // the last expression is the result; `return` is only valid inside async { ... }

let data = Array.Parallel.map (fun x -> getStockData x) stockCusips

So this approach constructs an array of arrays by making a connection over the internet to my data vendor for each stock (there could be as many as 3000), giving one price series per stock. I admittedly don't understand what goes on underneath Array.Parallel.map, but I am wondering whether this is a scenario where resources are wasted under the hood and asynchronous I/O could actually be faster. To test this, I have attempted to write the function using asyncs. I think the function below follows the pattern in Don Syme's article (the one using URLs), but it won't compile with "let!".

let getStockDataAsync cusip = 
    async {  let D = DataProvider()
             let! arr = D.GetData(cusip)   // the let! here triggers the error below
             return arr }

The error I get is: This expression was expected to have type Async<'a> but here has type obj

It compiles fine with "let" instead of "let!", but I had thought the whole point was that you need the exclamation point in order for the command to run without blocking a thread.

So the first question really is: what's wrong with my syntax in getStockDataAsync above? And at a higher level, can anyone offer additional insight into asynchronous I/O and whether the scenario I have presented would benefit from it, potentially making it much, much faster than Array.Parallel.map? Thanks so much.

user297400 asked Mar 19 '10 14:03


2 Answers

F# asynchronous workflows let you implement asynchronous computations, but F# distinguishes ordinary computations from asynchronous ones, and the type system tracks the difference. For example, a synchronous method that downloads a web page has type string -> string (taking a URL and returning HTML), whereas a method that does the same thing asynchronously has type string -> Async<string>. Inside an async block you use let! to call asynchronous operations; all other (ordinary synchronous) methods must be called with let. The problem with your example is that GetData is an ordinary synchronous method, so you cannot invoke it with let!.
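To make the let versus let! distinction concrete, here is a minimal sketch. Both getPricesSync and getPricesAsync are hypothetical stand-ins (they are not part of your DataProvider), the cusip string is made up, and Async.Sleep merely imitates a non-blocking wait:

```fsharp
// Hypothetical synchronous call: string -> float[]
let getPricesSync (cusip: string) : float[] =
    [| 1.0; 2.0; 3.0 |]                  // imagine this blocks on the network

// Hypothetical asynchronous call: string -> Async<float[]>
let getPricesAsync (cusip: string) : Async<float[]> =
    async {
        do! Async.Sleep 10               // stands in for a non-blocking wait
        return [| 1.0; 2.0; 3.0 |]
    }

let workflow (cusip: string) : Async<float[]> =
    async {
        let a = getPricesSync cusip      // right-hand side is float[], so plain let
        let! b = getPricesAsync cusip    // right-hand side is Async<float[]>, so let!
        return Array.append a b
    }

let result = workflow "CUSIP-1" |> Async.RunSynchronously
```

Swapping the two keywords changes the picture: `let! a = getPricesSync cusip` is rejected by the compiler in the same way as the code in the question, while `let b = getPricesAsync cusip` compiles but leaves b as an unstarted Async<float[]> rather than the array.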

In the typical F# scenario, if you want to make the GetData member asynchronous, you need to implement it as an asynchronous workflow, which means wrapping its body in an async block. At some point you will reach a place where you really need to run some primitive operation asynchronously (for example, downloading data from a web site). F# provides several primitive asynchronous operations that you can call from an async block using let!, such as AsyncGetResponse (an asynchronous version of the GetResponse method). So in your GetData method you would write something like this:

open System.Net

let GetData (url:string) = async {
  let req = WebRequest.Create(url)
  let! rsp = req.AsyncGetResponse()
  use stream = rsp.GetResponseStream()
  use reader = new System.IO.StreamReader(stream)
  // let! (not let), so the read is awaited and html is a string rather than an Async<string>
  let! html = reader.AsyncReadToEnd()
  return CalculateResult(html) }

In summary, you need to identify the primitive asynchronous operations (such as waiting for the web server or for the file system), use them at those points, and wrap all the code that uses them in async blocks. If there are no primitive operations that could be run asynchronously, then your code is CPU-bound and you can just use Array.Parallel.map.
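Once you have a function that returns an Async value, the Async.Parallel counterpart of the Array.Parallel.map call could look roughly like the sketch below. The asyncGetStockData function here is a hypothetical placeholder (Async.Sleep imitates waiting on the data vendor), and the cusips and prices are made up:

```fsharp
// Hypothetical async fetch: waits without blocking a thread, then returns dummy prices.
let asyncGetStockData (cusip: string) : Async<string * float[]> =
    async {
        do! Async.Sleep 50               // stands in for the network round trip
        return cusip, [| 100.0; 101.5 |]
    }

let stockCusips = [| "AAA"; "BBB"; "CCC" |]   // made-up identifiers

let data =
    stockCusips
    |> Array.map asyncGetStockData   // builds the workflows; nothing runs yet
    |> Async.Parallel                // combines them into one Async<(string * float[])[]>
    |> Async.RunSynchronously        // starts them all and waits for every result
```

The results come back in the same order as the inputs. The benefit over Array.Parallel.map only appears when the inner operation is truly asynchronous; wrapping a blocking call in async { ... } still ties up a thread while it waits.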

I hope this helps you understand how F# asynchronous workflows work. For more information, you can take a look at Don Syme's blog post, the series about asynchronous programming by Robert Pickering, or my F# webcast.

Tomas Petricek answered Nov 15 '22 16:11

@Tomas already has a great answer. I'll just say a couple bits in addition.

The idiom for F# asyncs is to name the method with an "Async" prefix (AsyncFoo, not FooAsync; the latter is an idiom already used by another .NET technology). So your functions should be getStockData and asyncGetStockData.

Inside an async workflow, whenever you use let! instead of let or do! instead of do, the thing on the right should have type Async<T> instead of T. Basically you need an existing async computation in order to 'go async' at this point in the workflow. Each Async<T> will itself be either some other async{...} workflow, or else an async "primitive". The primitives are defined in the F# library or created in user code via Async.FromBeginEnd or Async.FromContinuations which enable defining the low-level details of starting a computation, registering an I/O callback, releasing the thread, and then restarting the computation when getting called back. So you have to 'plumb' async all the way down to some truly-async-I/O-primitive in order to get the full benefits of async I/O.
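As a rough illustration of building such a primitive in user code, the sketch below wraps a callback-style API with Async.FromContinuations. The fetchWithCallback function is hypothetical: it is only a stand-in for a library that reports its result through a callback on another thread, and the prices are dummy values.

```fsharp
open System.Threading

// Hypothetical callback-style API: hands the result to onDone on a thread-pool thread.
let fetchWithCallback (cusip: string) (onDone: float[] -> unit) : unit =
    ThreadPool.QueueUserWorkItem(fun _ -> onDone [| 1.0; 2.0 |]) |> ignore

// Wrap it as an Async<float[]>. The workflow is suspended until onSuccess is
// called, and no thread is held while it waits.
let asyncFetch (cusip: string) : Async<float[]> =
    Async.FromContinuations(fun (onSuccess, _onError, _onCancel) ->
        fetchWithCallback cusip onSuccess)

let prices = asyncFetch "AAA" |> Async.RunSynchronously
```

A real wrapper would also route failures to the error continuation. For APIs that expose a BeginXxx/EndXxx pair, Async.FromBeginEnd plays the same role.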

Brian answered Nov 15 '22 16:11
