Web Workers and asynchronous shared data access: how to do it in Scala.js?

Please consider a Scala.js class that contains a large JavaScript typed array called xArr.

A process called p(xArr) consumes xArr as input but takes a long time to complete. In order to avoid script timeout warnings, p(xArr) runs in a Web Worker.

Recall these constraints on communication between the main thread and the Web Worker thread:

  1. Communication in either direction takes the form of message passing.
  2. Message data must conform to the requirements of JavaScript's structured clone algorithm.
  3. Unless listed in the optional transfer list, message data is copied rather than transferred between the main and worker threads.
  4. To transfer message data to or from the worker thread instead of copying it, the data must implement the Transferable interface and the transfer list must contain a reference to it.
  5. If a transferable object transfers between threads, the sending thread loses access to it.

Because of xArr's size, sending a copy of it to the worker thread would incur severe memory costs, but because of p(xArr)'s run time, it cannot run in the main thread.

Fortunately, a typed array's underlying ArrayBuffer implements the Transferable interface. To save compute and memory resources, the program therefore invokes p(xArr) by transferring xArr to the Web Worker, which invokes p(xArr) and then transfers xArr back to the main thread.

Unfortunately, other asynchronous methods in the main thread must also access xArr, which may have been transferred to the worker's scope at the time they are invoked.

What Scala language features could govern access to xArr so that method calls execute immediately when the main thread owns xArr but wait for it to return to scope when the worker owns xArr?

In other words: how would you handle a class variable that repeatedly alternates between defined and undefined over time?

Would you suggest locks? Promise/Callback queues? Would you approach the problem in an entirely different way? If so, how?

Remember that this is a Scala.js library, so JVM-specific features are ruled out.

asked Apr 21 '18 02:04 by Ben McKenneby


2 Answers

I understand your very real pain here. This used to be possible with SharedArrayBuffer, but it is currently disabled in Chrome. Sadly, there is no alternative for shared memory:

Note that SharedArrayBuffer was disabled by default in all major browsers on 5 January, 2018 in response to Spectre.

There are plans to re-enable SharedArrayBuffer once proper security auditing is complete. I guess we'll have to wait.

If you were running your code in Node, this would be hard but possible.

answered Oct 28 '22 13:10 by Benjamin Gruenbaum


Thanks to all who considered this issue. A solution exists as of 19 May 2018; hopefully a better one can replace it soon.

The current version works as follows:

Problem 1: How can we associate function calls from the main thread with function definitions in the worker thread?

S1: A map of Promise objects, Map[Long, PromiseWrapper](), associates each method invocation ID with a promise that can process the result. This simple multiplexing mechanism evolved from another Stack Overflow question. Thanks again to Justin du Coeur.
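Here is a minimal sketch of the Problem 1 multiplexing idea. It is written in plain Scala so it runs under both Scala.js and the JVM. `CallRegistry`, `register`, `resolve` and `outstanding` are hypothetical names. A bare `Promise[Any]` stands in for the answer's `PromiseWrapper`, whose definition isn't shown. Scala.js runs on a single thread, so the mutable map needs no locking.

```scala
import scala.collection.mutable
import scala.concurrent.{Future, Promise}

// Hypothetical main-thread registry: each outgoing worker request gets a
// unique ID, and the reply carrying that ID completes the matching Promise.
object CallRegistry {
  private var nextId: Long = 0L
  private val pending = mutable.Map.empty[Long, Promise[Any]]

  /** Allocates an ID for a new request and returns it with the Future to await. */
  def register(): (Long, Future[Any]) = {
    val id = nextId
    nextId += 1
    val p = Promise[Any]()
    pending(id) = p
    (id, p.future)
  }

  /** Called when the worker replies; completes and forgets the matching Promise. */
  def resolve(id: Long, result: Any): Boolean =
    pending.remove(id) match {
      case Some(p) => p.trySuccess(result)
      case None    => false // unknown or already-resolved ID
    }

  /** Number of requests still waiting for a reply. */
  def outstanding: Int = pending.size
}
```

The ID would travel to the worker with the request message and come back with the reply. The main thread can then pass it to `resolve` regardless of the order in which replies arrive.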

Problem 2: How can we invoke functions in the worker thread from the main thread?

S1: Pass a text representation of the function to the worker, then parse it with eval and invoke the resulting function. Unfortunately, eval comes with security risks. Besides, having to write pure JavaScript code in string values defeats most of the advantages of Scala.js, namely type safety and Scala syntax.

S2: Store function definitions in a lookup table in the worker scope and invoke the functions by passing their keys. This could work, but it feels clunky in Scala because different functions take parameters that vary in number and type.
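To illustrate why S2 feels clunky, here is a hypothetical lookup table; `FunctionTable`, `entry` and `invoke` are made-up names. A single table has to hold functions of different arities and parameter types. Every entry therefore accepts `Seq[Any]` and casts its own arguments, so the compiler can no longer check the calls.

```scala
object FunctionTable {
  // Helper so each lambda's parameter type (Seq[Any]) is known to the compiler.
  private def entry(key: String)(f: Seq[Any] => Any): (String, Seq[Any] => Any) = key -> f

  // Hypothetical worker-side table keyed by name. Every entry receives an
  // untyped Seq[Any] and must cast its own arguments, so a wrong argument
  // list is only detected at run time.
  val table: Map[String, Seq[Any] => Any] = Map(
    entry("sum") { args => args(0).asInstanceOf[Array[Double]].sum },
    entry("scale") { args =>
      val xs = args(0).asInstanceOf[Array[Double]]
      val k  = args(1).asInstanceOf[Double]
      xs.map(_ * k)
    }
  )

  /** Looks up the function by key and applies it to the untyped arguments. */
  def invoke(key: String, args: Seq[Any]): Any = table(key)(args)
}
```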

S3: Wrap the functions into serializable case classes, then send the serialized bytes from the main scope to the worker scope and invoke the function there. You can think of these case classes as message classes. The current solution uses this approach and relies on BooPickle by Otto Chrons. The serialized class wraps the method call and any trivial function parameters, e.g. numbers, short strings, and simple case classes. Large data, like the TypedArray values featured in this question, transfers from the main thread to the worker thread through a mechanism discussed later. Unfortunately, this approach means that all operations on the TypedArray values must be defined at compile time, because BooPickle relies on macros, not reflection, to serialize and deserialize classes.
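Here is a sketch of the S3 message classes under simplifying assumptions. The BooPickle serialization step is omitted, and `xArr` is modeled as `Array[Double]` rather than a JavaScript typed array so the code runs anywhere. `WorkerOp`, `Mean`, `CountAbove` and `runOnWorker` are hypothetical. Each message carries only small parameters and implements its own operation, so the worker dispatches by a plain method call instead of a key lookup.

```scala
object Messages {
  // Hypothetical sealed message hierarchy; in the answer these would be
  // pickled to bytes on the main thread and unpickled on the worker.
  sealed trait WorkerOp {
    /** Runs on the worker; xArr arrives separately, not inside the message. */
    def apply(xArr: Array[Double]): Double
  }

  // Operation with no small parameters.
  case object Mean extends WorkerOp {
    def apply(xArr: Array[Double]): Double =
      if (xArr.isEmpty) 0.0 else xArr.sum / xArr.length
  }

  // Operation carrying one small parameter inside the message.
  final case class CountAbove(threshold: Double) extends WorkerOp {
    def apply(xArr: Array[Double]): Double = xArr.count(_ > threshold).toDouble
  }

  /** Worker-side entry point: once `op` is deserialized, just call it. */
  def runOnWorker(op: WorkerOp, xArr: Array[Double]): Double = op(xArr)
}
```

The set of subclasses is sealed and fixed at compile time, so a new operation means a new case class. This matches the limitation described above.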

Problem 3: How can we pass the value of the TypedArray class variable, xArr, to and from the worker thread without duplicating it?

S1: Because xArr's underlying buffer conforms to the Transferable interface, it can transfer wholly between the main and worker scopes. At the same time, the serialized classes that wrap the function calls conform to a trait that specifies an apply method with this signature:

def apply(parameters: js.Array[Transferable]): js.Array[Transferable]

By convention, index 0 of the parameters array holds a serialized version of the message case class, and subsequent indices hold the TypedArray values. Each message class has its own implementation of this apply method.
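The index-0 convention can be sketched with plain arrays. This is an illustration only: `Envelope`, `pack` and `unpack` are hypothetical. `Array[AnyRef]` stands in for `js.Array[Transferable]`, a byte array stands in for the serialized message, and `Array[Double]` stands in for the typed arrays. The point is that the large buffers go into the array by reference and are never copied into the message.

```scala
object Envelope {
  // Stand-in for js.Array[Transferable]: slot 0 holds the serialized message,
  // slots 1..n hold the large buffers that should be transferred, not copied.
  type Params = Array[AnyRef]

  /** Builds the parameters array: message first, then the large buffers. */
  def pack(serializedMessage: Array[Byte], buffers: Array[Double]*): Params =
    (Seq[AnyRef](serializedMessage) ++ buffers).toArray

  /** Splits a parameters array back into the message bytes and the buffers. */
  def unpack(params: Params): (Array[Byte], Seq[Array[Double]]) = {
    require(params.nonEmpty, "slot 0 must hold the serialized message")
    val message = params(0).asInstanceOf[Array[Byte]]
    val buffers = params.toSeq.drop(1).map(_.asInstanceOf[Array[Double]])
    (message, buffers)
  }
}
```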

Problem 4: How can we pass the result of the computation back to the promise that waits for it in the main thread?

S1: The apply methods mentioned in Problem 3, S1 return a new array of Transferable objects with another serialized message class at its head. That message class wraps the return value of the computation p(xArr) and, through an apply method of its own, tells the main thread how to interpret the array. When p(xArr) returns large objects, such as other TypedArray values, those occupy the subsequent positions in the array.
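Here is a sketch of the Problem 4 reply path, again with hypothetical names (`Reply`, `ScalarReply`, `ArrayReply`, `onWorkerReply`). It assumes the reply at the head of the array has already been deserialized. A small result rides inside the reply message. A large one follows the message in the array, and the reply's own `apply` knows where to find it when completing the promise registered for its request ID.

```scala
import scala.collection.mutable
import scala.concurrent.Promise

object Replies {
  sealed trait Reply {
    def requestId: Long
    /** Main-thread side: decide how to complete the waiting promise. */
    def apply(buffers: Seq[Array[Double]], waiting: Promise[Any]): Unit
  }

  // Small result: carried inside the reply message itself.
  final case class ScalarReply(requestId: Long, value: Double) extends Reply {
    def apply(buffers: Seq[Array[Double]], waiting: Promise[Any]): Unit =
      waiting.success(value)
  }

  // Large result: the message carries no data; the array itself follows
  // the message in the reply array as a transferable buffer.
  final case class ArrayReply(requestId: Long) extends Reply {
    def apply(buffers: Seq[Array[Double]], waiting: Promise[Any]): Unit =
      waiting.success(buffers.head)
  }

  /** Called for each deserialized reply; unknown IDs are ignored. */
  def onWorkerReply(reply: Reply, buffers: Seq[Array[Double]],
                    pending: mutable.Map[Long, Promise[Any]]): Unit =
    pending.remove(reply.requestId).foreach(p => reply(buffers, p))
}
```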

Problem 5: What if statements in the main thread try to access xArr when it has transferred to the worker thread?

S1: Now, code in the main thread can only access xArr through a checkOut method and must return it by calling a checkIn method. The checkOut method returns a Future that completes when xArr returns from the worker thread. Concurrent calls to checkOut each enqueue a promise. Any code that calls checkOut must call checkIn to pass control of xArr to the next Promise waiting in the queue. Unfortunately, this design burdens the programmer with the responsibility of restoring xArr to its encompassing class. Schemes like this resemble classical concurrency models with locks and manual memory management with malloc and free, and tend toward buggy code that freezes or crashes.
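Here is a minimal single-threaded sketch of the checkOut/checkIn scheme. `CheckOutGuard` and `isHome` are hypothetical names; the queue-of-promises design follows the description above. The sketch relies on Scala.js being single-threaded and would need synchronization on the JVM.

```scala
import scala.collection.mutable
import scala.concurrent.{Future, Promise}

// Hypothetical guard for a value that is sometimes "away" (transferred to a
// worker). Whoever holds it must call checkIn to hand it to the next waiter.
final class CheckOutGuard[A](initial: A) {
  private var current: Option[A] = Some(initial)        // None while checked out
  private val waiters = mutable.Queue.empty[Promise[A]] // FIFO of pending checkOuts

  /** Completes immediately if the value is home, otherwise when it is checked back in. */
  def checkOut(): Future[A] = current match {
    case Some(a) =>
      current = None
      Future.successful(a)
    case None =>
      val p = Promise[A]()
      waiters.enqueue(p)
      p.future
  }

  /** Returns the value; the oldest waiter, if any, receives it directly. */
  def checkIn(a: A): Unit = {
    require(current.isEmpty, "checkIn called while the value is not checked out")
    if (waiters.nonEmpty) waiters.dequeue().success(a)
    else current = Some(a)
  }

  /** True when nobody holds the value and no caller is waiting for it. */
  def isHome: Boolean = current.isDefined
}
```

Nothing forces a caller to invoke `checkIn`. A forgotten call leaves every later `checkOut` waiting forever, which is exactly the fragility described above.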

Problem 6: After p(xArr) executes in the worker thread, how can xArr return to the class that encapsulated it in the main thread?

S1: Message case classes meant to invoke p(xArr) now inherit from a trait called Boomerang. As the name implies, these messages transfer from the main thread to the worker thread, invoke p(xArr) there, then return, unchanged, to the main thread. Once back in the main thread, Boomerang objects call the relevant checkIn methods to restore xArr values to their original encapsulating objects.
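Here is a sketch of the Boomerang idea under stated assumptions. The trait members `ownerId`, `runOnWorker` and `land` are hypothetical, as are the `SumBoomerang` example and the `BoomerangDemo` owner map. The trip to and from the worker is simulated by `send` and `receive` on a single thread. The sketch illustrates that the returning message, not the caller, performs the checkIn.

```scala
import scala.collection.mutable

object BoomerangDemo {
  // Hypothetical: the message remembers which owner lent it xArr.
  trait Boomerang {
    def ownerId: Int
    /** Worker side: read-only work on the borrowed array. */
    def runOnWorker(xArr: Array[Double]): Double
    /** Main-thread side, after the return trip: hand xArr back to its owner. */
    final def land(xArr: Array[Double], checkIn: (Int, Array[Double]) => Unit): Unit =
      checkIn(ownerId, xArr)
  }

  final case class SumBoomerang(ownerId: Int) extends Boomerang {
    def runOnWorker(xArr: Array[Double]): Double = xArr.sum
  }

  // Owners keyed by ID; None means the array is currently in the worker.
  val owners = mutable.Map[Int, Option[Array[Double]]](1 -> Some(Array(1.0, 2.0, 3.0)))

  def checkIn(id: Int, xArr: Array[Double]): Unit = owners(id) = Some(xArr)

  /** Simulates transferring xArr out: the owner loses access until it returns. */
  def send(msg: Boomerang): Array[Double] = {
    val xArr = owners(msg.ownerId).getOrElse(sys.error(s"owner ${msg.ownerId} is not holding xArr"))
    owners(msg.ownerId) = None
    xArr
  }

  /** Simulates the return trip: the Boomerang restores xArr itself. */
  def receive(msg: Boomerang, xArr: Array[Double]): Unit = msg.land(xArr, checkIn)
}
```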

For simplicity, this answer leaves out details about different types of Transferable parameters, operations that mutate xArr instead of simply reading and restoring it, operations that take no parameters but still yield large TypedArray responses, and operations that take multiple large TypedArray parameters. Minor modifications to the solutions above handle those cases.

With this as a baseline, can we:

  1. Simplify this design?
  2. Incorporate user-defined operations?
  3. Find safer alternatives to the checkOut and checkIn methods?

answered Oct 28 '22 15:10 by Ben McKenneby