Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to cancel a wasm process from within a webworker

I have a wasm process (compiled from c++) that processes data inside a web application. Let's say the necessary code looks like this:

std::vector<JSONObject> data
for (size_t i = 0; i < data.size(); i++)
{
    process_data(data[i]);

    if (i % 1000 == 0) {
        bool is_cancelled = check_if_cancelled();
        if (is_cancelled) {
            break;
        }
    }

}

This code basically "runs/processes a query" similar to a SQL query interface:

enter image description here

However, queries may take several minutes to run/process and at any given time the user may cancel their query. The cancellation process would occur in the normal javascript/web application, outside of the service Worker running the wasm.

My question then is what would be an example of how we could know that the user has clicked the 'cancel' button and communicate it to the wasm process so that knows the process has been cancelled so it can exit? Using the worker.terminate() is not an option, as we need to keep all the loaded data for that worker and cannot just kill that worker (it needs to stay alive with its stored data, so another query can be run...).

What would be an example way to communicate here between the javascript and worker/wasm/c++ application so that we can know when to exit, and how to do it properly?

Additionally, let us suppose a typical query takes 60s to run and processes 500MB of data in-browser using cpp/wasm.


Update: I think there are the following possible solutions here based on some research (and the initial answers/comments below) with some feedback on them:

  1. Use two workers, with one worker storing the data and another worker processing the data. In this way the processing-worker can be terminated, and the data will always remain. Feasible? Not really, as it would take way too much time to copy over ~ 500MB of data to the webworker whenever it starts. This could have been done (previously) using SharedArrayBuffer, but its support is now quite limited/nonexistent due to some security concerns. Too bad, as this seems like by far the best solution if it were supported...

  2. Use a single worker using Emterpreter and using emscripten_sleep_with_yield. Feasible? No, destroys performance when using Emterpreter (mentioned in the docs above), and slows down all queries by about 4-6x.

  3. Always run a second worker and in the UI just display the most recent. Feasible? No, would probably run into quite a few OOM errors if it's not a shared data structure and the data size is 500MB x 2 = 1GB (500MB seems to be a large though acceptable size when running in a modern desktop browser/computer).

  4. Use an API call to a server to store the status and check whether the query is cancelled or not. Feasible? Yes, though it seems quite heavy-handed to long-poll with network requests every second from every running query.

  5. Use an incremental-parsing approach where only a row at a time is parsed. Feasible? Yes, but also would require a tremendous amount of re-writing the parsing functions so that every function supports this (the actual data parsing is handled in several functions -- filter, search, calculate, group by, sort, etc. etc.

  6. Use IndexedDB and store the state in javascript. Allocate a chunk of memory in WASM, then return its pointer to JavaScript. Then read database there and fill the pointer. Then process your data in C++. Feasible? Not sure, though this seems like the best solution if it can be implemented.

  7. [Anything else?]

In the bounty then I was wondering three things:

  1. If the above six analyses seem generally valid?
  2. Are there other (perhaps better) approaches I'm missing?
  3. Would anyone be able to show a very basic example of doing #6 -- seems like that would be the best solution if it's possible and works cross-browser.
like image 869
David542 Avatar asked Aug 05 '19 20:08

David542


People also ask

How do I cancel a Webworker?

terminate() The terminate() method of the Worker interface immediately terminates the Worker . This does not offer the worker an opportunity to finish its operations; it is stopped at once.

How many workers can run concurrently?

How many web workers can run concurrently JavaScript? A web worker is a JavaScript program running on a different thread, in parallel with main thread. The browser creates one thread per tab. The main thread can spawn an unlimited number of web workers, until the user's system resources are fully consumed.

Which is not a valid method of web worker?

There are some cases where you can't use web workers, though. For example, you can't use them to manipulate the DOM or the properties of the window object. This is because the worker does not have the access to the window object. Web workers can also spawn new web workers.


2 Answers

For Chrome (only) you may use shared memory (shared buffer as memory). And raise a flag in memory when you want to halt. Not a big fan of this solution (is complex and is supported only in chrome). It also depends on how your query works, and if there are places where the lengthy query can check the flag.

Instead you should probably call the c++ function multiple times (e.g. for each query) and check if you should halt after each call (just send a message to the worker to halt).

What I mean by multiple time is make the query in stages (multiple function cals for a single query). It may not be applicable in your case.

Regardless, AFAIK there is no way to send a signal to a Webassembly execution (e.g. Linux kill). Therefore, you'll have to wait for the operation to finish in order to complete the cancellation.

I'm attaching a code snippet that may explain this idea.

worker.js:

... init webassembly

onmessage = function(q) {
	// query received from main thread.
	const result = ... call webassembly(q);
	postMessage(result);
}

main.js:

const worker = new Worker("worker.js");
const cancel = false;
const processing = false;

worker.onmessage(function(r) {
	// when worker has finished processing the query.
	// r is the results of the processing.
	processing = false;

	if (cancel === true) {
		// processing is done, but result is not required.
		// instead of showing the results, update that the query was canceled.
		cancel = false;
		... update UI "cancled".
		return;
	}
	
	... update UI "results r".
}

function onCancel() {
	// Occurs when user clicks on the cancel button.
	if (cancel) {
		// sanity test - prevent this in UI.
		throw "already cancelling";
	}
	
	cancel = true;
	
	... update UI "canceling". 
}

function onQuery(q) {
	if (processing === true) {
		// sanity test - prevent this in UI.
		throw "already processing";
	}
	
	processing = true;
	// Send the query to the worker.
	// When the worker receives the message it will process the query via webassembly.
	worker.postMessage(q);
}

An idea from user experience perspective: You may create ~two workers. This will take twice the memory, but will allow you to "cancel" "immediately" once. (it will just mean that in the backend the 2nd worker will run the next query, and when the 1st finishes the cancellation, cancellation will again become immediate).

like image 127
Tomer Avatar answered Oct 26 '22 22:10

Tomer


Shared Thread

Since the worker and the C++ function that it called share the same thread, the worker will also be blocked until the C++ loop is finished, and won't be able to handle any incoming messages. I think the a solid option would minimize the amount of time that the thread is blocked by instead initializing one iteration at a time from the main application.

It would look something like this.

main.js  ->  worker.js  ->  C++ function  ->  worker.js  ->  main.js

Breaking up the Loop

Below, C++ has a variable initialized at 0, which will be incremented at each loop iteration and stored in memory. C++ function then performs one iteration of the loop, increments the variable to keep track of loop position, and immediately breaks.

int x;
x = 0; // initialized counter at 0

std::vector<JSONObject> data
for (size_t i = x; i < data.size(); i++)
{
    process_data(data[i]);

    x++ // increment counter
    break; // stop function until told to iterate again starting at x
}

Then you should be able to post a message to the web worker, which then sends a message to main.js that the thread is no longer blocked.

Canceling the Operation

From this point, main.js knows that the web worker thread is no longer blocked, and can decide whether or not to tell the web worker to execute the C++ function again (with the C++ variable keeping track of the loop increment in memory.)

let continueOperation = true
// here you can set to false at any time since the thread is not blocked here

worker.expensiveThreadBlockingFunction()
// results in one iteration of the loop being iterated until message is received below

worker.onmessage = function(e) {
    if (continueOperation) {
        worker.expensiveThreadBlockingFunction()
        // execute worker function again, ultimately continuing the increment in C++
    } {
        return false
        // or send message to worker to reset C++ counter to prepare for next execution
    }
}

Continuing the Operation

Assuming all is well, and the user has not cancelled the operation, the loop should continue until finished. Keep in mind you should also send a distinct message for whether the loop has completed, or needs to continue, so you don't keep blocking the worker thread.

like image 28
Nathan Fries Avatar answered Oct 26 '22 20:10

Nathan Fries