Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to parallelize execution of a custom function formula while keeping the Google Sheet shareable and permissionless?

I have a Google Sheet with a custom function formula that: takes in a matrix and two vectors from a spreadsheet, does some lengthy matrix-vector calculations (>30 sec, so above the quota), before outputting the result as a bunch of rows. It is single-threaded, since that's what Google Apps Script (GAS) natively is, but I want to parallelize the calculation using a multi-threading workaround, so it speeds it up drastically.

Requirements (1-3):

  1. UX: It should run the calculations automatically and reactively as a custom function formula, which implies that the user doesn't have to manually start it by clicking a run button or similar. Like my single-threaded version currently does.

  2. Parallelizable: It should ideally spawn ~30 threads/processes, so that instead of taking >30 seconds as it now does (which makes it time out due to Google's quota limit), it should take ~1 second. (I know GAS is single-threaded, but there are workarounds, referenced below).

  3. Shareability: I should ideally be able to share the Sheet with other people, so they can "Make a copy" of it, and the script will still run the calculations for them:

  • 3.1 Permissionless: Without me having to manually hand out individual permissions to users (permissionless). For instance whenever someone "Makes a copy" and "Execute the app as user accessing the web app". My rudimentary testing suggest that this is possible.
  • 3.2 Non-intrusive: Without users of the spreadsheet having to give intrusive authorizations like "Give this spreadsheet/script/app access to your entire Google Drive or Gmail account?". Users having to give an non-intrusive authorization to a script/webapp can be acceptable, as long as requirement 3.1 is still maintained.
  • 3.3 UX: Without forcing users to view a HTML sidebar in the spreadsheet.

I have already read this excellent related answer by @TheMaster which outlines some potential ways of solving parallelization in Google Apps script in general. Workaround #3 google.script.run and workaround #4 UrlFetchApp.fetchAll (both using a Google Web App) looks most promising. But some details are unknown to me, such as if they can adhere to requirements 1 and 3 with its sub-requirements.

I can conceive of an other potential naïve workaround which would be to split the function up into several custom functions formulas and do the parallelization (by some kind of Map/Reduce) inside the spreadsheet itself (storing intermediary results back into the spreadsheet, and having custom function formulas work on that as reducers). But that's undesired, and probably unfeasible, in my case.

I'm very confident my function is parallelizable using some kind of Map/Reduce process. The function is currently optimized by doing all the calculations in-memory, without touching the spreadsheet in-between steps, before finally outputting the result to the spreadsheet. The details of it is quite intricate and well over 100 lines, so I don't want to overload you with more (and potentially confusing) information which doesn't really affect the general applicability of this case. For the context of this question you may assume that my function is parallelizable (and map-reduce'able), or consider any function you already know that would be. What's interesting is what's generally possible to achieve with parallelizationin Google Apps Script, while also maintaining the highest level of shareability and UX. I'll update this question with more details if needed.

Update 2020-06-19:

To be more clear, I do not rule out Google Web App workarounds entirely, as I haven't got experience with their practical limitations to know for sure if they can solve the problem within the requirements. I have updated the sub-requirements 3.1 and 3.2 to reflect this. I also added sub-req 3.3, to be clearer on the intent. I also removed req 4, since it was largely overlapping with req 1.

I also edited the question and removed the related sub-questions, so it is more focused on the single main HOWTO-question in the title. The requirements in my question should provide a clear objective standard for which answers would be considered best.

I realise the question might entail a search for the Holy Grail of Google Sheet multithreading workarounds, as @TheMaster has pointed out in private. Ideally, Google would provide one or more features to support multithreading, map-reduce, or more permissionless sharing. But until then I would really like to know what is the optimal workaround within the current constraints we have. I would hope this question is relevant to others as well, even considering the tight requirements.

like image 481
Magne Avatar asked Oct 16 '22 02:10

Magne


People also ask

What is automate functionality in Google Sheets?

You can use Apps Script to extend Google Sheets to save time and effort. Apps Script provides the Spreadsheet service which allows scripts to interact with your Google Sheet files and the data they contain. You can use this service to automate the following common spreadsheet tasks: Create or modify a spreadsheet.


1 Answers

If you publish a web-app with "anyone, even anonymous", execute as "Me", then the custom function can use UrlFetchApp.fetchAllAuthorization not needed to post to that web-app. This will run in parallelproof. This solves all the three requirements.

Caveat here is: If multiple people use the sheet, and the custom function will have to post to the "same" webapp (that you published to execute as you) for processing, Google will limit simultaneous executionsquota limit:30.

To workaround this, You can ask people using your sheet to publish their own web-apps. They'll have to do this once at the beginning and no authorization is needed.

If not, you'll need to host a custom server for the load or something like google-cloud-functions might help

like image 145
TheMaster Avatar answered Oct 20 '22 23:10

TheMaster