
Interact with browser JavaScript via command line or local script?

We offer in-page browser JavaScript, similar to ImageMagick, that helps people convert images to different sizes and formats. However, it requires interaction with the web page.

Is it possible to let people automate this interaction without sending images to our server (which would increase bandwidth cost and server load) and without requiring users to download a headless browser library like Puppeteer?

For instance, is the following flow possible:

  1. Open Chrome via the command line (or local script) to a specific web page.
  2. Upload an image to that web page.
  3. Invoke a script on the web page.
  4. Receive the script results and allow for local manipulation.

Launching Chrome is possible, but it's unclear if you can interact with a specific browser window after launching it.

Crashalot asked Nov 04 '19 17:11



2 Answers

This should be technically automatable, but it is far from straightforward.

Your question can be split into two parts: offline processing and upload automation.


Offline Processing

Assuming your image processing code is fully in-browser JavaScript (instead of, say, a modularized node program calling native libraries), it is possible to do all the processing in-browser.

An "uploaded" file can be read, processed, and downloaded without sending anything to the server. The processing can even happen in a background thread, which keeps the UI responsive enough to show, say, a nice progress bar.
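As a rough illustration of the read, process, download cycle described above, here is a minimal sketch. It assumes a page containing a file input with the hypothetical id `picker`, and it uses a plain canvas resize as a stand-in for whatever processing the real script does. The function names (`fitWithin`, `resizeImageFile`, `downloadBlob`) are made up for this example.

```javascript
// Pure helper: scale (width, height) to fit inside a maxWidth x maxHeight box,
// preserving the aspect ratio and never upscaling.
function fitWithin(width, height, maxWidth, maxHeight) {
  const scale = Math.min(maxWidth / width, maxHeight / height, 1);
  return {
    width: Math.max(1, Math.round(width * scale)),
    height: Math.max(1, Math.round(height * scale)),
  };
}

// Browser-only: resize a user-selected File and return a PNG Blob.
// The pixels are decoded and re-encoded inside the page; nothing is sent
// over the network.
async function resizeImageFile(file, maxWidth, maxHeight) {
  const bitmap = await createImageBitmap(file);
  const size = fitWithin(bitmap.width, bitmap.height, maxWidth, maxHeight);
  const canvas = document.createElement('canvas');
  canvas.width = size.width;
  canvas.height = size.height;
  canvas.getContext('2d').drawImage(bitmap, 0, 0, size.width, size.height);
  return new Promise((resolve, reject) => {
    canvas.toBlob(
      (blob) => (blob ? resolve(blob) : reject(new Error('encoding failed'))),
      'image/png'
    );
  });
}

// Browser-only: offer the result as a download via a temporary object URL.
function downloadBlob(blob, filename) {
  const url = URL.createObjectURL(blob);
  const link = document.createElement('a');
  link.href = url;
  link.download = filename;
  link.click();
  setTimeout(() => URL.revokeObjectURL(url), 0);
}

// Wire things up only when running in a page that has the assumed
// <input type="file" id="picker"> element.
if (typeof document !== 'undefined') {
  const picker = document.getElementById('picker');
  if (picker) {
    picker.addEventListener('change', async () => {
      const file = picker.files[0];
      if (!file) return;
      const blob = await resizeImageFile(file, 800, 600);
      downloadBlob(blob, 'resized.png');
    });
  }
}
```

The size calculation is kept separate from the DOM code so that it can be reused unchanged if the processing is later moved into a worker or a command line program.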

The code itself can be hosted online and cached with a Service Worker, or shipped as static HTML + JavaScript. Either way, it can be opened and executed offline once it has been visited or deployed. (Note that Chrome severely limits static HTML opened from the local file system, including a harsh restriction on web workers. Google prefers that you keep things online.)


Upload Automation

As mentioned above, a file selected through a file input or dropped into the browser can be read by in-page JavaScript, but by convention I'll keep calling it an "upload".

Chrome has some automation extensions, most notably Kantu, but they can't handle file uploads because of Chrome's security restrictions.

So, if you want to automate file selection, you need to use a native, out-of-browser automation tool, such as Kantu's XModules, AutoHotkey, or SikuliX. Commercial solutions exist, but they come with similar restrictions given your unusual requirement of no headless browser.

  • AutoHotkey focuses on simulating the keyboard (open the browser, wait 5 seconds, press Tab 10 times, press Enter, wait 2 seconds, type the file name, press Enter, and so on), and the script can be compiled into a deployable exe.

  • SikuliX is more powerful, but is also much harder to distribute; the Java runtime alone is bigger than a browser.

  • Kantu + XModules sits somewhere between the two. Users need to install the browser extension and its native extension, but once that is done everything happens in the browser (more or less).

All three methods involve simulation of typing the file name, because as far as I know there is no simpler way to automate it in a user-launched (non-headless) Chrome.

The name of the image file can be passed as a command line parameter for AutoHotkey and SikuliX, or stored in a file and read by the script in the case of Kantu.

In all three cases, the automation simulates a user, and the real-life user must not touch the computer while the script is running, or the automation will break.


How about command line?

Alternatively, if your aim is automation without deploying a browser, you may consider turning it into a command line Node.js program and packaging it as an exe.

The distributable would be heavier than a compiled AutoHotkey script, but there are far fewer moving parts, so it is much more reliable:

  • Independent from Chrome version or the existence of XModules.
  • All processing happens in its own process, instead of hijacking the user's Chrome.
  • Can be executed headlessly, very important for automation.
  • Flexible command line parameters.
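To give an idea of what the command line front end of such a Node.js program might look like, here is a small sketch built on the standard `node:util` `parseArgs` helper (available in recent Node.js releases). The option names `--width` and `--format`, the defaults, and the functions `parseCliArgs` and `planJobs` are all invented for illustration. The actual image conversion is left out, since it depends on how the existing browser code is ported.

```javascript
const { parseArgs } = require('node:util');
const path = require('node:path');

// Parse arguments such as: --width 640 -f webp a.png b.jpg
// The option names are illustrative, not the interface of any existing tool.
function parseCliArgs(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    options: {
      width: { type: 'string', short: 'w' },
      format: { type: 'string', short: 'f' },
    },
    allowPositionals: true,
  });
  const width = Number(values.width ?? 800);
  if (!Number.isInteger(width) || width <= 0) {
    throw new Error(`--width must be a positive integer, got "${values.width}"`);
  }
  if (positionals.length === 0) {
    throw new Error('expected at least one input file');
  }
  return { width, format: values.format ?? 'png', inputs: positionals };
}

// Turn the parsed options into one job per input file. The output path keeps
// the input's directory and base name and swaps in the target extension.
function planJobs({ width, format, inputs }) {
  return inputs.map((input) => {
    const { dir, name } = path.parse(input);
    return { input, width, output: path.format({ dir, name, ext: '.' + format }) };
  });
}

// Example with a fixed argument list; a real CLI would pass
// process.argv.slice(2) instead.
const exampleJobs = planJobs(
  parseCliArgs(['--width', '640', '-f', 'webp', 'a.png', 'b.jpg'])
);
console.log(exampleJobs);
```

Each planned job would then be handed to the conversion routine, and because nothing here touches a window or the keyboard, the same program can run unattended from a script or a scheduler.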

But I like browser automation, it is so simple

Think again.

From my experience, many things will throw Browser/GUI automation off:

  1. Unusual screen resolution, browser zoom, OS scaling, or a last-remembered Chrome window size that distorts your page beyond recognition.
  2. Browser extensions that change page elements, such as ad-blockers.
  3. IMEs and other programs that intercept keyboard input with hotkeys.
  4. Pop-ups from other programs, such as anti-virus, Windows Update, or the dialog shown after inserting a CD.
  5. Accidental locks, sleeps, logouts, keys left on keyboard, or power interruption.
  6. Or a simple Chrome update that breaks any of the 100 things you depend on.

So, yeah, here are your reasons why computer automation is better done headless.


Will my code be safe?

In case you are worried about the security of your script, don't be. The moment you decide that the processing should happen on the client side, the cat is out of the bag.

Technically, your code is protected by copyright. But good luck enforcing it. If you want to keep your code safe from extraction/decryption/deobfuscation/whatever (cough), you need to keep it an online black box, with no client-side processing.

Sheepy answered Nov 01 '22 17:11



One way to build around your web app would be:

1) redirect console.log to standard output (see the question "In Chrome, how can I get the javascript console output to stdout/stderr"), probably with an appropriate --log-level flag and with error messages redirected somewhere else, so that stray messages don't break the whole thing,

2) in the page script, instead of (or in addition to) saving the result file, console.log it as Base64,

3) and on the CLI side, use a pipe (or several) to decode the Base64 back into a proper file and do any additional processing.

mbojko answered Nov 01 '22 19:11
