We offer browser-page JavaScript similar to imagemagick that helps people convert images to different sizes and formats. However, it requires webpage interaction.
Is it possible to let people automate this interaction -- without sending images to our server (thus increasing bandwidth cost and server load) and without requiring users to download a headless browser library like Puppeteer?
For instance, is the following flow possible:
Launching Chrome is possible, but it's unclear if you can interact with a specific browser window after launching it.
This should be technically automatable, but it is far from straightforward.
Your question can be split into two parts: offline processing and upload automation.
Offline Processing
Assuming your image processing code is fully in-browser JavaScript (instead of, say, a modularized node program calling native libraries), it is possible to do all the processing in-browser.
An "uploaded" file can be read, processed, and downloaded without sending anything to the server. The processing may even happen in a background thread (a web worker), keeping the UI responsive enough to show, say, a nice progress bar.
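The read, process, download loop can be sketched as follows. This is only an illustration: the question does not show the site's actual conversion code, so a plain canvas resize stands in for it, and the element id `picker`, the 800x600 limit, and the function names `fitWithin`, `resizeFile`, and `downloadBlob` are all made up for the example.

```javascript
// Pure helper: scale (width, height) down to fit inside maxWidth x maxHeight,
// preserving the aspect ratio. Images that already fit are left unchanged.
function fitWithin(width, height, maxWidth, maxHeight) {
  const scale = Math.min(maxWidth / width, maxHeight / height, 1);
  return { width: Math.round(width * scale), height: Math.round(height * scale) };
}

// Browser only: resize a File picked by the user and return a PNG Blob.
// Nothing leaves the machine; the file is decoded and drawn locally.
async function resizeFile(file, maxWidth, maxHeight) {
  const bitmap = await createImageBitmap(file);
  const size = fitWithin(bitmap.width, bitmap.height, maxWidth, maxHeight);
  const canvas = document.createElement('canvas');
  canvas.width = size.width;
  canvas.height = size.height;
  canvas.getContext('2d').drawImage(bitmap, 0, 0, size.width, size.height);
  return new Promise((resolve) => canvas.toBlob(resolve, 'image/png'));
}

// Browser only: hand the processed Blob back to the user as a download.
function downloadBlob(blob, filename) {
  const url = URL.createObjectURL(blob);
  const link = document.createElement('a');
  link.href = url;
  link.download = filename;
  link.click();
  setTimeout(() => URL.revokeObjectURL(url), 0);
}

// Wire things up only inside a page that has <input type="file" id="picker">
// (a hypothetical element for this sketch).
if (typeof document !== 'undefined') {
  const picker = document.getElementById('picker');
  if (picker) {
    picker.addEventListener('change', async () => {
      const file = picker.files[0];
      if (file) downloadBlob(await resizeFile(file, 800, 600), 'resized.png');
    });
  }
}
```

For heavy processing, the same logic could probably be moved into a web worker, subject to the restrictions Chrome places on workers in static pages.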
The code itself can be hosted online using Service Worker, or static html + javascript. Both can be opened and executed offline, once visited or deployed. (Note that Chrome severely limits static html, including a harsh restriction on web workers. Google prefers you to keep things online.)
Upload Automation
As mentioned above, a file selected by a file input or dropped into the browser can be read by in-page JavaScript, but I'll keep calling it an "upload" action by convention.
Chrome has some automation extensions, most notably Kantu, but they can't handle file upload because of Chrome's security restriction.
So, if you want to automate file selection, you need a native, out-of-browser automation tool, such as Kantu's XModules, AutoHotkey, or SikuliX. Commercial solutions exist, but with similar restrictions, given your unusual requirement of no headless browser.
AutoHotkey focuses on simulating the keyboard (open the browser, wait 5 seconds, press Tab 10 times, press Enter, wait 2 seconds, type the file name, press Enter, and so on), and the script can be compiled into a deployable exe.
SikuliX is more powerful, but also much harder to distribute; the Java runtime alone is bigger than a browser.
Kantu + XModules is kind of between the two. The users will need to install the browser extension, and its native extension, but once done everything happens in the browser (more or less).
All three methods involve simulation of typing the file name, because as far as I know there is no simpler way to automate it in a user-launched (non-headless) Chrome.
The name of the image file can be passed as a command-line parameter to AutoHotkey and SikuliX, or stored in a file and read by the script in the case of Kantu.
In all three cases, the automation simulates a user, and the real-life user must not touch the computer while the script is running, or the automation will break.
How about command line?
Alternatively, if your aim is automation without deploying a browser, you may consider making it a command-line Node.js program and packaging it as an exe.
The distributable would be heavier than a compiled AutoHotkey script, but there are far fewer moving parts, so it is much more reliable.
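As a rough idea of what such a command-line entry point might look like, here is a minimal sketch. The argument layout (`<input> <output> [--width N] [--height N]`) and the `convertImage` function are assumptions made up for illustration; `convertImage` stands for whatever conversion code the site already has, passed in by the caller.

```javascript
const fs = require('node:fs');

// Parse arguments of the hypothetical form:
//   <input> <output> [--width N] [--height N]
function parseCliArgs(argv) {
  const opts = { input: null, output: null, width: null, height: null };
  const positional = [];
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '--width' || arg === '--height') {
      // The flag's value is the next argument; a missing value becomes NaN.
      const value = Number(argv[++i]);
      if (!Number.isInteger(value) || value <= 0) {
        throw new Error(`${arg} needs a positive integer`);
      }
      opts[arg.slice(2)] = value;
    } else {
      positional.push(arg);
    }
  }
  if (positional.length !== 2) {
    throw new Error('usage: <input> <output> [--width N] [--height N]');
  }
  [opts.input, opts.output] = positional;
  return opts;
}

// Read the input file, convert it, and write the output file.
// convertImage(buffer, opts) is a placeholder for the existing conversion
// code; it is assumed to take and return a Buffer.
function runConversion(argv, convertImage) {
  const opts = parseCliArgs(argv);
  const inputBytes = fs.readFileSync(opts.input);
  fs.writeFileSync(opts.output, convertImage(inputBytes, opts));
  return opts;
}
```

A real entry point would call something like `runConversion(process.argv.slice(2), convertImage)`. No window focus, timing, or simulated keystrokes are involved, which is where the reliability comes from.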
But I like browser automation, it is so simple
Think again.
In my experience, many things will throw browser/GUI automation off:
So, yeah, here are your reasons why computer automation is better done headless.
Will my code be safe?
In case you are worried about the security of your script, don't worry. The moment you decide the processing happens on the client side, the cat is out of the bag.
Technically, your code is protected by copyright. But good luck enforcing it. If you want to keep your code safe from extraction/decryption/deobfuscation/whatever (cough), you need to keep it an online black box, with no client-side processing.
One way to build around your web app would be:
1) redirect console.log to standard out (see here: In Chrome, how can I get the javascript console output to stdout/stderr), probably with the appropriate --log-level flag and with error messages redirected somewhere else, so that random messages don't break the whole thing,
2) at the script level, instead of (or besides) saving the result file, console.log it in Base64,
3) and on the CLI side, use a pipe (or pipes) that turns the Base64 back into a proper file (plus any additional processing).
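Steps 2 and 3 could look roughly like the sketch below. It assumes the page prefixes its Base64 output with a marker string so the result can be picked out of whatever else Chrome logs; the marker `IMGRESULT:` and the function names are invented for this example, and the exact shape of Chrome's log lines is deliberately not relied upon.

```javascript
const fs = require('node:fs');

// Hypothetical marker; any string unlikely to appear in other log output works.
const MARKER = 'IMGRESULT:';

// Page side (step 2): the page would emit the result as a single console line,
// for example:
//   console.log(MARKER + base64String);

// CLI side (step 3): find the marker in the captured log text and decode the
// Base64 payload that follows it. Returns a Buffer, or null if nothing is found.
function extractResult(logText) {
  const match = logText.match(new RegExp(MARKER + '([A-Za-z0-9+/]+={0,2})'));
  if (!match) return null;
  return Buffer.from(match[1], 'base64');
}

// Write the decoded bytes to disk; returns false if the log held no result.
function saveResult(logText, outputPath) {
  const bytes = extractResult(logText);
  if (bytes === null) return false;
  fs.writeFileSync(outputPath, bytes);
  return true;
}
```

Because only the text after the marker is decoded, stray log messages before or after the result line should not corrupt the output file.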