
Headless internet browser? [closed]

I would like to do the following: log into a website, click a couple of specific links, then click a download link. I'd like to run this as either a scheduled task on Windows or a cron job on Linux. I'm not picky about the language I use, but I'd like this to run without putting a browser window up on the screen if possible.

Jared asked May 02 '09 12:05




2 Answers

Here is a list of headless browsers that I know about:

  • HtmlUnit - Java. Custom browser engine. Limited JavaScript support/DOM emulated. Open source.
  • Ghost - Python only. WebKit-based. Full JavaScript support. Open source.
  • Twill - Python/command line. Custom browser engine. No JavaScript. Open source.
  • PhantomJS - Command line/all platforms. WebKit-based. Full JavaScript support. Open source.
  • Awesomium - C++/.NET/all platforms. Chromium-based. Full JavaScript support. Commercial/free.
  • SimpleBrowser - .NET 4/C#. Custom browser engine. No JavaScript support. Open source.
  • ZombieJS - Node.js. Custom browser engine. JavaScript support/emulated DOM. Open source. Based on jsdom.
  • EnvJS - JavaScript via Java/Rhino. Custom browser engine. JavaScript support/emulated DOM. Open source.
  • Watir-webdriver with headless gem - Ruby via WebDriver. Full JavaScript support via browsers (Firefox/Chrome/Safari/IE).
  • Spynner - Python only. PyQt and WebKit.
  • jsdom - Node.js. Custom browser engine. Supports JS via emulated DOM. Open source.
  • TrifleJS - Port of PhantomJS using MSIE (Trident) and V8. Open source.
  • ui4j - Pure Java 8 solution. A wrapper library around the JavaFX WebKit engine, including headless modes.
  • Chromium Embedded Framework - Full up-to-date embedded version of Chromium with off-screen rendering as needed. C/C++, with .NET wrappers (and other languages). As it is Chromium, it has support for everything. BSD licensed.
  • Selenium WebDriver - Full support for JavaScript via browsers (Firefox, IE, Chrome, Safari, Opera). Officially supported bindings are C#, Java, JavaScript, Python, and Ruby. Community-maintained bindings are available for other languages, including Haskell, Perl, PHP, Objective-C, R, Qt, and Go. Open source.

Headless browsers that support JavaScript via an emulated DOM generally have trouble with sites that use more advanced or obscure browser features, or whose functionality depends on visual layout (e.g. CSS positioning). So while the pure JavaScript support in these browsers is generally complete, the browser functionality they actually support should be considered partial only.
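As a rough illustration of the workflow in the question, here is a minimal sketch using the Selenium WebDriver Python bindings with headless Chrome. It is one possible approach, not a definitive implementation: the `/login` path, the form field names, and the link texts are all hypothetical placeholders, and it assumes Selenium 4 with a recent Chrome installed and a download directory that starts out empty. Download behavior in headless mode has varied between Chrome versions, so treat the preference setting as something to verify locally.

```python
import time
from pathlib import Path


def chrome_flags() -> list[str]:
    # "--headless=new" selects Chrome's current headless mode (Chrome 109+).
    return ["--headless=new", "--window-size=1280,1024"]


def wait_for_download(download_dir: Path, timeout: float = 60.0) -> Path:
    """Poll until a finished file shows up in download_dir.

    Assumes the directory was empty beforehand. Chrome writes in-progress
    downloads with a ".crdownload" suffix, so those (and hidden temp files)
    are skipped.
    """
    deadline = time.monotonic() + timeout
    while True:
        finished = [
            p for p in download_dir.iterdir()
            if p.is_file()
            and p.suffix != ".crdownload"
            and not p.name.startswith(".")
        ]
        if finished:
            return finished[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished download in {download_dir}")
        time.sleep(0.5)


def fetch_report(base_url: str, username: str, password: str,
                 download_dir: Path) -> Path:
    # Imported lazily so the helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    for flag in chrome_flags():
        options.add_argument(flag)
    # Ask Chrome to save downloads into our directory.
    options.add_experimental_option(
        "prefs", {"download.default_directory": str(download_dir.resolve())}
    )

    driver = webdriver.Chrome(options=options)
    try:
        driver.implicitly_wait(10)  # wait up to 10 s for elements to appear

        # Hypothetical login form: adjust the path and field names.
        driver.get(f"{base_url}/login")
        driver.find_element(By.NAME, "username").send_keys(username)
        driver.find_element(By.NAME, "password").send_keys(password)
        driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()

        # Hypothetical link texts for "a couple of specific links".
        driver.find_element(By.LINK_TEXT, "Reports").click()
        driver.find_element(By.LINK_TEXT, "Monthly").click()
        driver.find_element(By.LINK_TEXT, "Download CSV").click()

        return wait_for_download(download_dir)
    finally:
        driver.quit()
```

A small script that calls `fetch_report("https://example.com", user, password, Path("downloads"))` can then be run from cron or a Windows scheduled task; no browser window is shown.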

(Note: Original version of this post only mentioned HtmlUnit, hence the comments. If you know of other headless browser implementations and have edit rights, feel free to edit this post and add them.)

Nathan Ridley answered Nov 23 '22 09:11


Check out twill, a very convenient scripting language for precisely what you're looking for. From the examples:

    setlocal username <your username>
    setlocal password <your password>

    go http://www.slashdot.org/
    formvalue 1 unickname $username
    formvalue 1 upasswd $password
    submit

    code 200     # make sure form submission is correct!

There's also a Python API if you're looking for more flexibility.

orip answered Nov 23 '22 08:11