
Using Headless Firefox to Save All HTML Files from the Command Line in Linux

I am currently using shell_exec with Xvfb and Firefox to capture screenshots. However, I also need to download the entire page (e.g. Save Page As --> Web Page, complete) to a directory using shell_exec. I have looked at all the different options available in the Mozilla Developer forums but have not been able to figure out how to do this.

This code appears to be what I might need, but where and how is it implemented so that it is accessible from shell_exec?

var file = Components.classes["@mozilla.org/file/local;1"]
    .createInstance(Components.interfaces.nsILocalFile);
file.initWithPath("C:\\filename.html");
var wbp = Components.classes['@mozilla.org/embedding/browser/nsWebBrowserPersist;1']
    .createInstance(Components.interfaces.nsIWebBrowserPersist);
wbp.saveDocument(content.document, file, null, null, null, null);

The Above Code Source

void saveDocument(
    in nsIDOMDocument aDocument,
    in nsISupports aFile,
    in nsISupports aDataPath,
    in string aOutputContentType,
    in unsigned long aEncodingFlags,
    in unsigned long aWrapColumn
);

The Above Code Source

There is a manual solution on Stack Overflow, but it does not address shell_exec: How to save a webpage locally including pictures, etc.

user2036418 asked Feb 03 '13 03:02


1 Answer

There are a few options that I know of, but none of them fits your question exactly.

  1. Open firefox http://yoursite.com from the shell, then send keystrokes to Firefox using xte or a similar method. (This is not headless mode, though.)
  2. Download using wget. It can work recursively. Alternatively, you can parse the HTML yourself if it is a fairly simple web page. If you need to submit a form, use curl instead of wget.
  3. Use the Greasemonkey add-on and write a script that gets loaded on http://some-fake-page.com/?download=http://yoursite.com, then open Firefox with that fake-page URL.
  4. Develop your own Firefox add-on to do the above work.
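
As a rough illustration of option 2, here is a minimal sketch of a shell function that approximates "Web Page, complete" with wget. The function name save_page_complete and the DRY_RUN variable are made up for this example; only the wget flags themselves are real. Note that wget does not execute JavaScript, so the result is the page as served, not the DOM that Firefox would render.

```shell
#!/bin/sh
# Sketch: approximate Firefox's "Save Page As -> Web Page, complete" with wget.
# save_page_complete is a hypothetical helper name, not part of wget or Firefox.
save_page_complete() {
    url=$1      # page to save, e.g. http://yoursite.com
    out_dir=$2  # directory that receives the HTML and its assets

    # --page-requisites  : also fetch images, CSS and scripts the page needs
    # --convert-links    : rewrite links so the saved copy works offline
    # --adjust-extension : add .html/.css suffixes where the URL lacks them
    # --span-hosts       : allow assets served from other hosts (e.g. a CDN)
    set -- wget --page-requisites --convert-links --adjust-extension \
        --span-hosts --directory-prefix="$out_dir" "$url"

    # DRY_RUN is an assumption of this sketch: when set, print the command
    # instead of running it.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

From PHP, the same wget command line could be passed to shell_exec, with the URL and directory wrapped in escapeshellarg() so that user-supplied values cannot inject extra shell syntax.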

There may be other better options for this as well, but I don't know them.

anishsane answered Nov 05 '22 21:11