I want to save a web page, after the document has loaded, to a specific file name, with all URLs and links converted to absolute URLs, the way wget -k does.
// phantomjs
var page = require('webpage').create();
var url = 'http://google.com/';
page.open(url, function (status) {
    var js = page.evaluate(function () {
        return document;
    });
    console.log(js.all[0].outerHTML);
    phantom.exit();
});
For example, if my HTML content is something like this:
<a href="//page.html">page</a>
it must become
<a href="http://google.com/page.html">page</a>
That is my sample script. How can I convert all URLs and links to absolute ones, as wget -k does, using PhantomJS?
You can modify your final HTML so that it has a <base> tag; this makes all relative URLs work. In your case, try putting <base href="http://google.com/"> right after the opening <head> tag on the page.
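A minimal sketch of the <base> approach, assuming the serialized HTML contains an opening <head> tag and that the base URL contains no double quotes. The helper name insertBaseTag and the output file name saved.html are made up for illustration; the PhantomJS part only runs when the phantom global exists.

```javascript
// Hypothetical helper: put a <base> tag right after the opening <head> tag.
function insertBaseTag(html, baseUrl) {
  var baseTag = '<base href="' + baseUrl + '">';
  // First opening <head> tag, with or without attributes (does not match <header>).
  var headOpen = /<head(\s[^>]*)?>/i;
  if (!headOpen.test(html)) {
    return html; // no <head> found: leave the markup untouched
  }
  // A function replacement avoids special handling of "$" in baseUrl.
  return html.replace(headOpen, function (match) {
    return match + baseTag;
  });
}

// PhantomJS wiring, skipped when run outside PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var fs = require('fs');
  var url = 'http://google.com/';
  page.open(url, function (status) {
    if (status === 'success') {
      // page.content is the HTML of the page after scripts have run.
      fs.write('saved.html', insertBaseTag(page.content, url), 'w'); // file name is an assumption
    }
    phantom.exit();
  });
}
```

Note that this does not rewrite the individual attributes the way wget -k does; it only tells the browser how to resolve them when the saved file is opened.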
This is not really supported, because PhantomJS is more than just an HTTP client. Imagine, for instance, JavaScript code that pulls in random content with an image on the main landing page.
The workaround, which might or might not work for you, is to replace every referenced resource URL in the DOM. This is possible using CSS3 selectors (href for a, src for img, etc.) and manually resolving each path relative to the base URL. If you really need to track and list every single resource URL, use the network traffic monitoring feature.
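One way to sketch the DOM rewrite: instead of resolving paths by hand, this variant reads the element's href or src property, which the browser already reports as an absolute URL, and writes it back into the attribute. That is a swapped-in shortcut, not the manual resolution described above. The function name absolutizeLinks is made up, and only href and src attributes are covered (no srcset, no CSS url(...) references).

```javascript
// Meant to run inside the page via page.evaluate(), so it relies only on the
// page's global `document` and sticks to ES5 syntax.
function absolutizeLinks() {
  var attrs = ['href', 'src'];
  var changed = 0;
  for (var i = 0; i < attrs.length; i++) {
    var name = attrs[i];
    var nodes = document.querySelectorAll('[' + name + ']');
    for (var j = 0; j < nodes.length; j++) {
      // The reflected property holds the resolved URL; the attribute holds the raw text.
      var resolved = nodes[j][name];
      if (typeof resolved === 'string' && resolved !== '' &&
          nodes[j].getAttribute(name) !== resolved) {
        nodes[j].setAttribute(name, resolved);
        changed++;
      }
    }
  }
  return changed; // number of attributes rewritten
}

// PhantomJS wiring, skipped when run outside PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.open('http://google.com/', function (status) {
    if (status === 'success') {
      page.evaluate(absolutizeLinks);
      console.log(page.content); // serialized DOM, now with rewritten attributes
    }
    phantom.exit();
  });
}
```

Elements whose href property is not a plain string (for example SVG links) are left untouched, and fragment-only links such as #top also end up as full URLs.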
Last but not least, to get the generated content you can use page.content instead of that complicated dance with evaluate and outerHTML.