
PhantomJS and getting modified DOM

I'm developing a tool that needs to download a web page from a third-party server, execute it as a browser would, and then parse the HTML. What I'm struggling with is that the tool needs to parse the HTML after all JavaScript has executed and modified the DOM. I'm trying to use PhantomJS for this, and it works on small snippets (a tiny HTML document with an external script that adds some nodes to the DOM). But when I do the same with a real site (http://www.dba.dk/), I don't get the final HTML with all the modifications made by the site's JavaScript.

I really need help on this as I have been stuck with it for more than a week.

My PhantomJS code is simple:

// PhantomJS 1.0 API: the script is re-run after each page load, and only
// phantom.state survives between runs.
if (phantom.state.length === 0) {
    // First run: validate arguments and start loading the page.
    if (phantom.args.length === 0) {
        console.log('Usage: test.js <some URL>');
        phantom.exit();
    } else {
        var address = phantom.args[0];
        phantom.state = Date.now().toString();
        phantom.viewportSize = { width: 1280, height: 800 };
        phantom.open(address);
    }
} else {
    // Second run, after loading finished: dump the DOM as it is right now.
    // (The former `first_time` flag was always undefined here because of
    // var hoisting, so it never skipped anything and has been dropped.)
    var elapsed = Date.now() - new Date().setTime(phantom.state);
    if (phantom.loadStatus === 'success') {
        if (!document.addEventListener) {
            console.log('Not SUPPORTED!');
        }
        phantom.render('result.png');
        var markup = document.documentElement.innerHTML;
        console.log(markup);
        phantom.exit();
    } else {
        console.log('FAIL to load the address');
        phantom.exit();
    }
}

The HTML dumped to the console doesn't contain the dynamically generated content.
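A frequent cause of this symptom is that the DOM is read as soon as loading finishes, while scripts started by the page may still be adding content. The sketch below polls until a readiness condition holds (or a timeout expires) before dumping the markup. It assumes a PhantomJS release with the `webpage` and `system` modules (1.5 or later) rather than the 1.0 `phantom.open` API used above; the `#content` selector, the 10-second timeout, and the 250 ms interval are hypothetical values to adapt to the target page. The `waitFor` helper is plain ES5, and the PhantomJS part only runs when the `phantom` global exists.

```javascript
// Poll check() every intervalMs; call onReady() once it returns true,
// or onTimeout() if timeoutMs elapses first.
function waitFor(check, onReady, onTimeout, timeoutMs, intervalMs) {
  var start = Date.now();
  var timer = setInterval(function () {
    if (check()) {
      clearInterval(timer);
      onReady();
    } else if (Date.now() - start >= timeoutMs) {
      clearInterval(timer);
      onTimeout();
    }
  }, intervalMs);
}

// Only executed under PhantomJS (assumes the webpage/system modules exist).
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var address = require('system').args[1];
  page.viewportSize = { width: 1280, height: 800 };

  // Serialize the DOM as it currently is, inside the page context.
  var dumpDom = function () {
    return page.evaluate(function () {
      return document.documentElement.outerHTML;
    });
  };

  page.open(address, function (status) {
    if (status !== 'success') {
      console.log('FAIL to load the address');
      phantom.exit(1);
      return;
    }
    waitFor(
      function () {
        // Hypothetical readiness test: '#content' stands in for an element
        // that only exists after the page's own scripts have run.
        return page.evaluate(function () {
          return document.querySelector('#content') !== null;
        });
      },
      function () {
        page.render('result.png');
        console.log(dumpDom());
        phantom.exit();
      },
      function () {
        console.log('Timed out; dumping the DOM as it is now');
        console.log(dumpDom());
        phantom.exit(1);
      },
      10000,
      250
    );
  });
}
```

Polling a concrete condition is usually more robust than a fixed delay, because pages differ in how long their scripts take; the timeout branch still dumps whatever is there so a slow page is diagnosable.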

asked Mar 30 '11 at 18:03 by intellion


1 Answer

The problem was the Flash plugin: the pages were detecting its absence. Once the plugin was loaded correctly, the problem went away.
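To check whether a page can see Flash at all, one option is to read the plugin names the page gets from `navigator.plugins` and look for Flash among them. This is a minimal sketch, assuming a PhantomJS release with the `webpage` and `system` modules, and assuming Flash registers under the name "Shockwave Flash" as desktop Flash builds conventionally did. `hasFlashPlugin` is a hypothetical helper; names are copied out as plain strings because `page.evaluate` can only return serializable values.

```javascript
// True if any plugin name in the list looks like the Flash plugin.
function hasFlashPlugin(pluginNames) {
  return pluginNames.some(function (name) {
    return /shockwave flash/i.test(name);
  });
}

// Only executed under PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.open(require('system').args[1], function (status) {
    if (status !== 'success') {
      console.log('FAIL to load the address');
      phantom.exit(1);
      return;
    }
    // Runs in the page context; returns plain strings.
    var names = page.evaluate(function () {
      var out = [];
      for (var i = 0; i < navigator.plugins.length; i++) {
        out.push(navigator.plugins[i].name);
      }
      return out;
    });
    console.log(hasFlashPlugin(names) ? 'Flash is visible to the page'
                                      : 'Flash is NOT visible to the page');
    phantom.exit();
  });
}
```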

answered Oct 23 '22 at 15:10 by intellion