Get the content (text) of a URL after JavaScript has run with PHP

Is it possible to get the content of a URL with PHP (using some sort of function like file_get_contents or header) but only after the execution of some JavaScript code?

Example:

mysite.com has a script that calls loadUrlAfterJavascriptExec('http://exampletogetcontent.com/') and prints/echoes the content. Imagine that some jQuery runs on http://exampletogetcontent.com/ and changes the DOM; loadUrlAfterJavascriptExec would return the resulting HTML.

Can we do that?

Just to be clear, what I want is to get the content of a page through a URL, but only after JavaScript has run on the target page (the page whose content PHP is fetching).

I am aware that PHP runs before the page is sent to the client and JS only after that, but I thought there might be an expert workaround.

Victor Ferreira asked Feb 13 '15 17:02



3 Answers

Update 2: Adds more details on how to use PhantomJS from PHP.

Update 1: (after clarification that the JavaScript on the target page needs to run first)

Method 1: Use PhantomJS (will execute JavaScript)

1. Download phantomjs and place the executable in a path that your PHP binary can reach.

2. Place the following 2 files in the same directory:

get-website.php

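<?php

// Path to the PhantomJS script that sits next to this file.
$phantom_script = __DIR__ . '/get-website.js';

// shell_exec() returns the full output; exec() would return only its last line.
$response = shell_exec('phantomjs ' . escapeshellarg($phantom_script));

// Escape the markup so the browser displays it as text.
echo htmlspecialchars((string) $response);
?>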

get-website.js

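var webPage = require('webpage');
var page = webPage.create();

page.open('http://google.com/', function (status) {
  // page.content is the DOM serialized after the page's scripts have run.
  if (status === 'success') {
    console.log(page.content);
  }
  phantom.exit();
});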

3. Browse to get-website.php; it returns the contents of the target site, http://google.com, after the page's inline JavaScript has executed. You can also run it from the command line with php /path/to/get-website.php.
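
The script in step 2 hard-codes the target URL and prints the page as soon as it has loaded. A minimal sketch of a variant that takes the URL from the command line and waits briefly for late-running scripts might look like this. The file name render.js, the helper name renderPage, and the 500 ms default delay are assumptions for illustration; page.open, page.content, phantom.exit and require('system').args are PhantomJS APIs.

```javascript
// render.js: run as `phantomjs render.js <url> [delayMs]`.
// renderPage and delayMs are illustrative names, not part of PhantomJS.

// Opens `url` in `page`, waits `delayMs` so late scripts can finish, then
// passes the serialized DOM (or null if loading failed) to `done`.
function renderPage(page, url, delayMs, done) {
  page.open(url, function (status) {
    if (status !== 'success') {
      done(null);
      return;
    }
    setTimeout(function () {
      done(page.content);
    }, delayMs);
  });
}

// Only wire up the real page object when running inside PhantomJS.
if (typeof phantom !== 'undefined') {
  var args = require('system').args; // args[0] is the script name
  var page = require('webpage').create();
  renderPage(page, args[1], parseInt(args[2], 10) || 500, function (html) {
    if (html !== null) {
      console.log(html);
    }
    phantom.exit(html === null ? 1 : 0);
  });
}
```

From PHP, the target URL would then be appended to the command as an extra argument, passed through escapeshellarg() so it cannot break out of the shell command. A fixed delay is only a heuristic: pages that load data slowly may need a longer wait.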

Method 2: Use Ajax with PHP (no PhantomJS, so it won't run JavaScript)

/get-website.php

<?php

$html = file_get_contents('http://google.com');
echo $html;
?>

test.html

<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>on demo</title>
<style>
p {
color: red;
}
span {
color: blue;
}
</style>
<script src="https://code.jquery.com/jquery-1.10.2.js"></script>
</head>
<body>
<button id='click_me'>Click me</button>
<span style="display:none;"></span>
<script>

$("#click_me").click(function () {
    $.get("/get-website.php", function (data) {
        var json = {
            html: JSON.stringify(data),
            delay: 1
        };
        alert(json.html);
    });
});
</script>
</body>
</html>
AndrewD answered Oct 16 '22 09:10


I found a fantastic page on this: an entire tutorial on how to use PHP to process the DOM of a page that is created entirely with JavaScript.

https://www.jacobward.co.uk/using-php-to-scrape-javascript-jquery-json-websites/

Note that "PhantomJS development is suspended until further notice", so that option isn't a good one.

Adamantus answered Oct 16 '22 09:10


I think the easiest and best way is to use this package: https://github.com/spatie/browsershot. Just install it completely and use the code below:

use Spatie\Browsershot\Browsershot;
$html = Browsershot::url('https://example.com')->bodyHtml();
Mahdi mehrabi answered Oct 16 '22 09:10