My aim is to get the element <div id="calender"> and everything inside it, as it is shown in a browser. The problem is that simply fetching the HTML source doesn't work: the element I am looking for does not exist in the HTML returned by the PHP function file_get_contents.
I have tried to extract it in PHP with XPath, using DOMXPath (https://www.php.net/manual/en/class.domxpath.php), which is a handy way to get the contents of any tag in an HTML page. But the problem may be that the element (a calendar) is added to the page by JavaScript after it loads, so server-side PHP never sees it. Is there a way to capture such an element (a div) with JavaScript instead?
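One way to test that hypothesis is to check whether the id appears at all in the HTML the server actually sends. A minimal JavaScript sketch, assuming you already have that raw HTML as a string (for instance the output of file_get_contents); `hasElementId` is a hypothetical helper, and its regex is a quick heuristic, not an HTML parser.

```javascript
// Returns true if the raw (server-sent) HTML contains an element whose id
// attribute is exactly `id`. Plain-text heuristic, not a full HTML parser.
function hasElementId(html, id) {
  // Escape regex metacharacters so the id is matched literally.
  const escaped = id.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // The attribute must be preceded by whitespace (so "data-id" does not count),
  // may be quoted with " or ' or unquoted, and must end the attribute value.
  const pattern = new RegExp(
    `(?:^|\\s)[iI][dD]\\s*=\\s*(["']?)${escaped}\\1(?=[\\s>/]|$)`
  );
  return pattern.test(html);
}
```

If this returns false for "calender" on the downloaded source while Chrome's developer tools (F12) do show the div, the element is being created by a script after the page loads.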
There are JavaScript examples for this kind of problem (if I have understood them correctly), but currently I cannot get even a simple script to work. The example below shows how I have tried to build the code. The $.ajax call is just one approach I have tried, but I don't know how to use it. I also cannot figure out why the simple test functions do not work.
<!doctype html>
<html lang="fi">
<head>
<meta charset="utf-8">
<title>load demo</title>
<style>
body {
font-size: 12px;
font-family: Arial;
}
</style>
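<script src="https://code.jquery.com/jquery-3.7.1.min.js"></script>
<script type="text/javascript">
// Simple test function: shows an alert box.
function ok() {
  alert("OK");
}
// Downloads the page at my_html with jQuery's $.ajax and shows its HTML.
function get_html(my_html) {
  alert("OK");
  // A <p> element has no .value property; read its text instead.
  var l = document.getElementById('my_link').textContent;
  alert(l);
  alert(my_html);
  var url = my_html;
  $.ajax({
    url: url,
    dataType: 'html',
    success: function (data) {
      // data is the downloaded page. Browser JavaScript has no fs.open,
      // so show the raw HTML on this page instead of saving a file.
      var out = document.createElement('pre');
      out.textContent = data;
      document.body.appendChild(out);
      alert("data loaded");
    },
    error: function (xhr, status) {
      // A cross-origin request ends up here unless the other site allows it (CORS).
      alert("Request failed: " + status);
    }
  });
}
</script>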
</head>
<body>
<p id ='my_link' onclick='get_html("http://www.lomarengas.fi/en/cottages/kuusamo-rukasaukko-9192")'>html-link</p>
<p id ='ok' onclick='ok()'>show ok</p>
</body>
</html>
Briefly: I have a link to a web page that shows a (booking) calendar, but the calendar is missing from the "normal" source code returned by file_get_contents (PHP). If I inspect the page with Chrome's developer tools (F12), I can find the calendar there. I want to get that information with JavaScript, PHP, or something similar.
If you read the source code of the page you point to (http://www.yllaksenonkalot.fi/booking/varaukset_akas.php), you will notice that the calendar is loaded via an iframe.
That iframe points to this location:
http://www.nettimokki.com/bookingCalendar.php?id_cottage=3629&utm_source=widget&utm_medium=widget&utm_campaign=widget
which is the actual source of the calendar.
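Following the iframe by hand works, but the iframe URL can also be pulled out of the page source programmatically. A minimal JavaScript sketch, assuming the page's HTML is already in a string (e.g. fetched server-side); `iframeSources` is a hypothetical helper and, being regex-based, only a rough tool.

```javascript
// Collects the src URL of every <iframe> tag in a raw HTML string.
// Regex-based sketch: fine for a quick look, not a substitute for a real parser.
function iframeSources(html) {
  const sources = [];
  const tagPattern = /<iframe\b[^>]*>/gi;
  for (const tag of html.match(tagPattern) || []) {
    // Capture the value of src="..." or src='...'.
    const src = tag.match(/\ssrc\s*=\s*(["'])(.*?)\1/i);
    if (src) {
      // Attribute values often encode "&" as "&amp;"; undo that for the URL.
      sources.push(src[2].replace(/&amp;/g, "&"));
    }
  }
  return sources;
}
```

The returned URL is an ordinary page with its own address, so it can be requested directly (for example with file_get_contents), and that response may already contain the calendar markup.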
EDIT (following your comment on this answer):
Considering the real link, http://www.lomarengas.fi/en/cottages/kuusamo-rukasaukko-9192:
If the calendar is not part of the server-generated HTML, it is almost certainly generated asynchronously (in JavaScript, on the client side).
Working from this assumption, I inspected the page again. In my browser's developer tools, in the Network tab (where you can monitor which files are loaded), I looked for calls to the server, ignoring requests for static resources such as images and stylesheets.
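Chrome's Network tab can do this filtering interactively (its Fetch/XHR type filter), but the same idea is easy to express in code. A minimal JavaScript sketch, assuming you have a list of request URLs, for example copied out of the Network tab; `candidateDataCalls` and the extension list are hypothetical choices, not anything the site defines.

```javascript
// Extensions treated as static resources. This list is an assumption and
// deliberately leaves out .js, since script files are inspected in a later step.
const STATIC_EXTENSIONS = new Set([
  "png", "jpg", "jpeg", "gif", "svg", "webp", "ico",
  "css", "woff", "woff2", "ttf", "eot",
]);

// Keeps the request URLs that are not static resources, i.e. the candidate
// data calls (JSON endpoints, HTML fragments and the like).
function candidateDataCalls(urls) {
  return urls.filter((u) => {
    // Look only at the last path segment, ignoring the query string.
    const lastSegment = new URL(u).pathname.split("/").pop();
    const dot = lastSegment.lastIndexOf(".");
    const ext = dot === -1 ? "" : lastSegment.slice(dot + 1).toLowerCase();
    return !STATIC_EXTENSIONS.has(ext);
  });
}
```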
I then noticed calls to several URLs with a .json extension, such as http://www.lomarengas.fi/api-ib/search/availability_data.json?serviceNumber=9192&currentMonthFirstDate=&duration=7.
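The structure of that request can be reproduced with the standard URL API. A JavaScript sketch under clear assumptions: the API is undocumented, the parameter meanings and date format are guessed from their names, and the idea that the trailing number of the cottage URL (9192) is the `serviceNumber` comes only from the two URLs matching. Both helper names are hypothetical.

```javascript
// Hypothetical helper: assumes the number at the end of the cottage URL
// (".../kuusamo-rukasaukko-9192") is the serviceNumber the API expects.
function serviceNumberFromCottageUrl(cottageUrl) {
  const match = new URL(cottageUrl).pathname.match(/-(\d+)\/?$/);
  return match ? match[1] : null;
}

// Rebuilds the availability_data.json request seen in the Network tab.
// Parameter meanings (and the expected date format) are guesses.
function availabilityUrl(serviceNumber, currentMonthFirstDate = "", duration = 7) {
  const url = new URL("http://www.lomarengas.fi/api-ib/search/availability_data.json");
  url.searchParams.set("serviceNumber", String(serviceNumber));
  url.searchParams.set("currentMonthFirstDate", currentMonthFirstDate);
  url.searchParams.set("duration", String(duration));
  return url.toString();
}
```

With the defaults, `availabilityUrl(serviceNumberFromCottageUrl("http://www.lomarengas.fi/en/cottages/kuusamo-rukasaukko-9192"))` yields the same URL as the one observed above.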
Feeling I was on the right track (asynchronous JavaScript calls that build HTML from JSON data), I looked for JavaScript code or files other than the usual libraries (jQuery, Bootstrap and such).
I came across this file: http://www.lomarengas.fi/resources_responsive/js/destination.js. It contains the code that generates the calendar asynchronously.
tl;dr
The calendar is indeed generated asynchronously.
You can't get the full HTML with curl or file_get_contents in PHP, and
you can't read it from your own page with Ajax, because of the same-origin policy (unless the site sends CORS headers that allow it).
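This is why the `$.ajax` call in the question fails: the test page and lomarengas.fi have different origins. A small JavaScript sketch of the origin comparison browsers apply (scheme, host and port must all match); `sameOrigin` is a hypothetical helper built on the standard `URL.origin` property.

```javascript
// Same-origin check as browsers define it: scheme, host and port must all match.
// Opaque origins (serialized as "null", e.g. for data: URLs) never match.
function sameOrigin(a, b) {
  const originA = new URL(a).origin;
  const originB = new URL(b).origin;
  return originA !== "null" && originA === originB;
}
```

When the origins differ, the browser will not hand the response to the script unless the server explicitly allows it with an `Access-Control-Allow-Origin` header; server-side code is not subject to this rule.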
By the way, you should contact the site and ask whether you may access their API from PHP.
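With the site's consent, the JSON endpoint can be read server-side, where the same-origin policy does not apply. In PHP the direct equivalent would be file_get_contents on the JSON URL followed by json_decode; to stay in one language here, this is a Node.js 18+ sketch using the built-in fetch. `fetchAvailability` is a hypothetical name, the `fetchImpl` parameter exists only so the function can be tested without network access, and since the JSON structure is undocumented it is returned untouched.

```javascript
// Server-side sketch (Node.js 18+, where fetch is built in).
// Only use this with the site's permission.
async function fetchAvailability(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url, { headers: { Accept: "application/json" } });
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  // The structure of the returned JSON is unknown; hand it back as parsed.
  return response.json();
}
```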
Hope this helps you understand the whole thing.