I want to scrape some information from a webpage that uses a table-based layout.
I want to extract the third table inside the nested layout; it contains a series of nested tables, each publishing a result. But the code is not working:
include('simple_html_dom.php');
$url = 'http://exams.keralauniversity.ac.in/Login/index.php?reslt=1';
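$html = file_get_html($url); // file_get_contents() would return a plain string, which has no find() method
$result = $html->find("table", 2); // the index is zero-based, so 2 is the third table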
echo $result;
I used cURL to fetch the page, but the problem is that its tags are out of order, so it cannot be parsed with Simple HTML DOM.
function curl($url) {
$ch = curl_init(); // Initialising cURL
curl_setopt($ch, CURLOPT_URL,$url); // Setting cURL's URL option with the $url variable passed into the function
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
$data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
curl_close($ch); // Closing cURL
return $data; // Returning the data from the function
}
function scrape_between($data, $start, $end){
$data = stristr($data, $start); // Stripping all data from before $start
$data = substr($data, strlen($start)); // Stripping $start
$stop = stripos($data, $end); // Getting the position of the $end of the data to scrape
$data = substr($data, 0, $stop); // Stripping all data from after and including the $end of the data to scrape
return $data; // Returning the scraped data from the function
}
$scraped_page = curl($url); // Fetching the page at $url and storing the returned HTML in $scraped_page
$scraped_data = scrape_between($scraped_page, ' </html>', '</table></td><td></td></tr>
</table>');
echo $scraped_data;
$myfile = fopen("newfile.html", "w") or die("Unable to open file!");
fwrite($myfile, $scraped_data);
fclose($myfile);
How can I scrape the results and save the PDFs?
Web scraping lets you collect data from web pages across the internet. It's also called web crawling or web data extraction. PHP is a widely used back-end scripting language for creating dynamic websites and web applications. And you can implement a web scraper using plain PHP code.
In browser JavaScript, the DOMParser interface parses XML or HTML source code from a string into a DOM Document. You can perform the opposite operation, converting a DOM tree into XML or HTML source, using the XMLSerializer interface.
Web scraping can be done by targeting selected DOM elements and then processing or storing the text inside them. To do this in PHP, you can use a library that parses the whole page and looks up the required elements in the DOM, such as Simple HTML DOM Parser.
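As a rough illustration of targeting a DOM element, here is a minimal sketch that uses PHP's built-in DOMDocument and DOMXPath classes instead of Simple HTML DOM Parser. The sample markup and the function name cells_of_nth_table are assumptions made up for this example; the real page's structure will differ.

```php
<?php
// Hypothetical sample markup standing in for a page built from nested tables.
$html = <<<HTML
<html><body>
<table id="outer"><tr><td>
  <table id="menu"><tr><td>Menu</td></tr></table>
  <table id="results"><tr><td>Result A</td></tr><tr><td>Result B</td></tr></table>
</td></tr></table>
</body></html>
HTML;

/**
 * Return the trimmed text of every cell in the n-th <table>
 * (1-based, counted in document order, nested tables included).
 */
function cells_of_nth_table(string $html, int $n): array
{
    $doc = new DOMDocument();
    // Real pages are often malformed; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // (//table)[n] picks the n-th table in document order; //td then takes its cells.
    $cells = $xpath->query("(//table)[$n]//td");

    $texts = [];
    foreach ($cells as $cell) {
        $texts[] = trim($cell->textContent);
    }
    return $texts;
}

print_r(cells_of_nth_table($html, 3));
```

Note that XPath positions are 1-based, whereas the second argument of Simple HTML DOM's find() is zero-based, so the "third table" is 3 here and 2 there.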
Simple HTML DOM can't handle that HTML. So first switch to the Advanced HTML DOM library (advanced_html_dom.php), then do:
require_once('advanced_html_dom.php');
$dom = file_get_html('http://exams.keralauniversity.ac.in/Login/index.php?reslt=1');
$rows = array();
foreach($dom->find('tr.Function_Text_Normal:has(td[3])') as $tr){
$row['num'] = $tr->find('td[2]', 0)->text;
$row['text'] = $tr->find('td[3]', 0)->text;
$row['pdf'] = $tr->find('td[3] a', 0)->href;
if(preg_match_all('/\d+/', $tr->parent->find('u', 0)->text, $m)){
list($row['day'], $row['month'], $row['year']) = $m[0];
}
// uncomment next 2 lines to save the pdf
// $filename = preg_replace('/.*\//', '', $row['pdf']);
// file_put_contents($filename, file_get_contents($row['pdf']));
$rows[] = $row;
}
var_dump($rows);
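If you do enable the PDF download, it may be worth deriving the local file name a little more defensively than with a single regex. The following is only a sketch: pdf_filename_from_url is a hypothetical helper (not part of Advanced HTML DOM or any other library), and the example URL is made up.

```php
<?php
/**
 * Derive a safe local file name from a PDF link.
 * Hypothetical helper for illustration; not part of any library.
 */
function pdf_filename_from_url(string $url, string $fallback = 'download.pdf'): string
{
    // Look only at the path so query strings and fragments are ignored.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return $fallback;
    }

    // Decode %xx escapes, then keep only the last path segment.
    $name = basename(rawurldecode($path));

    // Replace anything outside a conservative whitelist with an underscore.
    $name = (string) preg_replace('/[^A-Za-z0-9._-]/', '_', $name);

    // Drop leading dots so names like ".." or hidden files are not produced.
    $name = ltrim($name, '.');

    return $name === '' ? $fallback : $name;
}

// Made-up URL, used only to show the behaviour.
echo pdf_filename_from_url('http://example.com/results/2015/btech%20s4.pdf?x=1'), "\n";
```

The returned name could then be passed to file_put_contents() in place of $filename in the commented-out lines above.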
Here is some sample code:
<?php
// Defining the basic cURL function
function curl($url) {
$ch = curl_init(); // Initialising cURL
curl_setopt($ch, CURLOPT_URL, $url); // Setting cURL's URL option with the $url variable passed into the function
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
$data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
curl_close($ch); // Closing cURL
return $data; // Returning the data from the function
}
?>
<?php
$scraped_website = curl("http://www.example.com"); // Executing our curl function to scrape the webpage http://www.example.com and return the results into the $scraped_website variable
$result = substr($scraped_website, 11, 7); // Takes 7 characters starting at offset 11; change both values to cover the table
echo $result;
?>