PHP web scraping with Simple HTML DOM not working when the HTML tags are out of order

Tags: php, web-scraping, simple-html-dom

I want to scrape some information from a web page. It uses a table-based layout.

I want to extract the third table inside the nested table layout. It contains a series of nested tables, each publishing a result. But the code is not working:

    include('simple_html_dom.php');
    $url = 'http://exams.keralauniversity.ac.in/Login/index.php?reslt=1';
    $html = file_get_contents($url);
    $result = $html->find("table", 2);
    echo $result;

I used cURL to fetch the page, but the problem is that its tags are out of order, so it cannot be parsed with Simple HTML DOM.

    function curl($url) {
        $ch = curl_init();                              // Initialise cURL
        curl_setopt($ch, CURLOPT_URL, $url);            // Set the URL to fetch
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the page instead of printing it
        $data = curl_exec($ch);                         // Execute the request
        curl_close($ch);                                // Close the cURL handle
        return $data;                                   // Page body, or false on failure
    }

    function scrape_between($data, $start, $end) {
        $data = stristr($data, $start);         // Strip everything before $start
        $data = substr($data, strlen($start));  // Strip $start itself
        $stop = stripos($data, $end);           // Position of $end in the remaining data
        $data = substr($data, 0, $stop);        // Strip $end and everything after it
        return $data;                           // Return the scraped data
    }

    $scraped_page = curl($url);  // Fetch the page at $url into $scraped_page

    // The end marker spans two lines in the page source (newline plus three spaces)
    $scraped_data = scrape_between($scraped_page, ' </html>', "</table></td><td></td></tr>\n   </table>");
    echo $scraped_data;

    $myfile = fopen("newfile.html", "w") or die("Unable to open file!");
    fwrite($myfile, $scraped_data);
    fclose($myfile);

How can I scrape the results and save the PDFs?

asked Nov 02 '15 09:11

codefreaK

2 Answers

Simple HTML DOM can't handle that HTML, so first switch to the Advanced HTML DOM library (advanced_html_dom.php), then do:

    require_once('advanced_html_dom.php');

    $dom = file_get_html('http://exams.keralauniversity.ac.in/Login/index.php?reslt=1');

    $rows = array();
    foreach($dom->find('tr.Function_Text_Normal:has(td[3])') as $tr){
      $row['num'] = $tr->find('td[2]', 0)->text;
      $row['text'] = $tr->find('td[3]', 0)->text;
      $row['pdf'] = $tr->find('td[3] a', 0)->href;
      if(preg_match_all('/\d+/', $tr->parent->find('u', 0)->text, $m)){
        list($row['day'], $row['month'], $row['year']) = $m[0];
      }

      // uncomment next 2 lines to save the pdf
      // $filename = preg_replace('/.*\//', '', $row['pdf']);
      // file_put_contents($filename, file_get_contents($row['pdf']));
      $rows[] = $row;
    }
    var_dump($rows);
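
The two commented-out lines assume that each href is an absolute URL and that its last path segment is a usable file name. If the page uses relative links, something like the following may help. It is only a sketch: resolve_href, pdf_filename and save_pdf are hypothetical helper names, not part of Advanced HTML DOM, and the link in the usage lines is invented.

```php
<?php
// Hypothetical helpers for saving linked PDFs; not part of any library.

// Resolve an href against the URL of the page it was found on.
// Handles absolute, protocol-relative, root-relative and plain relative
// links; it does not normalise "../" segments.
function resolve_href(string $pageUrl, string $href): string
{
    // Already absolute (has a scheme such as http or https).
    if (is_string(parse_url($href, PHP_URL_SCHEME))) {
        return $href;
    }
    $parts  = parse_url($pageUrl);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }
    // Protocol-relative ("//host/path"): reuse the page's scheme.
    if (strpos($href, '//') === 0) {
        return $parts['scheme'] . ':' . $href;
    }
    // Root-relative ("/path"): attach directly to the origin.
    if (strpos($href, '/') === 0) {
        return $origin . $href;
    }
    // Plain relative: attach to the directory part of the page's path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

// Derive a local file name from a URL, replacing unusual characters.
function pdf_filename(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);
    $name = is_string($path) ? basename($path) : '';
    $name = preg_replace('/[^A-Za-z0-9._-]/', '_', $name);
    return $name !== '' ? $name : 'download.pdf';
}

// Download one linked PDF into $dir (needs allow_url_fopen and network access).
function save_pdf(string $pageUrl, string $href, string $dir = '.'): bool
{
    $url  = resolve_href($pageUrl, $href);
    $data = file_get_contents($url);
    if ($data === false) {
        return false;
    }
    return file_put_contents($dir . '/' . pdf_filename($url), $data) !== false;
}

// Usage with an invented relative link; prints the resolved URL and file name.
$page = 'http://exams.keralauniversity.ac.in/Login/index.php?reslt=1';
$url  = resolve_href($page, 'files/result1.pdf');
echo $url, "\n", pdf_filename($url), "\n";
```

Inside the loop above, save_pdf($pageUrl, $row['pdf']) could then stand in for the two commented-out lines, with $pageUrl being the address passed to file_get_html.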

answered Oct 31 '22 10:10

pguardiario

Here is some sample code:


    <?php
        // Defining the basic cURL function
        function curl($url) {
            $ch = curl_init();                              // Initialise cURL
            curl_setopt($ch, CURLOPT_URL, $url);            // Set the URL to fetch
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // Return the page instead of printing it
            $data = curl_exec($ch);                         // Execute the request
            curl_close($ch);                                // Close the cURL handle
            return $data;                                   // Page body, or false on failure
        }
    ?>

    <?php
        $scraped_website = curl("http://www.example.com");  // Fetch http://www.example.com into $scraped_website
        $result = substr($scraped_website, 11, 7);  // Change the offset (11) and length (7) to match the table
        echo $result;
    ?>
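
Fixed offsets such as 11 and 7 depend on the exact bytes of the page, so they tend to break when the page changes. Selecting the table by position in the document may be more robust. The following is a minimal sketch of that idea, assuming PHP's bundled DOM extension (libxml) is available; it is not the approach of either answer, and the sample markup is invented to imitate unclosed tags rather than copied from the real page.

```php
<?php
// Invented sample with unclosed <td> and <tr> tags, standing in for the real page.
$html = '<html><body>'
      . '<table><tr><td>layout</td></tr></table>'
      . '<table><tr><td>menu</table>'
      . '<table><tr><td>1<td><a href="/pdf/result1.pdf">Result one</a></table>'
      . '</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);              // libxml repairs missing closing tags where it can
libxml_clear_errors();

// Tables are returned in document order, nested ones included,
// so index 2 is the third <table> start tag on the page.
$third = $doc->getElementsByTagName('table')->item(2);

$links = [];
if ($third !== null) {
    foreach ($third->getElementsByTagName('a') as $a) {
        $links[] = [
            'text' => trim($a->textContent),
            'href' => $a->getAttribute('href'),
        ];
    }
}
print_r($links);
```

For the real page, the string returned by curl() could be passed to loadHTML() in place of the sample, provided the request succeeded and did not return false.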

answered Oct 31 '22 10:10

Ananta Prasad
