Extracting site data with a web crawler fails with an error due to an array index mismatch

Tags: php, web-crawler

I have been trying to extract the text of a table, along with its links, from the table below (which is on site1.com) into my PHP page using a web crawler.

Unfortunately, because of an incorrect array index in the PHP code, the output is an error.

The table on site1.com:

<table border="0" cellpadding="0" cellspacing="0" width="100%" class="Table2">
<tbody><tr>
    <td width="1%" valign="top" class="Title2">&nbsp;</td>
    <td width="65%" valign="top" class="Title2">Subject</td>
    <td width="1%" valign="top" class="Title2">&nbsp;</td>
    <td width="14%" valign="top" align="Center" class="Title2">Last Update</td>
    <td width="1%" valign="top" class="Title2">&nbsp;</td>
    <td width="8%" valign="top" align="Center" class="Title2">Replies</td>
    <td width="1%" valign="top" class="Title2">&nbsp;</td>
    <td width="9%" valign="top" align="Center" class="Title2">Views</td>
</tr>
<tr>
    <td width="1%" height="25">&nbsp;</td>
    <td width="64%" height="25" class="FootNotes2"><a href="/files/forum/2017/1/837110.php" target="_top" class="Links2">Serious dedicated study partner for U World</a> - step12013</td>
    <td width="1%" height="25">&nbsp;</td>
    <td width="14%" height="25" class="FootNotes2" align="center">02/11/17 01:50</td>
    <td width="1%" height="25">&nbsp;</td>
    <td width="8%" height="25" align="Center" class="FootNotes2">10</td>
    <td width="1%" height="25">&nbsp;</td>
    <td width="9%" height="25" align="Center" class="FootNotes2">318</td>
</tr>
</tbody>
</table>

The PHP web crawler is as follows:

<?php
function get_data($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_URL, $url);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

$returned_content = get_data('http://www.usmleforum.com/forum/index.php?forum=1');
$first_step = explode('<table class="Table2">', $returned_content);
$second_step = explode('</table>', $first_step[0]);
$third_step = explode('<tr>', $second_step[1]);
// print_r($third_step);
foreach ($third_step as $key => $element) {
    $child_first = explode('<td class="FootNotes2"', $element);
    $child_second = explode('</td>', $child_first[1]);
    $child_third = explode('<a href=', $child_second[0]);
    $child_fourth = explode('</a>', $child_third[0]);
    $final = "<a href=" . $child_fourth[0] . "</a></br>";
?>

<li target="_blank" class="itemtitle">
    <?php echo $final ?>
</li>

<?php
    if ($key == 10) {
        break;
    }
}
?>

I guess the array indexes in the PHP code above are the culprit. If so, can someone please explain how to make this work?

My final requirement is to get the text from the second column of the table above, together with the link associated with it.

Any help is appreciated.

asked Feb 09 '17 13:02

harishk

1 Answer

Using the Simple HTML DOM Parser library, you can use the following code:

<?php
    require('simple_html_dom.php'); // adjust the path to wherever you saved the library file

    $html = file_get_html('http://www.usmleforum.com/forum/index.php?forum=1');
    if (!$html) {
        die('Could not load the page.'); // file_get_html() returns false on failure
    }

    foreach ($html->find('td.FootNotes2 a') as $element) { // find all <a> elements inside a <td class="FootNotes2"> element
        $element->href = "http://www.usmleforum.com" . $element->href; // make the relative URL absolute; attributes such as href can be read and written directly
        echo $element . '<br>'; // print the whole link element, followed by a line break
    }
?>
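
If you would rather not add a third-party library, the same idea can be sketched with PHP's built-in DOMDocument and DOMXPath classes. The sketch below also shows why the explode() approach breaks on the markup from the question: the delimiter '<table class="Table2">' never occurs, because the real tag has other attributes before class, so the split returns a single element and the later indexes point at the wrong text. The helper name extract_topic_links and the $sampleHtml snippet (trimmed from the table in the question) are only illustrative, and the base URL is prepended on the assumption that every href is root-relative, as in the sample row.

```php
<?php
// Sample row copied from the question's markup, trimmed to the relevant cells.
$sampleHtml = <<<'HTML'
<table border="0" cellpadding="0" cellspacing="0" width="100%" class="Table2">
<tbody><tr>
    <td width="64%" height="25" class="FootNotes2"><a href="/files/forum/2017/1/837110.php" target="_top" class="Links2">Serious dedicated study partner for U World</a> - step12013</td>
    <td width="14%" height="25" class="FootNotes2" align="center">02/11/17 01:50</td>
</tr>
</tbody>
</table>
HTML;

// explode() needs the delimiter to match the text exactly. The real <table> tag
// carries other attributes, so nothing is split and $parts has one element.
$parts = explode('<table class="Table2">', $sampleHtml);

/**
 * Hypothetical helper (not part of any library): returns the text and absolute
 * URL of every link found inside a <td> whose class list contains FootNotes2.
 * Assumes each href starts with "/" so it can simply be appended to $baseUrl.
 */
function extract_topic_links(string $html, string $baseUrl): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word, then take every <a href> below that cell.
    $query = '//td[contains(concat(" ", normalize-space(@class), " "), " FootNotes2 ")]//a[@href]';

    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = [
            'text' => trim($a->textContent),
            'url'  => $baseUrl . $a->getAttribute('href'),
        ];
    }
    return $links;
}

$links = extract_topic_links($sampleHtml, 'http://www.usmleforum.com');

foreach ($links as $link) {
    // Escape both values before writing them back into HTML.
    printf(
        '<li class="itemtitle"><a href="%s">%s</a></li>' . "\n",
        htmlspecialchars($link['url'], ENT_QUOTES),
        htmlspecialchars($link['text'], ENT_QUOTES)
    );
}
```

To run this against the live page, you could pass the string returned by get_data() from the question in place of $sampleHtml. To keep the limit from the original loop, array_slice($links, 0, 10) would keep only the first ten links.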

answered Sep 25 '22 08:09

MrDarkLynx
