Scrape web page contents

Tags: php, curl, httprequest, web-scraping, screen-scraping

I am developing a project in which I want to scrape the contents of a website in the background and extract a limited part of that content. For example, my page has "userid" and "password" fields; using those, I will log in to my mail account, scrape my inbox contents, and display them on my page.

I have done this using JavaScript alone. However, when I click the sign-in button, the URL of my page (http://localhost/web/Login.html) changes to the URL I am scraping (http://mail.in.com/mails/inbox.php?nomail=....). I want to scrape the details without changing my page's URL.
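The question is tagged curl, which points to the usual fix: make the request on the server instead of in the browser. The login form posts to a PHP script on your own site; that script logs in with cURL, keeps the session cookie, fetches the inbox page and returns only the parts you want, so the browser never leaves http://localhost/web/Login.html. This is a minimal sketch that assumes the target site uses a plain HTML login form; the function name `fetch_inbox`, the login URL you pass in, and the field names `userid`/`password` are hypothetical and must be copied from the real form.

```php
<?php
// Hypothetical sketch: log in with cURL on the server and fetch a page that
// needs the session, so the visitor's browser stays on your own page.
// The URLs and the form field names 'userid'/'password' are assumptions;
// take the real ones from the target site's login <form>.
function fetch_inbox(string $loginUrl, string $inboxUrl, string $user, string $pass): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects after logging in
        CURLOPT_COOKIEFILE     => '',   // enable the in-memory cookie engine
    ]);

    // Step 1: POST the credentials; any session cookie is kept on $ch
    curl_setopt_array($ch, [
        CURLOPT_URL        => $loginUrl,
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => http_build_query(['userid' => $user, 'password' => $pass]),
    ]);
    if (curl_exec($ch) === false) {
        curl_close($ch);
        return null;
    }

    // Step 2: GET the inbox on the same handle, which re-sends the cookies
    curl_setopt_array($ch, [
        CURLOPT_URL     => $inboxUrl,
        CURLOPT_HTTPGET => true,
    ]);
    $html = curl_exec($ch);
    curl_close($ch);

    return $html === false ? null : $html;
}
```

A form handler (for example a hypothetical `login.php`) could call `fetch_inbox()` with the values from `$_POST` and pass the returned HTML to a parser before echoing the pieces you need. Reusing one cURL handle keeps the cookies in memory, so no cookie file is needed. If the real login form contains hidden inputs (such as a token), they have to be added to the POST fields as well.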


asked Feb 25 '09 05:02

Sakthivel

1 Answer

Definitely go with PHP Simple HTML DOM Parser. It's fast, easy, and very flexible: it loads an entire HTML page into an object, and you can then access any element through that object.

Here is the example from the official site, which gets all images and links on the Google home page:

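// Load the PHP Simple HTML DOM Parser library (adjust the path to your install)
require_once 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}

// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}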
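If you would rather not add a third-party library, PHP's bundled DOM extension can do the same job. This sketch parses an HTML string (for example the body returned by a cURL request) with `DOMDocument::loadHTML()` and collects image sources and link targets with `DOMXPath`; the function name `extract_images_and_links` and the sample markup are illustrative only.

```php
<?php
// Sketch using PHP's built-in DOM extension instead of Simple HTML DOM.
// Returns ['images' => [src values], 'links' => [href values]].
function extract_images_and_links(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    $images = [];
    foreach ($xpath->query('//img[@src]') as $img) {
        $images[] = $img->getAttribute('src');
    }

    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }

    return ['images' => $images, 'links' => $links];
}

// Usage with an inline page; in practice pass the HTML you fetched
$result = extract_images_and_links(
    '<html><body><img src="logo.png"><a href="/mail">Mail</a><a href="/news">News</a></body></html>'
);
foreach ($result['images'] as $src) {
    echo $src . '<br>';
}
foreach ($result['links'] as $href) {
    echo $href . '<br>';
}
```

The XPath predicates `[@src]` and `[@href]` skip elements that lack the attribute, so the result arrays contain only real values.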


answered Oct 14 '22 03:10

givp
