Parse Website for URLs

Tags: html, php, parsing, html-parsing

I'm wondering if someone can help me with the following. I want to extract the URLs from this page: http://www.directorycritic.com/free-directory-list.html?pg=1&sort=pr

I have the following code:

<?PHP  
$url = "http://www.directorycritic.com/free-directory-list.html?pg=1&sort=pr";
$input = @file_get_contents($url) or die("Could not access file: $url"); 
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>"; 
if(preg_match_all("/$regexp/siU", $input, $matches)) { 
// $matches[2] = array of link addresses 
// $matches[3] = array of link text - including HTML code
} 
?>

This does nothing at present. What I need it to do is scrape all the URLs in the table across all 16 pages and write them to a text file. I would really appreciate some help with how to amend the above to do that.


asked Dec 16 '10 13:12

Bill Johnson

1 Answer

Use the PHP Simple HTML DOM Parser library:

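// Requires the PHP Simple HTML DOM Parser library; adjust the path as needed
include 'simple_html_dom.php';

$html = file_get_html('http://www.example.com/');

// Find all links
$links = array();
foreach ($html->find('a') as $element) {
    $links[] = $element->href;
}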

The $links array now contains all the URLs on the given page, and you can use them for further parsing.
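For the 16 pages and the text file mentioned in the question, something along these lines might work. This is only a sketch that swaps in PHP's built-in DOMDocument instead of the library above, so nothing extra has to be installed. The function names extractLinks() and collectLinks(), the $fetch callback, and the output file name links.txt are made up for illustration. It collects every link on each page; narrowing it down to the table would need a query matched to the page's actual markup, which is not shown here.

```php
<?php
// Sketch only: extractLinks(), collectLinks() and $fetch are hypothetical
// names chosen for this example, not part of any library.

/** Return the href of every <a> element found in an HTML string. */
function extractLinks(string $html): array
{
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc = new DOMDocument();
    $loaded = $html !== '' && $doc->loadHTML($html); // loadHTML() rejects empty input
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    if ($loaded) {
        foreach ($doc->getElementsByTagName('a') as $anchor) {
            $href = trim($anchor->getAttribute('href'));
            if ($href !== '') {
                $links[] = $href; // anchors without an href are skipped
            }
        }
    }
    return $links;
}

/** Fetch pages 1..$pages through $fetch and return the unique links found. */
function collectLinks(callable $fetch, string $urlTemplate, int $pages): array
{
    $all = [];
    for ($page = 1; $page <= $pages; $page++) {
        $html = $fetch(sprintf($urlTemplate, $page));
        if ($html === false) {
            continue; // skip pages that could not be downloaded
        }
        $all = array_merge($all, extractLinks($html));
    }
    return array_values(array_unique($all));
}

// Example usage (performs network requests, so it is left commented out):
// $links = collectLinks(
//     function ($url) { return @file_get_contents($url); },
//     'http://www.directorycritic.com/free-directory-list.html?pg=%d&sort=pr',
//     16
// );
// file_put_contents('links.txt', implode(PHP_EOL, $links) . PHP_EOL);
```

Passing the downloader in as a callback keeps the parsing separate from the network access, so the link extraction can be tried on a saved copy of a page first.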

Parsing HTML with regular expressions is not a good idea. Here are some related posts:

Using regular expressions to parse HTML: why not?
RegEx match open tags except XHTML self-contained tags
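To make the point concrete, here is a small comparison. It is a sketch with made-up input: the regular expression is a deliberately naive one written for this example (not the one from the question), and the DOM side uses PHP's built-in DOMDocument. The sample HTML has one link inside a comment and one live link that uses single quotes.

```php
<?php
// Illustrative input: one link is commented out, the other uses single quotes.
$html = "<p><!-- <a href=\"old.html\">old</a> --><a href='new.html'>new</a></p>";

// A deliberately naive pattern (hypothetical, written only for this example).
// It expects double quotes and knows nothing about HTML comments.
preg_match_all('/<a\s[^>]*href="([^"]*)"/i', $html, $m);
$regexLinks = $m[1];

// The DOM parser works on the document structure instead of raw characters.
libxml_use_internal_errors(true); // suppress warnings on imperfect markup
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$domLinks = [];
foreach ($doc->getElementsByTagName('a') as $anchor) {
    $domLinks[] = $anchor->getAttribute('href');
}

print_r($regexLinks); // contains only old.html, the commented-out link
print_r($domLinks);   // contains only new.html, the live link
```

The pattern could be patched for both cases, but each patch tends to expose another edge case, which is the argument the linked posts make.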

EDIT:

Some other HTML parsing tools, as suggested by Gordon in the comments:

phpQuery
Zend_Dom
QueryPath
FluentDom


answered Nov 05 '22 18:11

Naveed
