Is it possible to parse the directory listing of an external webpage, given that the page is accessible and shows a list of files when I visit it? I only want to know whether the file list can be parsed dynamically in PHP, and how. Thank you.
Sorry for not being clear. I mean a directory listing such as http://www.ibiblio.org/pub/ (Index of /..), and the ability to read its content as an array or something else that is easy to manipulate in my script.
Directory listing is a web server feature that displays a directory's contents when the directory has no index file. Leaving it enabled can be risky, because it may disclose files that were not meant to be public.
Note that the obsolete HTML dir element only marks up a list you write yourself; it does not read a directory listing from a server.
You can use preg_match_all or DOMDocument. For your case:
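// Fetch the listing page, then capture every href value.
$contents = file_get_contents("http://www.ibiblio.org/pub/");
preg_match_all("|href=[\"'](.*?)[\"']|", $contents, $hrefs);
var_dump($hrefs); // $hrefs[1] holds the captured link targets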
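The captured href values usually include more than file names: Apache-style listings tend to contain column-sort links (such as ?C=N;O=D) and a parent-directory link as well. Below is a minimal sketch of filtering those out, assuming that listing style. The helper name listingEntries and the sample markup are made up for illustration, and the sample string stands in for the fetched page.

```php
<?php
// Hypothetical helper: extract entry names from Apache-style "Index of" HTML.
function listingEntries(string $html): array
{
    preg_match_all('|href=["\'](.*?)["\']|i', $html, $m);
    $entries = [];
    foreach ($m[1] as $href) {
        // Skip empty hrefs, sort links (?C=N;O=D), absolute paths or URLs,
        // and the parent-directory link.
        if ($href === '' || $href[0] === '?' || $href[0] === '/'
            || strpos($href, '://') !== false || $href === '../') {
            continue;
        }
        $entries[] = rawurldecode($href);
    }
    return $entries;
}

// Made-up sample standing in for file_get_contents("http://www.ibiblio.org/pub/").
$sample = '<a href="?C=N;O=D">Name</a> <a href="/">Parent Directory</a> '
        . '<a href="docs/">docs/</a> <a href="README.txt">README.txt</a>';
$entries = listingEntries($sample);
print_r($entries);
```

Entries ending in a slash are subdirectories, so the same function could be called again on each of them to walk the tree.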
If you're getting back a directory listing that is full of links in a proper XHTML document, you can use DOMDocument and code such as the following to get a list of files:
$doc = new DOMDocument();
$doc->preserveWhiteSpace = false; // property name is case-sensitive
$doc->load('directorylisting.html'); // XML parser: needs well-formed XHTML
$files = $doc->getElementsByTagName('a');
$files is now a list of DOMElement objects that you can iterate through, reading each one's href attribute to get the path to a file in the listing. Relative hrefs need to be resolved against the listing URL to get a full path.
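The iteration can be sketched as follows. This is only an illustration under a few assumptions: the markup is a made-up sample rather than a fetched page, loadHTML is swapped in for load because it tolerates imperfect HTML, and every href is assumed to be relative to the listing URL.

```php
<?php
// Made-up sample markup; in practice this would be the fetched listing page.
$html = '<html><body><a href="docs/">docs/</a>'
      . '<a href="README.txt">README.txt</a></body></html>';

libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);              // HTML parser, more forgiving than load()
libxml_clear_errors();

$base = 'http://www.ibiblio.org/pub/';
$paths = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    // Naive resolution: assumes every href is relative to the listing URL.
    $paths[] = $base . $link->getAttribute('href');
}
print_r($paths);
```

A real listing would also contain sort and parent-directory links, which would need to be skipped before prefixing the base URL.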
Note that this approach requires a properly formed directory listing returned from the server. You cannot, for example, make a request to stackoverflow.com and get a directory listing of its files.
If this doesn't work (perhaps because of malformed HTML), you could use regular expressions (e.g. preg_match_all) to find <a> tags, like this:
// Digits are allowed in paths, and the link text is matched non-greedily.
preg_match_all('@<a href="([a-zA-Z0-9._/ -]*)">(.*?)</a>@', file_get_contents('http://www.ibiblio.org/pub/'), $files);
var_dump($files);
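$files would still hold the matched elements, just as a set of arrays: $files[0] contains the full matches, $files[1] the href values, and $files[2] the link text.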
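With its default flags, preg_match_all groups results by capture group. Passing PREG_SET_ORDER groups them per link instead, which can be easier to loop over. Here is a small sketch on made-up sample markup; the simplified pattern and the $links map are illustrative choices, not part of the code above.

```php
<?php
// Made-up sample standing in for the fetched listing page.
$sample = '<a href="docs/">docs/</a>' . "\n"
        . '<a href="README.txt">README.txt</a>';

// PREG_SET_ORDER yields one array per matched link.
preg_match_all('@<a href="([^"]*)">(.*?)</a>@', $sample, $files, PREG_SET_ORDER);

$links = [];
foreach ($files as $match) {
    // $match[0] is the whole tag, $match[1] the href, $match[2] the link text.
    $links[$match[1]] = $match[2];
}
print_r($links);
```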
UPDATE: I tested with your URL (http://www.ibiblio.org/pub/) and it works fine (the preg_match_all method).