Can I parse the directory listing of an external webpage?

Tags:

php

Is it possible to parse the directory listing of an external webpage, given that the page is accessible and shows a list of files when I visit it? I only want to know whether the listing can be parsed dynamically in PHP, and how. Thank you.

Sorry for not being clear. I mean a directory listing such as http://www.ibiblio.org/pub/ (Index of /..), and the ability to read its contents as an array or something else that is easy to manipulate in my script.

Ahmad Fouad asked Jul 21 '11 09:07




2 Answers

You can use preg_match_all or DOMDocument.

For your case:

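// Fetch the listing page (requires allow_url_fopen to be enabled).
$contents = file_get_contents("http://www.ibiblio.org/pub/");
// Capture the value of every href attribute; the URLs end up in $hrefs[1].
preg_match_all("|href=[\"'](.*?)[\"']|", $contents, $hrefs);
var_dump($hrefs);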


genesis answered Oct 19 '22 10:10



If the directory listing you get back is a proper XHTML document full of links, you can use DOMDocument with code such as the following to get a list of files:

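$doc = new DOMDocument();
$doc->preserveWhiteSpace = false; // property name is case-sensitive
// load() expects well-formed XML/XHTML; for ordinary HTML, loadHTMLFile() is more forgiving.
$doc->load('directorylisting.html');

$files = $doc->getElementsByTagName('a');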

$files is now a DOMNodeList of DOMElement objects. You can iterate over it and read each element's href attribute to get the path to each file as it is written in the listing.
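As a minimal sketch of that iteration, the snippet below parses a small inline listing and collects the href values into a plain array. The `$html` sample and the `extractHrefs` helper name are made up for illustration, and it uses loadHTML() instead of load() on the assumption that the server may not return strict XHTML.

```php
<?php
// Hypothetical helper: returns the href of every <a> element in $html.
function extractHrefs(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        if ($link->hasAttribute('href')) {
            $hrefs[] = $link->getAttribute('href');
        }
    }
    return $hrefs;
}

// Made-up sample standing in for a fetched "Index of /pub" page.
$html = '<html><body><h1>Index of /pub</h1>'
      . '<a href="../">Parent Directory</a>'
      . '<a href="docs/">docs/</a>'
      . '<a href="README.txt">README.txt</a>'
      . '</body></html>';

$files = extractHrefs($html);
print_r($files);
```

The href values come back exactly as written in the page, so they are often relative; if you need absolute URLs you would have to resolve them against the listing's own URL.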

Note that this approach requires the server to return a properly formed directory listing. You cannot, for example, send a request to stackoverflow.com and get a directory listing of its files.

If this doesn't work (perhaps because the HTML is malformed), you could use regular expressions (e.g. preg_match_all) to find the <a> tags, like so:

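$html = file_get_contents('http://www.ibiblio.org/pub/');
// Group 1 is the href (letters, digits, dots, dashes, underscores, slashes, spaces); group 2 is the link text.
preg_match_all('@<a href="([a-zA-Z0-9._/ -]*)">(.*?)</a>@', $html, $files);
var_dump($files);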

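$files still holds the matched elements, but as a set of arrays rather than DOM objects: $files[0] contains the full matches, $files[1] the href values, and $files[2] the link text.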
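To illustrate the shape of the result, here is a small sketch that runs against a made-up listing fragment instead of the live URL. The permissive pattern (any characters up to the closing quote) and the PREG_SET_ORDER flag, which groups each link's href and text together, are assumptions chosen for illustration.

```php
<?php
// Made-up fragment of a directory listing page.
$html = '<a href="docs/">docs/</a>' . "\n"
      . '<a href="linux-2.6.tar.gz">linux-2.6.tar.gz</a>' . "\n";

// [^"]* accepts any href; .*? is non-greedy so each match stops at the first </a>.
$pattern = '@<a href="([^"]*)">(.*?)</a>@';

// PREG_SET_ORDER yields one array per link: [full match, href, link text].
$count = preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$entries = [];
foreach ($matches as $m) {
    $entries[] = ['href' => $m[1], 'text' => $m[2]];
}
print_r($entries);
```

With one entry per link, the result can be filtered or sorted like any other PHP array, for example to skip the parent-directory link.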


UPDATE: I tested the preg_match_all method with your URL (http://www.ibiblio.org/pub/) and it works fine.

Rudi Visser answered Oct 19 '22 08:10
