
How to get a list of files/directories of a directory URL?

Tags: java, url

Let's say I have a URL: http://java.sun.com/j2se/1.5/pdf. I want to get a list of all files and directories under the pdf directory.

I'm using Java 5.

I can get the directory listing with this program: http://www.httrack.com/, but I don't know whether it is possible with Java.

Does anybody know how to do this in Java? Or, if Java can't, how does that program do the job?

itro asked Jul 19 '12 13:07




1 Answer

There are some conditions:

  1. The server must have directory listing enabled for you to see the directory's contents.
  2. There is no way that I know of (no API or HTTP verb) to retrieve the listing directly, so the listing is generally served as a normal HTML page.
  3. You will have to parse this HTML page to find the entries.
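
Condition 1 can be checked heuristically before any parsing. The sketch below is only a rough guess at such a check: it assumes the server produces Apache- or nginx-style autoindex pages whose title starts with "Index of", which other servers or custom listing templates may not do. The class name ListingCheck and the method looksLikeListing are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

final class ListingCheck {

    // Assumption: an autoindex page carries a title such as "Index of /pub".
    // Pages with a custom listing layout will not match this pattern.
    private static final Pattern INDEX_TITLE =
            Pattern.compile("<title>\\s*Index of\\s", Pattern.CASE_INSENSITIVE);

    // Returns true if the HTML looks like a generated directory listing.
    static boolean looksLikeListing(String html) {
        return INDEX_TITLE.matcher(html).find();
    }

    public static void main(String[] args) {
        String listing = "<html><head><title>Index of /pub</title></head><body></body></html>";
        String missing = "<html><head><title>404 Not Found</title></head><body></body></html>";
        System.out.println(looksLikeListing(listing));
        System.out.println(looksLikeListing(missing));
    }
}
```

A negative result does not prove that listing is disabled; it only means the page does not look like a default autoindex page.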

The parsing can be done easily using a library like jsoup.

For example, using jsoup you can list the entries of the directory page at http://howto.unixdev.net/ like this:

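import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Sample {
    public static void main(String[] args) throws IOException {
        // Fetch and parse the HTML page that shows the directory listing.
        Document doc = Jsoup.connect("http://howto.unixdev.net").get();
        // This selector matches the markup of this particular page;
        // other servers lay out their listings differently.
        for (Element file : doc.select("td.right td a")) {
            System.out.println(file.attr("href"));
        }
    }
}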

This will output:

beignets.html
beignets.pdf
bsd-pam-ldap.html
ddns-updates.html
Debian_on_HP_dv6z.html
dextop-slackware.html
dirlist.html
downloads/
ldif/
Linux-SharePoint.html
rhfc3-apt.html
rhfc3-apt.tar.bz2
SUNWdsee-Debian.html
SUNWdtdte-b69.html
SUNWdtdte-b69.tar.bz2
tcshrc.html
Test_LVM_Trim_Ext4.html
Tru64-CS20-HOWTO.html
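
The hrefs printed above are relative to the listing page, and entries such as downloads/ and ldif/ look like directories. As a rough sketch, the following resolves each entry against the base URL and classifies it. The class name ListingEntries and its helper methods are made up for illustration, and treating a trailing slash as "directory" is an assumption about how the server renders its listing, not something HTTP guarantees. It avoids lambdas and streams because the question mentions Java 5.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

final class ListingEntries {

    // Assumption: the listing marks directories with a trailing slash.
    static boolean isDirectory(String href) {
        return href.endsWith("/");
    }

    // Resolves a relative href against the URL of the listing page.
    // The base URL should end with '/' so that entries resolve inside
    // the directory instead of replacing its last path segment.
    static String toAbsolute(String baseUrl, String href) {
        return URI.create(baseUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://howto.unixdev.net/";
        // A few of the entries from the output above.
        List<String> hrefs = Arrays.asList("beignets.pdf", "downloads/", "ldif/");
        for (String href : hrefs) {
            String kind = isDirectory(href) ? "dir " : "file";
            System.out.println(kind + " " + toAbsolute(base, href));
        }
    }
}
```

To walk a whole tree, the same two helpers could be applied again to the listing page of each entry classified as a directory, provided those pages are listable too.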

As for your sample URL, http://java.sun.com/j2se/1.5/pdf returns a "page not found" error, so I think you're out of luck.

Alex answered Sep 25 '22 23:09