Let's say I have a URL: http://java.sun.com/j2se/1.5/pdf
I want to get a list of all files/directories under the pdf
directory.
I'm using Java 5.
I can get the list of directories with this program: http://www.httrack.com/, but I don't know if the same is possible with Java.
Does anybody know how to get it in Java? Or, if Java can't, how does that program do the job?
There is one condition, though: the web server must have directory listing enabled for that path, so that requesting the URL returns an HTML index page listing the files. Parsing that page can be done easily using a lib like JSoup.
For example, using JSoup you can fetch and parse the directory listing at http://howto.unixdev.net/ like this:
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class Sample {

    public static void main(String[] args) throws IOException {
        // Fetch the directory listing page and parse it into a DOM
        Document doc = Jsoup.connect("http://howto.unixdev.net").get();
        // Select the anchors that hold the file/directory links on this page
        for (Element file : doc.select("td.right td a")) {
            System.out.println(file.attr("href"));
        }
    }
}
Will output:
beignets.html
beignets.pdf
bsd-pam-ldap.html
ddns-updates.html
Debian_on_HP_dv6z.html
dextop-slackware.html
dirlist.html
downloads/
ldif/
Linux-SharePoint.html
rhfc3-apt.html
rhfc3-apt.tar.bz2
SUNWdsee-Debian.html
SUNWdtdte-b69.html
SUNWdtdte-b69.tar.bz2
tcshrc.html
Test_LVM_Trim_Ext4.html
Tru64-CS20-HOWTO.html
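Note that the selector td.right td a is tied to the markup of that particular index page; a different server's directory listing will need a different selector. As a generic starting point (my own assumption, not part of the original answer), you could grab every link on the page and filter out the entries you don't want:

        // Hypothetical variant: select every anchor with an href attribute.
        // Sorting links and the parent-directory entry ("../") may need to be
        // filtered out by hand, depending on the server's listing format.
        for (Element link : doc.select("a[href]")) {
            System.out.println(link.attr("href"));
        }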
As for your sample URL http://java.sun.com/j2se/1.5/pdf: it comes back as a page not found, so I think you're out of luck there.
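If you want to check that for yourself before parsing anything, here is a minimal sketch using only the standard java.net classes that ship with Java 5 (the class name CheckUrl is just for illustration). It issues a HEAD request and prints the HTTP status code:

import java.net.HttpURLConnection;
import java.net.URL;

public class CheckUrl {

    public static void main(String[] args) throws Exception {
        URL url = new URL("http://java.sun.com/j2se/1.5/pdf");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // HEAD is enough here: we only care about the status, not the body
        conn.setRequestMethod("HEAD");
        // Anything other than 200 means there is no listing page to parse
        System.out.println("HTTP status: " + conn.getResponseCode());
        conn.disconnect();
    }
}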