I need to download some huge files (several GB each) from Java via FTP/HTTP. Is there a ready-made library (Java, or a command-line tool) to facilitate the download? Some obvious requirements are:
Edit - I'd really prefer not to write such a library myself but to reuse (or pay for) an existing, tested, production-grade one. rsync is not relevant since I need to download files from HTTP and FTP sites; it's not for internal file transfer.
The HTTP protocol does support starting a partial download at an offset, but it has limited support for validating the local partial copy of the file to make sure it doesn't have junk appended to the end (or similar corruption). If your environment allows it, I recommend rsync with the --partial option. It's designed to support exactly this kind of functionality from the command line.
If you can't use rsync, you may want to try working with Commons HttpClient, using the Range HTTP header to download manageably sized chunks.
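A minimal sketch of the resume idea with the JDK's own HttpURLConnection (to stay library-free; the Commons HttpClient calls would differ). The URL and file names are hypothetical; the testable part is the Range header construction, since a server that honours it replies 206 Partial Content:

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.FileOutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ResumeDownload {
    // Build the Range header value for resuming at the given byte offset,
    // e.g. 1048576 -> "bytes=1048576-" (open-ended: server sends the rest).
    static String resumeHeader(long offset) {
        return "bytes=" + offset + "-";
    }

    // Sketch: append the remainder of the remote file to a partial local copy.
    static void resume(String url, File partial) throws IOException {
        long have = partial.exists() ? partial.length() : 0;
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Range", resumeHeader(have));
        // 206 Partial Content means the server honoured the Range header;
        // a plain 200 means it ignored it and resent the whole file.
        if (conn.getResponseCode() == HttpURLConnection.HTTP_PARTIAL) {
            try (InputStream in = conn.getInputStream();
                 OutputStream out = new FileOutputStream(partial, true)) { // append mode
                in.transferTo(out);
            }
        }
        conn.disconnect();
    }
}
```

Note this only resumes; it does not validate that the existing local bytes are intact, which is exactly the gap the answer above points out.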
If you know how to create sockets and threads in Java, it's not that difficult.
First, send a request and read the response headers to get the Content-Length. Then devise a strategy to split the download into chunks of, for example, 500 KB each. Then start, say, 10 requests, using one thread per request. In each request you set the Range header to that chunk's byte range.
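The chunking step above can be sketched as a pure helper; the header names are from HTTP, everything else (class and method names, the 500 KB figure) is illustrative. Each resulting range would then be fetched on its own thread and written into a RandomAccessFile after seeking to the range's start offset:

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSplitter {
    // Split a Content-Length into "bytes=start-end" Range header values of
    // at most chunkSize bytes each. Both ends are inclusive, per the HTTP
    // byte-range syntax, so a 1000-byte file in 400-byte chunks yields
    // bytes=0-399, bytes=400-799, bytes=800-999.
    static List<String> splitRanges(long contentLength, long chunkSize) {
        List<String> ranges = new ArrayList<>();
        for (long start = 0; start < contentLength; start += chunkSize) {
            long end = Math.min(start + chunkSize, contentLength) - 1;
            ranges.add("bytes=" + start + "-" + end);
        }
        return ranges;
    }
}
```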
Resuming your download is then a matter of storing the ranges you haven't downloaded yet. I suggest you read the HTTP/1.1 RFC's header field definitions if you really want a good grasp of the protocol.
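One simple way to "store the ranges you haven't downloaded yet" is a plain state file with one pending range per line; on restart you re-request only those. This is a sketch with hypothetical names, not a prescribed format:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class DownloadState {
    // Persist the still-missing ranges, one per line, so a crashed or
    // interrupted download can resume by re-requesting only these ranges.
    static void save(Path stateFile, List<String> remaining) throws IOException {
        Files.write(stateFile, remaining);
    }

    // Load the pending ranges back on restart.
    static List<String> load(Path stateFile) throws IOException {
        return Files.readAllLines(stateFile);
    }
}
```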
However, if you're looking for an easy way out, rsync or scp should suffice.