I am trying to use HTTrack (http://www.httrack.com/) to download a single page, not the entire site. For example, when using HTTrack to download www.google.com, it should only download the HTML found under www.google.com along with all stylesheets, images and JavaScript, and not follow any links to images.google.com, labs.google.com, www.google.com/subdir/, etc.
I tried the -w option, but that did not make any difference.
What would be the right command?
EDIT
I tried using httrack "http://www.google.com/" -O "./www.google.com" "http://www.google.com/" -v -s0 --depth=1
but then it won't copy any images.
What I basically want is to download just the index file of that domain along with all of its assets, but not the content of any external or internal links.
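To illustrate the behavior I am after: it is roughly what wget does in page-requisites mode, e.g. something like the following (shown only to clarify the goal; I still want to do it with httrack):

    # fetch index.html plus the images/CSS/JS it references, and follow no links
    wget --page-requisites --convert-links "http://www.google.com/"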
HTTrack is a free and open-source web crawler and offline browser, developed by Xavier Roche and licensed under the GNU General Public License version 3. It allows users to download websites from the Internet to a local computer.
httrack "http://www.google.com/" -O "./www.google.com" "http://www.google.com/" -v -s0 --depth=1 -n
The -n option (or --near) will download the images referenced by a web page no matter where they are located.
Say an image is located at google.com/foo/bar/logo.png. Because --depth=1 restricts the mirror to the starting page itself, that image would not be downloaded unless you specify --near, which fetches non-HTML files referenced by a downloaded page regardless of where they live. (Note that -s0 controls robots.txt handling rather than directory scope.)
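Putting it together, a complete invocation would look something like this (just the options above collected in one place; the output directory name is arbitrary):

    # -O "./www.google.com"  : output directory
    # -v                     : verbose
    # -s0                    : never follow robots.txt rules
    # --depth=1              : mirror only the starting page, do not recurse into links
    # -n (--near)            : also fetch non-HTML files (images, CSS, JS) referenced by that page
    httrack "http://www.google.com/" -O "./www.google.com" -v -s0 --depth=1 -n

If certain asset types still get skipped, httrack also accepts scan-rule filters appended after the options (for example "+*.css" "+*.png"), but with --depth=1 and --near that is usually not needed.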