I have a script for downloading all of my Chrome bookmarks. I use wget with --html-extension because some of the bookmarks end in .php and can't be opened in a web browser unless that option is used. The problem is that when I combine --html-extension with --no-clobber, wget doesn't recognize that most of the files are already there, so it goes through the whole process of re-downloading things it already has.
An example: running wget -nc http://www.test.com/ once saves the file as it is supposed to. If you run it again, it says the file is already there and does not retrieve it. That is the behavior I would expect.
However, delete the file that was just saved, run wget -nc http://www.test.com/ --html-extension, and then run that same command again. It overwrites the file instead of saying the file is already there. What is going on?
When the .html suffix is added, wget can't tell which remote URL the local file corresponds to, so it downloads the file again.
man wget: http://unixhelp.ed.ac.uk/CGI/man-cgi?wget
======================
--html-extension
If a file of type application/xhtml+xml or text/html is downloaded and the URL does not end with the regexp \.[Hh][Tt][Mm][Ll]?, this option will cause the suffix .html to be appended to the local filename. This is useful, for instance, when you're mirroring a remote site that uses .asp pages, but you want the mirrored pages to be viewable on your stock Apache server. Another good use for this is when you're downloading CGI-generated materials. A URL like http://site.com/article.cgi?25 will be saved as article.cgi?25.html.
Note that filenames changed in this way will be re-downloaded every time you re-mirror a site, because Wget can't tell that the local X.html file corresponds to remote URL X (since it doesn't yet know that the URL produces output of type text/html or application/xhtml+xml). To prevent this re-downloading, you must use -k and -K so that the original version of the file will be saved as X.orig.
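If you'd rather not depend on -k and -K, one possible workaround is to do the "already there" check yourself before calling wget. The following is only a rough sketch, not something from the man page: guess_local_name, already_saved and fetch_once are made-up helper names, and the filename guess (last path component, or index.html for a URL ending in a slash) is a simplifying assumption that won't match wget's naming in every case, for example with redirects or query strings containing slashes.

```shell
#!/bin/sh
# Hypothetical helper: guess the local file name wget would pick for a URL.
# Assumption: last path component (query string included), or index.html
# when the URL ends in "/" or has no path at all.
guess_local_name() {
    url=${1%%#*}            # drop any #fragment
    name=${url#*://}        # strip the scheme
    case $name in
        */*) name=${name##*/} ;;   # keep what follows the last slash
        *)   name= ;;              # host only, no path
    esac
    [ -n "$name" ] || name=index.html
    printf '%s\n' "$name"
}

# Succeeds if the guessed file, or the same name with the .html suffix
# that --html-extension would append, already exists in the directory
# given as $2 (default: current directory).
already_saved() {
    name=$(guess_local_name "$1")
    dir=${2:-.}
    [ -e "$dir/$name" ] || [ -e "$dir/$name.html" ]
}

# Download a URL only when neither variant is present locally.
fetch_once() {
    if already_saved "$1"; then
        echo "skipping $1 (already there)"
    else
        wget --html-extension "$1"
    fi
}
```

With the bookmarks exported one URL per line into a file (say bookmarks.txt, a name I'm assuming here), you could loop over it with: while read -r url; do fetch_once "$url"; done < bookmarks.txt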