
wget download and rename files that originally have no file extension

Tags:

wget

I have a wget download I'm trying to perform.

It downloads several thousand files unless I restrict the file type (to exclude junk files, etc.). In theory, restricting the file type is fine.

However, wget downloads many files without a file extension that, when opened manually (with Adobe, for example), turn out to be PDFs. These are the files I actually want.

Restricting wget to the PDF file type does not download these files.

So far my syntax is wget -r --no-parent -A.pdf www.websitehere.com

Using wget -r --no-parent www.websitehere.com brings me every file type, so in theory I have everything. But this means I have thousands of junk files to remove, and then several hundred useful files of unknown type to rename.

Any ideas on how to wget and save the files with the appropriate file extension?

Alternatively, is there a way to restrict wget to only files without a file extension, and then a separate batch method to determine the file type and rename them appropriately?

Manually testing every file to determine the appropriate application will take a lot of time.

Appreciate any help!

asked Jul 23 '13 01:07 by Stews


1 Answer

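wget has an --adjust-extension option (short form -E), which appends the correct extension to HTML and CSS files that lack one. It keys off the content type the server reports, and other types (like PDFs) may not be handled, so those files can still end up without an extension. See the wget manual for the complete documentation.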
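For the files that --adjust-extension leaves untouched, the batch rename raised in the question could be sketched roughly as follows. This is only a sketch under stated assumptions: the function name add_pdf_extensions is hypothetical, the files are assumed to be already downloaded into a local directory, and a file is treated as a PDF purely because it starts with the standard "%PDF-" signature.

```shell
# Hypothetical helper (not from the thread): append ".pdf" to every file
# under a directory that has no extension but starts with the PDF signature.
add_pdf_extensions() {
    dir=${1:-.}
    # -type f         regular files only
    # ! -name '*.*'   base name contains no dot, i.e. no extension
    find "$dir" -type f ! -name '*.*' -exec sh -c '
        for f do
            # PDF files begin with the five bytes "%PDF-"
            if [ "$(head -c 5 "$f")" = "%PDF-" ]; then
                # Never overwrite an existing file of the same name
                [ -e "$f.pdf" ] || mv -- "$f" "$f.pdf"
            fi
        done
    ' sh {} +
}
```

After a recursive download finishes, this might be called as add_pdf_extensions followed by the path of the download directory. Checking only the signature keeps the sketch dependency-free; where the file utility is installed, comparing the output of file -b --mime-type against application/pdf would be a more general test and extends naturally to other types.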

answered Nov 12 '22 05:11 by Arman H