wget download and rename files that originally have no file extension

Question

I have a wget download I'm trying to perform.

It downloads several thousand files unless I restrict the file type (to exclude junk files, etc.). In theory, restricting the file type is fine.

However, wget downloads many files without a file extension that, when opened manually (with Adobe Reader, for example), turn out to be PDFs. These are the files I actually want.

Restricting wget to the PDF file type does not download these files.

So far my syntax is wget -r --no-parent -A.pdf www.websitehere.com

Using wget -r --no-parent www.websitehere.com gets me every file type, so in theory I have everything. But that leaves thousands of junk files to remove, plus several hundred useful files of unknown type to rename.

Any ideas on how to use wget to save the files with the appropriate file extension?

Alternatively, is there a way to restrict wget to only files without a file extension, and then a separate batch method to determine each file's type and rename it appropriately?

Manually testing every file to determine the appropriate application will take a lot of time.
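The batch approach asked about above (find files without an extension, detect their real type, rename them) could look something like the following shell sketch. It assumes the mirror is already on disk, treats a file as "extensionless" when its name contains no dot, and recognises a PDF by the `%PDF-` signature in its first five bytes instead of opening it in a viewer. The function name `rename_pdfs` is hypothetical; if the `file` utility is installed, `file --mime-type -b FILE` (which prints `application/pdf` for PDFs) is an alternative detector.

```shell
# Hypothetical helper: append ".pdf" to extensionless files that are really PDFs.
# Assumptions: "extensionless" means the base name contains no dot, and a PDF
# is recognised by the "%PDF-" signature in its first five bytes.
rename_pdfs() {
    dir=${1:-.}
    # Regular files whose base name has no "." in it.
    # (File names containing newlines are not handled by this read loop.)
    find "$dir" -type f ! -name '*.*' | while IFS= read -r f; do
        # Compare the first five bytes with the PDF header signature.
        if [ "$(head -c 5 "$f")" = "%PDF-" ]; then
            # Skip the rename if a file with the target name already exists.
            [ -e "$f.pdf" ] || mv -- "$f" "$f.pdf"
        fi
    done
}
```

Usage: source the file, then run `rename_pdfs www.websitehere.com` on the directory wget created. Checking the header bytes rather than the name is the point: the name is exactly what is missing, while the content still identifies the type.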

Appreciate any help!

Arman H · Accepted Answer

wget has an --adjust-extension option (short form -E), which adds the correct extensions to HTML and CSS files. It does not add extensions to other file types such as PDFs, though. See the wget manual for the complete documentation.
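A minimal sketch of the invocation this answer describes, applied to the question's example site. The wrapper name `mirror_with_extensions` is hypothetical; the three options are the long forms of `-r`, `-np` and `-E`. As the answer notes, this only fixes HTML and CSS names, so extensionless PDFs still need a separate renaming pass.

```shell
# Hypothetical wrapper around the command suggested in the answer.
# --recursive follows links, --no-parent stays below the starting directory,
# and --adjust-extension appends ".html"/".css" to HTML and CSS files that
# lack them. It does not add ".pdf" to extensionless PDFs.
mirror_with_extensions() {
    url=$1
    wget --recursive --no-parent --adjust-extension "$url"
}
```

Usage: `mirror_with_extensions www.websitehere.com`.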


Tags:

wget

Stews
