wget appends query string to resulting file

Question

I'm trying to retrieve working webpages with wget and this goes well for most sites with the following command:

wget -p -k http://www.example.com

In these cases I will end up with index.html and the needed CSS/JS etc.

However, when the URL has a query string, I get an index.html with the query string appended to its name.

Example

www.onlinetechvision.com/?p=566

Combined with the above wget command will result in:

index.html?p=566

I have tried using the --restrict-file-names=windows option, but that only gets me to

index.html@p=566

Can anyone explain why this is needed and how I can end up with a regular index.html file?

UPDATE: I'm sort of on the fence on taking a different approach. I found out I can take the first filename that wget saves by parsing the output. So the name that appears after Saving to: is the one I need.

However, the filename is wrapped in this strange character: â. Rather than just stripping it out with hardcoded logic, I'd like to know where it comes from.

TadejP · Accepted Answer

If you add the "--adjust-extension" parameter

wget -p -k --adjust-extension www.onlinetechvision.com/?p=566

you get closer. The www.onlinetechvision.com folder will contain a file with a corrected extension: index.html@p=566.html, or index.html?p=566.html on *nix systems. From there it is simple to rename that file to index.html, even with a script.
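The rename step could be scripted roughly as follows. This is a minimal sketch, assuming a POSIX shell and a download folder that holds only one page saved with a query string in its name; normalize_index is a hypothetical helper name, not something wget provides.

```shell
# Hypothetical helper: rename the page wget saved with a query string
# (e.g. index.html?p=566.html or index.html@p=566.html) to index.html.
# Assumption: the folder contains at most one such file.
normalize_index() {
  dir=$1
  for f in "$dir"/index.html[?@]*; do
    [ -e "$f" ] || continue        # the glob matched nothing
    mv -- "$f" "$dir/index.html"   # overwrites an existing index.html
    return 0
  done
  return 1                         # no query-string file was found
}
```

For the example above it would be called as normalize_index www.onlinetechvision.com once wget has finished.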

If you are on a Microsoft OS, make sure you have a recent version of wget. Builds are also available here: https://eternallybored.org/misc/wget/

Tim Pierce · Answer

To answer your question about why this is needed, remember that the web server is likely to return different results based on the parameters in the query string. If a query for index.html?page=52 returns different results from index.html?page=53, you probably wouldn't want both pages to be saved in the same file.

Each HTTP request that uses a different set of query parameters is quite literally a request for a distinct resource. wget can't predict which of these differences will be significant, so it does the conservative thing and preserves the query string in the filename of the local document.
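To make the point concrete, here is a rough sketch of how a URL maps to a local name. It is not wget's actual algorithm; local_name is a hypothetical function that only mirrors the behaviour described in the question, assuming a POSIX shell.

```shell
# Hypothetical helper: approximate the local filename for a URL.
# Only an illustration of the naming seen in the question, not wget's code.
local_name() {
  url=$1
  rest=${url#*://}                 # drop the scheme, if any
  case $rest in
    */*) path=${rest#*/} ;;        # everything after the host
    *)   path= ;;
  esac
  case $path in
    *\?*) q=${path#*\?}            # text after the first "?"
          query="?$q"
          path=${path%%\?*} ;;     # text before the first "?"
    *)    query= ;;
  esac
  file=${path##*/}                 # last path component
  [ -n "$file" ] || file=index.html
  printf '%s%s\n' "$file" "$query"
}

local_name 'www.onlinetechvision.com/?p=566'
local_name 'http://www.example.com/index.html?page=52'
local_name 'http://www.example.com/index.html?page=53'
```

The last two URLs differ only in their query string, yet they produce different names, so neither download overwrites the other.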
