I scraped a bunch of pages using wget -m -k -E. The resulting files have names of the form foo.php?bar.html. Apache treats everything after the ? as a query string. Is there a way to tell it not to treat the ? as the query string delimiter, so that it sees foo.php?bar.html as the requested file rather than foo.php?
To save you a trip to the wget manpage:
-m : mirror recursively
-E : foo.php?bar becomes foo.php?bar.html
-k : convert links in pages (foo.php?bar now links to foo.php?bar.html inside of all the pages so they display properly)
Would escaping the ? as %3F do the trick?
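To make the escaping idea concrete, here is a minimal Python sketch of what percent-encoding the ? in a saved filename looks like. The helper name escape_saved_name is made up for illustration, and this only shows the encoding itself; whether Apache then serves the file depends on the server setup.

```python
from urllib.parse import quote, unquote


def escape_saved_name(filename: str) -> str:
    """Percent-encode a wget-saved filename for use in a URL (hypothetical helper)."""
    # safe="/" keeps path separators intact; "?" is not in the safe set,
    # so it is encoded as %3F instead of being read as a query delimiter.
    return quote(filename, safe="/")


escaped = escape_saved_name("foo.php?bar.html")
print(escaped)           # foo.php%3Fbar.html
print(unquote(escaped))  # foo.php?bar.html
```

A link pointing at foo.php%3Fbar.html asks the server for a path that contains a literal ?, rather than for foo.php with the query string bar.html.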
Apache v1 used to handle them, but v2 does not.
I did it using mod_rewrite. Here is Nathan's suggestion in code form:
RewriteEngine On
# Convert ? -> %3F in queries and add .html to the end of the filename
RewriteCond %{ENV:REDIRECT_STATUS} !200
RewriteCond %{QUERY_STRING} !^$
RewriteRule ^(.*)$ /$1\%3F%{QUERY_STRING}.html [L,NE]
# Additionally, for *.php requests with no question mark in the name, append .html to the filename
RewriteRule ^(.*?)\.php$ $1.php.html
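As a rough way to check what these rules are meant to produce, here is a small Python model of the mapping from a request to the rewritten target. It is only a sketch: the function name rewritten_target is hypothetical, the path is assumed to be given without a leading slash (as in per-directory context), and the REDIRECT_STATUS guard and Apache's multi-pass rule processing are ignored.

```python
def rewritten_target(path: str, query: str) -> str:
    """Approximate the target the rules above produce (hypothetical model)."""
    if query:
        # First rule: a non-empty query string is folded into the filename,
        # giving /<path>%3F<query>.html
        return f"/{path}%3F{query}.html"
    if path.endswith(".php"):
        # Second rule: a bare *.php request gets .html appended
        return f"{path}.html"
    # Neither rule matches, so the path is left unchanged
    return path


print(rewritten_target("foo.php", "bar"))  # /foo.php%3Fbar.html
print(rewritten_target("foo.php", ""))     # foo.php.html
print(rewritten_target("style.css", ""))   # style.css
```

So a request for foo.php?bar ends up pointing at the file wget saved as foo.php?bar.html, and a plain foo.php request points at foo.php.html.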