After fixing a website's code to use a CDN (rewriting all the URLs to images, JS & CSS), I need to test all the pages on the domain to make sure all the resources are fetched from the CDN.
All of the site's pages are accessible through links; there are no isolated pages.
Currently I'm using Firebug and checking the "Net" view...
Is there some automated way to give a domain name and request all pages + resources of the domain?
Update:
OK, I found I can use wget like so:
wget -p --no-cache -e robots=off -m -H -D cdn.domain.com,www.domain.com -o site1.log www.domain.com
Options explained:

-p - download resources too (images, CSS, JS, etc.)
--no-cache - get the real object, do not return the server-cached object
-e robots=off - disregard robots.txt and nofollow directives
-m - mirror the site (follow links)
-H - span hosts (follow other domains too)
-D cdn.domain.com,www.domain.com - specify which domains to follow, otherwise it will follow every link from the page
-o site1.log - log to the file site1.log
-U "Mozilla/5.0" - optional: fake the user agent; useful if the server returns different data for different browsers
www.domain.com - the site to download

Enjoy!
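Once the mirror finishes, a quick extra check (not part of the command above) is to grep the downloaded HTML for src/href URLs that are not on the CDN. This is a rough sketch; the www.domain.com directory and cdn.domain.com host are the same placeholders used above, and anything it prints is a candidate for a missed rewrite:

# List resource/link URLs in the mirrored pages that do not point at the CDN
grep -rhoE '(src|href)="https?://[^"]*"' www.domain.com \
    | grep -vF 'cdn.domain.com' \
    | sort -u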
HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility. It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer.
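For reference, a minimal command-line invocation might look like this; the domain, output directory and filter patterns are placeholders you would adapt to your own site:

# Mirror www.domain.com into ./mirror, allowing links on *.domain.com and the CDN host
httrack "http://www.domain.com/" -O "./mirror" \
    "+*.domain.com/*" "+cdn.domain.com/*" -v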
The wget documentation has this bit in it:
Actually, to download a single page and all its requisites (even if they exist on separate websites), and make sure the lot displays properly locally, this author likes to use a few options in addition to ‘-p’:
wget -E -H -k -K -p http://site/document
The key is the -H option, which means --span-hosts (go to foreign hosts when recursive). I don't know whether this applies to normal hyperlinks or only to resources, but you should try it out.
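If you do try it, one way to keep -H from wandering onto unrelated hosts is to combine it with -D, as in the command from the update above. A sketch with placeholder domains (check your wget version's documentation for exactly how -D interacts with -p and -H):

# Download one page plus its requisites, allowing only the two listed hosts
wget -E -H -k -K -p -D www.domain.com,cdn.domain.com http://site/document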
You can consider an alternate strategy. You don't need to download the resources to test that they are referenced from the CDN. You can just get the source code for the pages you're interested in (using wget, as you did, or curl, or something else) and check the src/href attributes of <img />, <link /> and <script /> tags for CDN links. You should also check all CSS files for url() references - they should also point to CDN images. Depending on the logic of your application, you may need to check that the JavaScript code does not create any images that do not come from the CDN.
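As an illustration only, a crude grep-based version of that check could look like this; the page URL and the stylesheet path are hypothetical placeholders, and a real HTML/CSS parser would be more reliable than regular expressions:

# Fetch one page and list src/href values that do not point at the CDN
curl -s http://www.domain.com/ \
    | grep -oE '(src|href)="[^"]*"' \
    | grep -vF 'cdn.domain.com' \
    | sort -u

# Check a stylesheet for url(...) references that bypass the CDN
curl -s http://www.domain.com/css/main.css \
    | grep -oE 'url\([^)]*\)' \
    | grep -vF 'cdn.domain.com'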