After fixing a website's code to use a CDN (rewriting all the URLs to images, JS & CSS), I need to test all the pages on the domain to make sure all the resources are fetched from the CDN.
All of the site's pages are accessible through links; there are no isolated pages.
Currently I'm using Firebug and checking the "Net" view...
Is there some automated way to give a domain name and request all pages + resources of the domain?
Update:
OK, I found I can use wget like so:
wget -p --no-cache -e robots=off -m -H -D cdn.domain.com,www.domain.com -o site1.log www.domain.com
Options explained:

-p - download resources too (images, CSS, JS, etc.)
--no-cache - get the real object, do not return the server-cached object
-e robots=off - disregard robots.txt and nofollow directives
-m - mirror the site (follow links)
-H - span hosts (follow other domains too)
-D cdn.domain.com,www.domain.com - specify which domains to follow, otherwise it will follow every link from the page
-o site1.log - log to the file site1.log
-U "Mozilla/5.0" - optional: fake the user agent; useful if the server returns different data for different browsers
www.domain.com - the site to download

Enjoy!
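Once the mirror finishes, a quick extra check (not part of the command above) is to grep the downloaded HTML for src/href URLs that are not on the CDN. This is a rough sketch; the www.domain.com directory and cdn.domain.com host are the same placeholders used above, and anything it prints is a candidate for a missed rewrite:

# List resource/link URLs in the mirrored pages that do not point at the CDN
grep -rhoE '(src|href)="https?://[^"]*"' www.domain.com \
    | grep -vF 'cdn.domain.com' \
    | sort -u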
HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility. It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer.
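For reference, a minimal command-line invocation might look like this; the domain, output directory and filter patterns are placeholders you would adapt to your own site:

# Mirror www.domain.com into ./mirror, allowing links on *.domain.com and the CDN host
httrack "http://www.domain.com/" -O "./mirror" \
    "+*.domain.com/*" "+cdn.domain.com/*" -v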
The wget documentation has this bit in it:
Actually, to download a single page and all its requisites (even if they exist on separate websites), and make sure the lot displays properly locally, this author likes to use a few options in addition to ‘-p’:
wget -E -H -k -K -p http://site/document
The key is the -H option, which means --span-hosts (go to foreign hosts when recursive). I don't know whether this applies to normal hyperlinks or only to resources, but you should try it out.
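If you do try it, one way to keep -H from wandering onto unrelated hosts is to combine it with -D, as in the command from the update above. A sketch with placeholder domains (check your wget version's documentation for exactly how -D interacts with -p and -H):

# Download one page plus its requisites, allowing only the two listed hosts
wget -E -H -k -K -p -D www.domain.com,cdn.domain.com http://site/document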
You can consider an alternate strategy. You don't need to download the resources to test that they are referenced from the CDN. You can just get the source code for the pages you're interested in (using wget, as you did, or curl, or something else) and check the src/href attributes of <img />, <link /> and <script /> tags for CDN links. You should also check all CSS files for url() references - they should also point to CDN images. Depending on the logic of your application, you may need to check that the JavaScript code does not create any images that do not come from the CDN.
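As an illustration only, a crude grep-based version of that check could look like this; the page URL and the stylesheet path are hypothetical placeholders, and a real HTML/CSS parser would be more reliable than regular expressions:

# Fetch one page and list src/href values that do not point at the CDN
curl -s http://www.domain.com/ \
    | grep -oE '(src|href)="[^"]*"' \
    | grep -vF 'cdn.domain.com' \
    | sort -u

# Check a stylesheet for url(...) references that bypass the CDN
curl -s http://www.domain.com/css/main.css \
    | grep -oE 'url\([^)]*\)' \
    | grep -vF 'cdn.domain.com'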