
Simulating a remote website locally for testing

I am developing a browser extension. The extension works on external websites we have no control over.

I would like to be able to test the extension. One of the major problems I'm facing is displaying a website 'as-is' locally.

Is it possible to display a website 'as-is' locally?

I want to be able to serve the website exactly as-is locally for testing. This means I want to simulate the exact same HTTP data, including iframe ads, etc.

  • Is there an easy way to do this?

More info:

I'd like my system to behave as closely to the remote website as possible. For example, I'd like to run a command (fetch, say) that would then let me visit the site in my browser (with no internet connection) and get exactly the same thing I would get otherwise, including content that does not come from a single domain: Google ads and so on.

I don't mind using a virtual machine if this helps.

I figured this would be quite useful for testing, especially when I have a bug I need to reliably reproduce on sites that have many random factors (which ads show, etc.).

asked May 05 '13 by Benjamin Gruenbaum


3 Answers

As was already mentioned, caching proxies should do the trick for you (BTW, this is the simplest solution). There are quite a lot of different implementations, so you just need to spend some time selecting a proper one (in my experience, Squid is a good choice). Anyway, I would like to highlight two other interesting options:

Option 1: Betamax

Betamax is a tool for mocking external HTTP resources such as web services and REST APIs in your tests. The project was inspired by the VCR library for Ruby. It works by intercepting HTTP connections initiated by your application and replaying previously recorded responses.

Betamax comes in two flavors. The first is an HTTP and HTTPS proxy that can intercept traffic made in any way that respects Java’s http.proxyHost and http.proxyPort system properties. The second is a simple wrapper for Apache HttpClient.

BTW, Betamax has a very interesting feature for you:

Betamax is a testing tool and not a spec-compliant HTTP proxy. It ignores any and all headers that would normally be used to prevent a proxy caching or storing HTTP traffic.
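For example, if the application under test runs as a separate JVM process, you could route its traffic through the Betamax proxy via those system properties. A minimal sketch, assuming the proxy is listening on localhost:5555 (check your Recorder configuration) and with a placeholder jar name:

    # Hypothetical launch of the application under test through the Betamax proxy
    # (host, port, and jar name are placeholders).
    java -Dhttp.proxyHost=localhost -Dhttp.proxyPort=5555 -jar your-app.jar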

Option 2: Wireshark and replay proxy

Grab all the traffic you are interested in using Wireshark and replay it. I would say it is not that hard to implement the required replaying tool yourself, but you can also use an existing solution called replayproxy:

Replayproxy parses HTTP streams from .pcap files, opens a TCP socket on port 3128 and listens as an HTTP proxy, using the extracted HTTP responses as a cache while refusing all requests for unknown URLs.

Such an approach gives you full control and a bit-for-bit precise simulation.
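A rough sketch of that workflow (interface and file names are placeholders, and the exact replayproxy invocation depends on its version, so check its README):

    # 1. Capture the browsing session you want to replay (tcpdump works as well as Wireshark).
    sudo tcpdump -i eth0 -w session.pcap 'tcp port 80'

    # 2. Later, replay it: replayproxy serves the recorded responses as an HTTP proxy on port 3128.
    replayproxy session.pcap

    # 3. Point your browser's HTTP proxy at localhost:3128 and browse offline.

Note that this only works for plain HTTP; TLS traffic in a .pcap is encrypted and cannot be replayed this way.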

answered Oct 10 '22 by Renat Gilmanov


I don't know if there is an easy way, but there is a way.

You can set up a local webserver, something like IIS, Apache, or minihttpd.

Then you can grab the website contents using wget (it has an option for mirroring). Many browsers also have a "save complete web page" option that will grab everything, such as images.
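A typical mirroring invocation looks something like this (the URL is a placeholder):

    # Mirror the site, rewrite links to work locally, and fetch page requisites (images, CSS, JS).
    wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://example.com/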

Ads will most likely come from remote sites, so you may have to manually edit those lines in the HTML to either remove the references to the actual ad servers or set up a mock ad yourself (like a banner image).

Then you can point your browser at http://localhost to visit your local copy, assuming the server listens on port 80 (the default).
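For example, assuming Apache on a Debian/Ubuntu-style system (package name and document root may differ):

    # Install Apache and copy the mirrored files into its default document root,
    # then browse to http://localhost/.
    sudo apt-get install apache2
    sudo cp -r example.com/. /var/www/html/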

Hope this helps!

answered Oct 10 '22 by canhazbits


I assume you want to serve a remote site that's not under your control. In that case you can use a proxy server and have that server cache every response aggressively. However, this has its limits. First, you will have to visit every site you intend to use through this proxy (with a browser, for example); second, you will not be able to emulate form processing.
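As a sketch of that setup with Squid (directive names assume Squid 3.x; paths and values are assumptions and may differ on your system), you could add something like this to /etc/squid/squid.conf:

    http_port 3128
    cache_dir ufs /var/spool/squid 1024 16 256
    # Ignore headers that would normally prevent caching.
    refresh_pattern -i . 10080 90% 43200 override-expire override-lastmod ignore-reload ignore-no-store ignore-private
    # Never revalidate cached objects; serve from cache whenever possible.
    offline_mode on

Then restart Squid, set your browser's HTTP proxy to localhost:3128, browse everything you need while online, and Squid will serve those responses from its cache afterwards.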

Alternatively, you could use a spider to download all the content of a given website. Depending on the spider software, it may even be able to follow links that are built by JavaScript. You can then use a webserver to serve that content.
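HTTrack is one commonly used spider for this; a minimal invocation (URL and output directory are placeholders) would look like:

    # Crawl the site into ./example-mirror; see the HTTrack docs for depth and filter options.
    httrack "https://example.com/" -O ./example-mirror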

answered Oct 10 '22 by Janos Pasztor