
Recover old website off waybackmachine [closed]

Is there a way to recover an entire website from the Wayback Machine?

I have an old site that is archived, but I no longer have the website files to revive it. Is there a way to recover the old data so I can get my long-lost files back?

Dustin asked Mar 16 '12 01:03



1 Answer

wget is a great tool for mirroring an entire site, and if you are on Windows, you can install it through Cygwin. The following command will mirror a site: wget -m domain.name

Update from comments:

The example wget command below won't ascend to the parent directory (-np), ignores robots.txt (-e robots=off), follows links only on the listed domains, including the CDN domain (--domains=domain.name), and mirrors the given URL (for example, http://an.example.com). All together you get:

 wget -np -e robots=off --mirror --domains=staticweb.archive.org,web.archive.org http://web.archive.org/web/19970708161549/http://www.google.com/

If you are dealing with https and a self-signed cert, you can use --no-check-certificate to disable the certificate check. The wget help (wget --help) is the best place to see the possible options.
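If you need to do this for more than one snapshot, the flags described above can be wrapped in a small shell script. This is only a rough sketch: wayback_url and mirror_snapshot are hypothetical helper names, and the INSECURE variable is an invented opt-in switch for --no-check-certificate, not anything wget itself defines. It assumes the snapshot URL has the same shape as the example command (a 14-digit timestamp followed by the original URL).

```shell
#!/bin/sh
# Sketch only: wayback_url and mirror_snapshot are hypothetical helper names.

# Build a snapshot URL from a timestamp (YYYYMMDDhhmmss) and the original URL,
# following the shape of the example command in the answer.
wayback_url() {
    printf 'http://web.archive.org/web/%s/%s\n' "$1" "$2"
}

# Mirror one snapshot with the flags from the answer.
# Set INSECURE=1 (an assumption of this sketch) to add --no-check-certificate,
# which should only be needed for self-signed certificates.
mirror_snapshot() {
    extra=""
    if [ "${INSECURE:-0}" = "1" ]; then
        extra="--no-check-certificate"
    fi
    # $extra is intentionally unquoted so an empty value adds no argument.
    wget -np -e robots=off --mirror \
        --domains=staticweb.archive.org,web.archive.org \
        $extra "$(wayback_url "$1" "$2")"
}
```

After sourcing the script, running mirror_snapshot 19970708161549 http://www.google.com/ should issue the same wget command as the one shown above.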

mguymon answered Oct 13 '22 12:10