Can I use WGET to generate a sitemap of a website given its URL?

Question

I need a script that can spider a website and return the list of all crawled pages in plain text or a similar format, which I will submit to search engines as a sitemap. Can I use wget to generate a sitemap of a website? Or is there a PHP script that can do the same?

Salman A · Accepted Answer

wget --spider --recursive --no-verbose --output-file=wgetlog.txt http://somewebsite.com
sed -n "s@.\+ URL:\([^ ]\+\) .\+@\1@p" wgetlog.txt | sed "s@&@\&amp;@g" > sedlog.txt

This creates a file called sedlog.txt that contains all links found on the specified website. You can use PHP or a shell script to convert the text file sitemap into an XML sitemap. Tweak the parameters of the wget command (accept/reject/include/exclude) to get only the links you need.
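
As a rough illustration of the conversion step mentioned above, here is a minimal shell sketch. The function name urls_to_sitemap is hypothetical and not part of the original answer. It assumes the input file holds one URL per line with ampersands already escaped as &amp;, as the sed step produces, and it emits only the <loc> element for each URL.

```shell
# Hypothetical helper (not from the original answer): wrap each URL
# listed in a text file in minimal sitemap XML written to stdout.
# Assumes one URL per line, with & already escaped as &amp;.
urls_to_sitemap() {
    printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>'
    printf '%s\n' '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    # sort -u removes duplicate URLs; blank lines are skipped.
    sort -u "$1" | while IFS= read -r url; do
        if [ -n "$url" ]; then
            printf '  <url><loc>%s</loc></url>\n' "$url"
        fi
    done
    printf '%s\n' '</urlset>'
}
```

With the function defined, running urls_to_sitemap sedlog.txt > sitemap.xml would write the result to sitemap.xml. Optional elements such as <lastmod> are left out because the wget log does not provide that data in a ready-to-use form.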

Gilles Quenot · Answer

You can use this Perl script to do the trick: http://code.google.com/p/perlsitemapgenerator/
