If I have a forums site with a large number of threads, will the search engine bot crawl the whole site every time? Say I have over 1,000,000 threads on my site, will they all get crawled every time the bot crawls my site? Or how does it work? I want my website to be indexed, but I don't want the bot to kill my website! In other words, I don't want the bot to keep crawling the old threads again and again every time it crawls my website.
Also, what about pages that were crawled before? Will the bot request them every time it crawls my website to make sure they are still on the site? I ask because I only link to the latest threads: there's a page that lists all the latest threads, but I don't link to the older ones, so they have to be requested explicitly by URL, e.g. http://example.com/showthread.aspx?threadid=7. Will this stop the bot from bringing my site down and consuming all my bandwidth?
P.S. The site is still under development but I want to know in order to design the site so that search engine bots don't bring it down.
Generally, Googlebot crawls over HTTP/1.1. However, Googlebot may crawl sites over HTTP/2 if the site supports it and may benefit from it. This can save computing resources (for example, CPU and RAM) for both the site and Googlebot, but otherwise it doesn't affect the indexing or ranking of your site.
How does web crawling work? Search engines use their own web crawlers to discover and access web pages. All commercial search engine crawlers begin crawling a website by downloading its robots.txt file, which contains rules about which pages search engines should or should not crawl on that website.
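As a sketch of that first step, Python's standard `urllib.robotparser` applies the same rules a well-behaved crawler would. The robots.txt content below is hypothetical, but the check it performs mirrors what the bot does before fetching each URL:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for a forum: block crawling of the
# admin area, allow everything else.
rules = """
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A well-behaved crawler checks each URL against the rules before fetching it.
print(parser.can_fetch("*", "http://example.com/showthread.aspx?threadid=7"))  # True
print(parser.can_fetch("*", "http://example.com/admin/delete.php"))            # False
```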
After a crawler finds a page, the search engine renders it just like a browser would. In the process of doing so, the search engine analyzes that page's contents. All of that information is stored in its index.
Bad bots can steal your private data or take down an otherwise healthy website, so we want to block any bad bots we can uncover. It's not easy to discover every bot that may crawl your site, but with a little digging you can find the malicious ones that you don't want visiting your site anymore.
Complicated stuff.
From my experience, it depends mostly on the URL scheme you use to link pages together; that determines which pages the crawler will visit.
Most engines will crawl the entire website if it is properly hyperlinked with crawl-friendly URLs (e.g. URL rewriting instead of topicID=123 query strings) and all pages are reachable within a few clicks from the main page.
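As an illustration of a crawl-friendly URL (assuming an Apache front end; IIS has equivalent URL-rewriting modules, and the path pattern here is hypothetical), a rewrite rule can expose a clean URL while internally serving the real query-string page:

```apache
# Hypothetical Apache mod_rewrite rule: expose /thread/123 to crawlers
# while internally serving showthread.aspx?threadid=123
RewriteEngine On
RewriteRule ^thread/([0-9]+)$ /showthread.aspx?threadid=$1 [L]
```

The crawler then sees stable, simple paths instead of query strings, which some older crawlers treat with suspicion.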
Another case is paging: if you have paging, sometimes the bot crawls just the first page and stops when it finds that the next-page link keeps hitting the same document, e.g. one index.php for the entire website.
You wouldn't want a bot to accidentally hit a page that performs an action, e.g. a "Delete topic" link that points to "delete.php?topicID=123", so most crawlers check for those cases as well.
The Tools page at SEOmoz also provides a lot of information and insight about the way some crawlers work and what information they will extract and chew on, etc. You could use those tools to determine whether pages deep inside your forum, e.g. a year-old post, might get crawled or not.
And some crawlers let you customize their crawling behavior... something like Google Sitemaps. You can tell them which pages to crawl and which not to, and in which order, etc. I remember similar services are available from MSN and Yahoo as well, but I've never tried them myself.
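For instance, a minimal XML sitemap (the URL and the frequency/priority values here are placeholders) can list older threads explicitly and hint how rarely they change, so the bot doesn't need to re-crawl them aggressively:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/showthread.aspx?threadid=7</loc>
    <changefreq>yearly</changefreq>
    <priority>0.2</priority>
  </url>
</urlset>
```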
You can also throttle well-behaved crawling bots so they don't overwhelm your website by providing a robots.txt file in the website root.
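A sketch of such a robots.txt, assuming the engine honors the non-standard Crawl-delay directive (Yahoo and MSN have supported it; Google ignores it and instead lets you set a crawl rate in its webmaster tools):

```
User-agent: *
# Wait ~10 seconds between requests (non-standard directive)
Crawl-delay: 10
# Keep bots out of action pages entirely
Disallow: /admin/
```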
Basically, if you design your forum so that the URLs don't look hostile to crawlers, they'll merrily crawl the entire website.