 

Why and how does Googlebot use my website's search engine?

Looking through my search logs from time to time, I notice that by far the biggest user of my search engine is Googlebot. What gives? Is it looking for content that might not be directly accessible through navigation? If so, how does it know which words and phrases to look for? (They're surprisingly relevant.) Does it check the most popular keywords on the site? I know I seem to be answering my own question here, but I'm really only working it out from first principles. I'd like to hear from someone who knows what they're talking about (i.e. not me).

Iain Fraser asked Aug 04 '09 05:08





1 Answer

If your search form's method is GET instead of POST, each search has its own URL, and people might be posting those URLs elsewhere. Or, if you have a (possibly inadvertently) publicly accessible web stats page that lists those URLs, that's another common way for search engines to stumble upon your internal search URLs. A third way I've seen is sites that list recent searches on their pages, though that is more intentional. "MySQL Performance Blog" does this to an annoying extent, so any search of their site from Google yields hundreds of pages of similar searches, even if none of them found what the searcher was looking for.
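To make the GET point concrete, here is a minimal Python sketch of what a browser does when it submits a GET search form: the query ends up in the URL itself, so every search is a distinct address that can be linked, logged, and crawled. The endpoint `https://example.com/search` and the field name `q` are placeholders, not anything taken from the asker's site.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint and field name, for illustration only.
SEARCH_ENDPOINT = "https://example.com/search"


def search_url(query: str) -> str:
    """Build the URL a GET search form would request for this query."""
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': query})}"


if __name__ == "__main__":
    # Each query yields its own URL, which anyone can paste into a page.
    for query in ("mysql slow query", "ubuntu upgrade"):
        print(search_url(query))
```

With a POST form the query travels in the request body instead, so there is no per-search URL for anyone to link to.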

Edit: Looks like Googlebot does submit forms itself on occasion, but only GET forms: http://googlewebmastercentral.blogspot.com/2008/04/crawling-through-html-forms.html
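As a rough illustration of crawling through a GET form, the Python sketch below finds GET forms on a page and builds candidate search URLs from a list of words. This is not Google's actual algorithm; the class `GetFormFinder`, the function `candidate_urls`, and the sample page are all made up for the example. The linked post describes filling text boxes with words taken from the site itself, which would explain why the queries in the logs look relevant.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class GetFormFinder(HTMLParser):
    """Collect the action and text-input names of every GET form on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []       # finished GET forms
        self._current = None  # GET form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # HTML forms default to GET when no method is given.
            method = (attrs.get("method") or "get").lower()
            if method == "get":
                self._current = {"action": attrs.get("action") or "", "inputs": []}
        elif tag == "input" and self._current is not None:
            input_type = (attrs.get("type") or "text").lower()
            if input_type in ("text", "search") and attrs.get("name"):
                self._current["inputs"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def candidate_urls(page_url, html, words):
    """Build one URL per (GET form text input, word) pair. POST forms are skipped."""
    finder = GetFormFinder()
    finder.feed(html)
    urls = []
    for form in finder.forms:
        target = urljoin(page_url, form["action"])
        for name in form["inputs"]:
            for word in words:
                urls.append(f"{target}?{urlencode({name: word})}")
    return urls


if __name__ == "__main__":
    # Hypothetical page with one GET search form and one POST login form.
    page = (
        '<form action="/search" method="get"><input type="text" name="q"></form>'
        '<form action="/login" method="post"><input type="text" name="user"></form>'
    )
    for url in candidate_urls("https://example.com/blog/", page, ["mysql", "ubuntu"]):
        print(url)
```

Only the search form produces URLs here; the POST login form is ignored, matching the "only GET forms" observation.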

David answered Nov 15 '22 07:11
