Looking through my search logs from time to time, I notice that by far the biggest user of my search engine is Googlebot. What gives? Is it looking for content that might not be directly accessible through navigation? If so, how does it know which words and phrases to look for? (They're surprisingly relevant.) Does it check the most popular keywords on the site? I know I seem to be answering my own question here, but I'm really only working it out from first principles. I'd like to hear from someone who knows what they're talking about (i.e. not me).
Generally speaking, Googlebot behaves like a web browser. It visits your website to find internal and external links, and it fetches the content in order to build an index of your entire website.
Googlebot is the generic name for Google's web crawler. It covers two types of crawler: a desktop crawler that simulates a user on a desktop computer, and a mobile crawler that simulates a user on a mobile device.
Before requesting a URL, Googlebot reads the site's robots.txt file. If the file marks the URL as disallowed, Googlebot skips the URL without making an HTTP request. Otherwise, it fetches the page, parses the response for other URLs in the href attributes of HTML links, and adds those URLs to the crawl queue.
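The crawl step described above can be sketched roughly as follows. This is only an illustration using Python's standard library, not Googlebot's actual code: the names `crawl_step`, `LinkExtractor`, `ExampleBot`, the `fetch` callback, and the example.com pages are all made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def crawl_step(url, robots_lines, fetch, user_agent="ExampleBot"):
    """Return the links found at url, or [] if robots.txt disallows it.

    fetch is a hypothetical callback that returns the HTML for a URL.
    """
    rules = RobotFileParser()
    rules.parse(robots_lines)
    if not rules.can_fetch(user_agent, url):
        return []  # disallowed: no request is made at all
    parser = LinkExtractor(url)
    parser.feed(fetch(url))
    return parser.links


# Toy data standing in for a real site (assumed, not from any real server).
robots = ["User-agent: *", "Disallow: /private/"]
pages = {
    "https://example.com/": '<a href="/about">About</a> <a href="/private/x">X</a>',
}
queue = crawl_step("https://example.com/", robots, pages.__getitem__)
```

Note that the disallowed link still ends up in the queue here; in this sketch it would only be rejected later, when `crawl_step` is called on it and the robots.txt check fails.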
Depending on how active your site is, you should expect Google to crawl it anywhere from every four to every thirty days. Sites that are updated more regularly tend to be crawled more often, since Googlebot tends to look for new content first.
If your search form's method is GET instead of POST, each search has its own URL, and people might be posting those URLs elsewhere. Or if you have a (possibly inadvertently) publicly accessible webstats page that lists those URLs, that's another common way for search engines to stumble upon your internal search URLs. A third way I've seen is sites that list recent searches on their pages, but this is more intentional. "MySQL Performance Blog" does this to an annoying extent, so any search of their site from Google yields hundreds of pages of similar searches, even if none of them found what they were looking for.
Edit: Looks like it does on occasion, but only GET forms: http://googlewebmastercentral.blogspot.com/2008/04/crawling-through-html-forms.html
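To make the GET point concrete, here is a small sketch of why a GET search form gives every query its own crawlable URL, and how a robots.txt rule could keep crawlers away from those URLs. The `/search` path, the `q` parameter name, and the `Disallow: /search` rule are assumptions for illustration; your site's form may use different names.

```python
from urllib.parse import urlencode
from urllib.robotparser import RobotFileParser


def search_url(base, query):
    """Build the URL a browser would request when a GET form is submitted."""
    return f"{base}?{urlencode({'q': query})}"


# Hypothetical search endpoint and query.
url = search_url("https://example.com/search", "mysql slow query")
# url is "https://example.com/search?q=mysql+slow+query": an ordinary
# link that anyone (or any crawler) can follow or paste elsewhere.

# One possible robots.txt rule to keep crawlers out of search results.
rules = RobotFileParser()
rules.parse(["User-agent: *", "Disallow: /search"])
blocked = not rules.can_fetch("Googlebot", url)
```

A POST form has no equivalent: the query travels in the request body, so there is no URL for anyone to link to.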