Is it possible to scrape the Google search results page using PHP to pull out the total number of search results found?
If so, how would I go about doing this?
Thanks
Try this, using PHP Simple HTML DOM Parser (file_get_html() comes from that library):
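include_once 'simple_html_dom.php'; // PHP Simple HTML DOM Parser
$search_query = 'google';
$url = sprintf('http://www.google.com/search?q=%s', urlencode($search_query));
$html = file_get_html($url);
// The third <b> inside #resultStats held the total in Google's markup at the time
$node = $html ? $html->find('#resultStats b', 2) : null;
$results = $node ? $node->innertext : 'an unknown number of';
echo sprintf('Google found %s results for "%s"', $results, $search_query);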
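If you'd rather not depend on a third-party library, PHP's built-in DOM extension (enabled in most builds) can do the same lookup. This is a minimal sketch, assuming the count still sits inside an element with id="resultStats", as it did in Google's markup at the time; the page may have changed since. The function name result_stats_text() is hypothetical.

```php
<?php
// Sketch: read the text of the element with id="resultStats" using PHP's
// built-in DOM extension. The id is an assumption based on Google's old markup.

function result_stats_text(string $html): ?string
{
    // DOMDocument::loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="resultStats"]');
    if ($nodes === false || $nodes->length === 0) {
        return null; // element not found: layout changed or request blocked
    }
    return trim($nodes->item(0)->textContent);
}

$sample = '<html><body><div id="resultStats">About 1,230,000 results</div></body></html>';
echo result_stats_text($sample), PHP_EOL; // prints "About 1,230,000 results"
```

The XPath query matches the element by id regardless of tag name, so it keeps working if the count moves from a <div> to some other element with the same id.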
This PHP class does it: http://www.phpclasses.org/browse/package/3924.html
"This class can be used to get the total number of results for given Google search query.
It accesses the Google search site to perform a query for given search terms.
The class parses the results page and extract the total number of results that the given search query returned."
You'll need a bunch of proxies, depending on the number of requests you plan to send. You can send about 500 requests per day per IP/proxy without causing trouble or getting detected.
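One way to respect that kind of per-proxy budget is to rotate through the proxies and stop using each one once it reaches the daily count. This is a minimal sketch: the class name ProxyPool and the proxy addresses in the usage line are hypothetical, the 500/day default is the rough figure from this answer rather than a documented Google limit, and the counters live in memory only (a real scraper would persist them per day, e.g. in a file or database). It needs PHP 8.0+ for constructor property promotion.

```php
<?php
// Sketch: round-robin over proxies, skipping any that have used up their
// daily request budget. Counters are in-memory only (assumption: one run per day).

final class ProxyPool
{
    /** @var array<string, int> proxy "ip:port" => requests used today */
    private array $used = [];
    private int $next = 0;

    /** @param list<string> $proxies */
    public function __construct(private array $proxies, private int $dailyLimit = 500)
    {
        foreach ($proxies as $proxy) {
            $this->used[$proxy] = 0;
        }
    }

    // Returns the next proxy with budget left, or null if every proxy is used up.
    public function acquire(): ?string
    {
        $count = count($this->proxies);
        for ($i = 0; $i < $count; $i++) {
            $proxy = $this->proxies[($this->next + $i) % $count];
            if ($this->used[$proxy] < $this->dailyLimit) {
                $this->used[$proxy]++;
                // Start the next search just after the proxy we handed out.
                $this->next = ($this->next + $i + 1) % $count;
                return $proxy;
            }
        }
        return null;
    }
}
```

Usage: create the pool once with something like new ProxyPool(['10.0.0.1:8080', '10.0.0.2:8080']) and pass each acquire() result as the CURLOPT_PROXY value; a null return means the day's budget is exhausted and the script should stop.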
You should read the article at google-rank-checker.squabbel.com; it contains a full-featured scraper in PHP. Use that scraper, modify it to your requirements, and add the PHP Simple HTML DOM Parser code (from the other answer) to get the total-hits information for the keywords.
I suggest using libcurl (PHP's cURL extension) to access Google itself. It gives you a LOT more options than a simpler API; you won't have much fun with file_get_html() or built-in PHP functions such as file_get_contents(), as Google will block your script very soon.
Something like this:
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $IP and $PORT are your proxy's address and port
$curl_proxy = "$IP:$PORT";
curl_setopt($ch, CURLOPT_PROXY, $curl_proxy);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.0; en; rv:1.9.0.4) Gecko/2009011913 Firefox/3.0.6");
// $keyword is your search term; urlencode() escapes spaces and special characters
$url = sprintf('http://www.google.com/search?q=%s', urlencode($keyword));
curl_setopt($ch, CURLOPT_URL, $url);
$htmldata = curl_exec($ch);
if ($htmldata === false) {
    echo 'cURL error: ' . curl_error($ch);
}
curl_close($ch);
Now just use preg_match(), substr() or strstr() to grab the data from $htmldata.
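As an illustration of the strstr()/substr() route (no regular expressions), the sketch below cuts out the text of the result-count element. It assumes the count sits in a <div id="resultStats">, as in Google's markup at the time; the function name text_after_marker() is hypothetical, and the marker will need updating if the page changes.

```php
<?php
// Sketch: extract the inner text of the element carrying $marker using only
// string functions. Assumes the element is a <div> (Google's old markup).

function text_after_marker(string $html, string $marker = 'id="resultStats"'): ?string
{
    $rest = strstr($html, $marker);   // everything from the marker onwards
    if ($rest === false) {
        return null;
    }
    $start = strpos($rest, '>');      // end of the opening tag
    $end = strpos($rest, '</div>');   // first closing </div> after the marker
    if ($start === false || $end === false || $end < $start) {
        return null;
    }
    $inner = substr($rest, $start + 1, $end - $start - 1);
    return trim(strip_tags($inner));  // drop nested tags such as <b>
}
```

For example, given $htmldata containing '<div id="resultStats">Results <b>1</b> - <b>10</b> of about <b>1,230</b></div>', text_after_marker($htmldata) returns 'Results 1 - 10 of about 1,230'. This is more brittle than a regex or a DOM parser, because any nested <div> inside the element would cut the text short.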
I am using this PHP script to find the total number of Google search results for my name:
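<?php
$homepage = file_get_contents('http://www.google.co.in/search?ix=nh&sourceid=chrome&ie=UTF-8&q=Mohit+dabas');
// Matches e.g. "About 1,234 results" (the "About " prefix is optional)
if ($homepage !== false && preg_match('/(About )?([\d,]+) result/si', $homepage, $p)) {
    echo $p[0];
}
?>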
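If you need the count as a number rather than the matched text, the second capture group ([\d,]+) holds the digits. A minimal sketch using the same pattern as the script (the function name result_count() is hypothetical): the thousands separators are removed before casting, because (int) "1,230" stops at the comma and gives 1.

```php
<?php
// Sketch: same regex as the script above, returning the total as an integer.

function result_count(string $page): ?int
{
    if (!preg_match('/(About )?([\d,]+) result/si', $page, $p)) {
        return null; // pattern not found: layout changed or request blocked
    }
    // $p[2] is the ([\d,]+) group, e.g. "1,230"; strip commas before casting.
    return (int) str_replace(',', '', $p[2]);
}
```

Usage: result_count($homepage) returns 1230 for a page containing "About 1,230 results", and null when the pattern is missing, which is also what you tend to see once Google starts blocking the script.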
The main thing to note is the &q parameter in the URL passed to file_get_contents().
My name contains a space, so the browser turned it into '+'.
So check your query (i.e. the &q parameter): if it contains special characters such as ., : or %,
note how the browser encodes them and change the parameter in your script accordingly.
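Instead of checking by hand how the browser encodes each character, you can let PHP build the query string. A minimal sketch (the function name google_search_url() is hypothetical; the host and ie parameter mirror the script above): http_build_query() applies the same encoding as urlencode(), turning spaces into "+" and percent-encoding reserved characters such as ":" and "%".

```php
<?php
// Sketch: build the search URL with PHP's own encoding instead of hand-editing &q.

function google_search_url(string $query): string
{
    $params = [
        'ie' => 'UTF-8',
        'q'  => $query, // spaces become "+", "%" becomes "%25", ":" becomes "%3A"
    ];
    return 'http://www.google.co.in/search?' . http_build_query($params);
}

echo google_search_url('Mohit dabas'), PHP_EOL;
// prints http://www.google.co.in/search?ie=UTF-8&q=Mohit+dabas
```

The result can be passed straight to file_get_contents() or CURLOPT_URL, so queries with special characters need no manual adjustment.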