 

Google search: Scrape results page in PHP for total results

Tags:

php

Is it possible to scrape the Google search results page using PHP to pull out the total number of search results found?

If so, how would I go about doing this?

Thanks

Probocop asked Apr 01 '10 13:04


4 Answers

Try this, using PHP Simple HTML DOM Parser (simple_html_dom):

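require_once 'simple_html_dom.php'; // PHP Simple HTML DOM Parser

$search_query = 'google';
// urlencode() so spaces and special characters survive in the URL
$url = sprintf('http://www.google.com/search?q=%s', urlencode($search_query));
$html = file_get_html($url);
// index 2 picks the third matching <b> inside #resultStats (0-based)
$stats = $html ? $html->find('#resultStats b', 2) : null;
$results = $stats ? $stats->innertext : 'an unknown number of';

echo sprintf('Google found %s results for "%s"', $results, $search_query);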
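If you'd rather not depend on a third-party parser, PHP's bundled DOM extension (DOMDocument plus DOMXPath) can do the same lookup. This is a minimal sketch, assuming the page has already been downloaded and that the count still lives in an element with id="resultStats"; the helper name extractResultStats and the sample HTML are hypothetical, and Google's markup may well have changed since.

```php
<?php
// Sketch only: read the result count from an already-downloaded page using
// PHP's bundled DOM extension (DOMDocument + DOMXPath).
// ASSUMPTION: the count sits in an element with id="resultStats"; Google's
// markup changes over time, so check the current page source.

function extractResultStats(string $html, string $id = 'resultStats'): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(sprintf('//*[@id="%s"]', $id));
    if ($nodes === false || $nodes->length === 0) {
        return null; // element missing: layout changed, or the request was blocked
    }
    // textContent flattens nested tags such as <b>, leaving plain text.
    return trim($nodes->item(0)->textContent);
}

$sample = '<html><body><div id="resultStats">About 1,230,000 results</div></body></html>';
echo extractResultStats($sample), "\n"; // About 1,230,000 results
```

Returning null instead of throwing lets the caller tell "no count on this page" (often a block or CAPTCHA page) apart from a real number.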
vooD answered Sep 30 '22 02:09


This PHP class does it: http://www.phpclasses.org/browse/package/3924.html

"This class can be used to get the total number of results for given Google search query.

It accesses the Google search site to perform a query for given search terms.

The class parses the results page and extracts the total number of results that the given search query returned."

rogeriopvl answered Sep 30 '22 00:09


You'll need a number of proxies, depending on how many requests you plan to send. You can send about 500 requests per day per IP/proxy without causing trouble or getting detected.
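One way to stay under that kind of per-IP budget is to rotate requests across a proxy pool. This is a minimal sketch: the ProxyPool class, the 192.0.2.x placeholder addresses and the in-memory bookkeeping are hypothetical, and 500 is only the rough figure given in this answer, not a published limit. The string returned by acquire() is in the same "ip:port" form used for CURLOPT_PROXY.

```php
<?php
// Sketch only: hand out proxies round-robin, skipping any that have used up
// their daily budget. The default of 500 is the rough figure quoted above,
// not a documented Google limit. Addresses below are placeholders (TEST-NET-1).

final class ProxyPool
{
    private array $proxies;   // list of "ip:port" strings
    private int $dailyLimit;  // max requests per proxy per day
    private array $used;      // requests sent today, keyed by "ip:port"
    private int $next = 0;    // where the round-robin scan starts

    public function __construct(array $proxies, int $dailyLimit = 500)
    {
        $this->proxies = array_values($proxies);
        $this->dailyLimit = $dailyLimit;
        $this->used = array_fill_keys($this->proxies, 0);
    }

    // Returns the next proxy with budget left, or null if every proxy is exhausted.
    public function acquire(): ?string
    {
        $count = count($this->proxies);
        for ($i = 0; $i < $count; $i++) {
            $proxy = $this->proxies[($this->next + $i) % $count];
            if ($this->used[$proxy] < $this->dailyLimit) {
                $this->used[$proxy]++;
                $this->next = ($this->next + $i + 1) % $count;
                return $proxy;
            }
        }
        return null;
    }

    // Call once a day (e.g. from cron) to start a fresh budget.
    public function resetDay(): void
    {
        $this->used = array_fill_keys($this->proxies, 0);
    }
}

$pool = new ProxyPool(['192.0.2.10:8080', '192.0.2.11:8080']);
$proxy = $pool->acquire(); // "192.0.2.10:8080"; pass this to CURLOPT_PROXY
```

The counters live only in memory, so a scraper that runs as separate cron jobs would need to persist them (to a file or database, for instance) between runs.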

Read the article at google-rank-checker.squabbel.com; it contains a full-featured scraper in PHP. Use that scraper, modify it to your requirements, and add the PHP Simple HTML DOM Parser code from the other answer to get the total-hits information for your keywords.

I suggest using libcurl (PHP's cURL extension) to access Google itself. It gives you a LOT more options than a simpler API; you won't have much fun with file_get_html() or similar simple fetch functions such as file_get_contents(), as Google would block your script very soon.

Something like this:

  // $keyword, $IP and $PORT are placeholders: set your query and proxy first
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_HEADER, 0);            // no response headers in the output
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);    // follow redirects
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);    // return the page instead of printing it
  $curl_proxy = "$IP:$PORT";
  curl_setopt($ch, CURLOPT_PROXY, $curl_proxy);
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
  curl_setopt($ch, CURLOPT_TIMEOUT, 20);
  curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.0; en; rv:1.9.0.4) Gecko/2009011913 Firefox/3.0.6");
  $url = sprintf('http://www.google.com/search?q=%s', urlencode($keyword));
  curl_setopt($ch, CURLOPT_URL, $url);
  $htmldata = curl_exec($ch);                     // false on failure
  curl_close($ch);

Now just use preg_match(), substr() or strstr() to grab the data from $htmldata.
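For the preg_match() route, here is a minimal sketch that turns the downloaded page into an integer. It assumes an English-language results page showing text like "About 1,230,000 results"; parseTotalResults is a hypothetical helper name, and a localized or redesigned page will need a different pattern.

```php
<?php
// Sketch only: turn the HTML returned by curl_exec() into an integer hit count.
// ASSUMPTION: an English results page containing text like "About 1,230,000 results".

function parseTotalResults(string $htmldata): ?int
{
    // Strip tags first so markup such as <b>1,230,000</b> cannot split the match.
    $text = strip_tags($htmldata);
    // First number (digits plus , or . separators) followed by "result"/"results".
    if (!preg_match('/(\d[\d,.]*)\s+results?\b/i', $text, $m)) {
        return null; // no count found: CAPTCHA/block page or a changed layout
    }
    // Remove the thousands separators before casting to int.
    return (int) str_replace([',', '.'], '', $m[1]);
}

$htmldata = '<div id="resultStats">About <b>1,230,000</b> results</div>';
var_dump(parseTotalResults($htmldata)); // int(1230000)
```

Returning an int rather than the raw "1,230,000" string makes the value easy to store or compare across keywords.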

John answered Sep 30 '22 00:09


I am using this PHP script to find the total number of Google results for my name:

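<?php
// q=Mohit+dabas is the search query; the other parameters were copied from the browser
$homepage = file_get_contents('http://www.google.co.in/search?ix=nh&sourceid=chrome&ie=UTF-8&q=Mohit+dabas');
// matches e.g. "About 1,230 result"; $p[2] holds just the number
if ($homepage !== false && preg_match('/(About )?([\d,]+) result/si', $homepage, $p)) {
    echo $p[0];
}
?>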

The main thing to notice is the 'q' parameter in the URL above.

My name contains a space, so the browser replaced it with '+'.

So check your query (i.e. the 'q' parameter): if it contains special characters such as '.', ':' or '%', note how the browser encodes them and adjust the parameter in your script accordingly.
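Rather than checking by hand how the browser encodes each character, you can let PHP build the query string. This is a minimal sketch; googleSearchUrl is a hypothetical helper, and the host and 'ie' parameter are taken from the URL above. http_build_query() encodes values the same way urlencode() does, so spaces become '+' and characters like ':' and '%' are percent-encoded.

```php
<?php
// Sketch only: build the search URL with PHP's own encoding instead of
// copying the browser's output. Helper name and defaults are hypothetical.

function googleSearchUrl(string $query, string $host = 'www.google.co.in'): string
{
    $params = [
        'ie' => 'UTF-8',
        'q'  => $query,   // encoded by http_build_query(): ' ' -> '+', ':' -> '%3A'
    ];
    return 'http://' . $host . '/search?' . http_build_query($params);
}

echo googleSearchUrl('Mohit dabas'), "\n";
// http://www.google.co.in/search?ie=UTF-8&q=Mohit+dabas
```

For example, googleSearchUrl('50% off: a.b') ends in q=50%25+off%3A+a.b, with '.' left as-is.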

Mohit Dabas answered Sep 30 '22 00:09