R: Search google for a string and return number of hits

Tags:

r

Is there a way in R to simply search Google for something and then return the number of results? I have seen a lot of R packages that wrap various Google services (RGoogleDocs, RGoogleData, RGoogleMaps, googleVis), but I can't find this feature anywhere.

asked Mar 03 '11 22:03 by Sacha Epskamp


2 Answers

This is what I use, but it's based on an API protocol that is being phased out. It's also rate limited, I believe to 100 searches/day. In the function below, service is "web"; you'll need to get a key from http://code.google.com/apis/loader/signup.html (the signup form asks for a URL, but any URL will work).

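library(RCurl)   # provides getURL()
library(rjson)   # provides fromJSON()

GetGoogleResults <- function(keyword, service = "web", key = NULL) {
  base_url <- "http://ajax.googleapis.com/ajax/services/search/"
  # Percent-encode the search term so spaces, "&" etc. are safe in the URL
  keyword <- URLencode(keyword, reserved = TRUE)
  query <- paste0(base_url, service, "?v=1.0&q=", keyword)
  # The key is optional; append it only when one is supplied
  if (!is.null(key))
    query <- paste0(query, "&key=", key)
  # Ask for the first page of results
  query <- paste0(query, "&start=", 0)
  # Fetch the JSON response and parse it into a nested R list
  results <- fromJSON(getURL(query))
  return(results)
}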

Then you can call it like this:

google <- GetGoogleResults("searchTerm", "web", yourkey)

str(google) will tell you the structure of the result. If you just want the number of results, you can use google$responseData$cursor$estimatedResultCount.
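If you only need the count, a small wrapper can pull it out and turn it into a number. This is a minimal sketch that assumes the parsed list has the responseData$cursor$estimatedResultCount path mentioned above; the helper name GetResultCount is hypothetical, and it returns NA when the field is missing (for example on an error response).

```r
# Hypothetical helper: extract the estimated hit count from the list
# returned by GetGoogleResults(). Assumes the
# responseData$cursor$estimatedResultCount path described above.
GetResultCount <- function(results) {
  count <- results$responseData$cursor$estimatedResultCount
  # The field can be missing (e.g. responseData is null on errors)
  if (is.null(count)) return(NA_real_)
  # The count may arrive as a string, so coerce it to a number
  as.numeric(count)
}
```

Usage: count <- GetResultCount(google)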

As I said, this is based on a protocol that may be discontinued at some point. Per Dirk's answer, there is an alternative approach using a custom search engine, but it is also rate limited (if you want a function for this method, you can ping me at noah_at_noahhl.com).
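For the custom search route, a rough sketch might look like the following. It assumes you already have an API key and a search engine ID ("cx") from Google, and that the Custom Search JSON API endpoint and its searchInformation$totalResults field match Google's documentation; the function names BuildCustomSearchUrl and GetCustomSearchCount are hypothetical, and the same rate limits apply.

```r
# Hypothetical sketch for the Google Custom Search JSON API.
# Assumes a valid API key and search engine ID (cx).
BuildCustomSearchUrl <- function(keyword, key, cx) {
  # Percent-encode every parameter so reserved characters are safe
  paste0("https://www.googleapis.com/customsearch/v1",
         "?key=", URLencode(key, reserved = TRUE),
         "&cx=", URLencode(cx, reserved = TRUE),
         "&q=", URLencode(keyword, reserved = TRUE))
}

GetCustomSearchCount <- function(keyword, key, cx) {
  url <- BuildCustomSearchUrl(keyword, key, cx)
  # Fetch and parse the JSON response (needs the RCurl and rjson packages)
  results <- rjson::fromJSON(RCurl::getURL(url))
  # Assumed field: searchInformation$totalResults, returned as a string
  total <- results$searchInformation$totalResults
  if (is.null(total)) return(NA_real_)
  as.numeric(total)
}
```

Only the URL builder is independent of the network, which makes it easy to check the query string before spending any of the daily quota.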

The last option, which is not rate limited, is to use RCurl to fetch a Google results page directly. The HTML is messy to parse, though, and you have to spoof a user agent to get around Google's measures against automated queries. (I can also share this code, but it breaks whenever Google tweaks its HTML.)
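A hypothetical sketch of the scraping approach is below. It ASSUMES the results page contains text such as "About 1,230,000 results"; that is a guess about Google's markup, not something the answer specifies, and as noted above any change to the page will break the pattern. The user-agent string and the names ParseHitCount and ScrapeGoogleCount are illustrative only.

```r
# Hypothetical parser: ASSUMES the page contains "About N results".
ParseHitCount <- function(html) {
  match <- regmatches(html, regexpr("About [0-9,]+ results", html))
  if (length(match) == 0) return(NA_real_)
  # Keep only the digits, e.g. "About 1,230,000 results" -> 1230000
  as.numeric(gsub("[^0-9]", "", match))
}

ScrapeGoogleCount <- function(keyword) {
  url <- paste0("https://www.google.com/search?q=",
                URLencode(keyword, reserved = TRUE))
  # Send a browser-like user agent (arbitrary example string)
  html <- RCurl::getURL(url, useragent = "Mozilla/5.0")
  ParseHitCount(html)
}
```

Keeping the parser separate from the download means that when Google changes its HTML, only the regular expression in ParseHitCount needs updating.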

answered Sep 25 '22 22:09 by Noah


You may want to start at the Google Custom Search API documentation and then see how much JSON you have to learn to hit it :)

There should be enough R infrastructure in place to get something going.

answered Sep 22 '22 22:09 by Dirk Eddelbuettel