
How to get google search results

Tags:

r

hyperlink

rcurl

I used the following code:

library(XML)
library(RCurl)

getGoogleURL <- function(search.term, domain = '.co.uk', quotes = TRUE)
{
    search.term <- gsub(' ', '%20', search.term)
    if (quotes) search.term <- paste('%22', search.term, '%22', sep = '')
    paste('http://www.google', domain, '/search?q=', search.term, sep = '')
}

getGoogleLinks <- function(google.url)
{
    doc <- getURL(google.url, httpheader = c("User-Agent" = "R(2.10.0)"))
    html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...){})
    nodes <- getNodeSet(html, "//a[@href][@class='l']")
    return(sapply(nodes, function(x) xmlAttrs(x)[[1]]))
}

search.term <- "cran"
quotes <- FALSE
search.url <- getGoogleURL(search.term = search.term, quotes = quotes)

links <- getGoogleLinks(search.url)

I would like to get all the links returned by my search, but I get the following result:

> links
list()

How can I get the links? I would also like to get the headline and summary of each Google result; how can I do that? Finally, is there a way to get the links that reside in ChillingEffects.org results?

asked Oct 01 '15 13:10 by Avi




2 Answers

If you look at the html variable, you can see that the search result links are all nested in <h3 class="r"> tags.

Try changing your getGoogleLinks function to:

library(XML)
library(RCurl)

getGoogleLinks <- function(google.url) {
   doc <- getURL(google.url, httpheader = c("User-Agent" = "R(2.10.0)"))
   html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...){})
   # The result links sit inside <h3 class="r"> headings.
   nodes <- getNodeSet(html, "//h3[@class='r']//a")
   return(sapply(nodes, function(x) xmlAttrs(x)[["href"]]))
}
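The same XPath can also return each result's headline text, which covers part of the question. Below is a minimal sketch, assuming the <h3 class="r"> layout described above (Google changes its markup, so re-check the selector). The helper name parseGoogleResults and the sample markup are illustrative assumptions, not part of this answer; the helper takes a page that has already been downloaded as a string (for example with getURL) and returns headlines and links side by side.

```r
library(XML)

# Hypothetical helper: parse an already-downloaded results page (a string)
# and return one row per result with its headline text and link.
# The XPath assumes result links are nested in <h3 class="r"> headings.
parseGoogleResults <- function(page.html) {
  doc <- htmlParse(page.html, asText = TRUE)
  xpath <- "//h3[@class='r']//a"
  data.frame(
    # as.character() turns the empty list() returned for no matches
    # into character(0), so an empty page gives a 0-row data frame.
    headline = as.character(xpathSApply(doc, xpath, xmlValue)),
    link     = as.character(xpathSApply(doc, xpath, xmlGetAttr, "href")),
    stringsAsFactors = FALSE
  )
}

# Hand-written sample in the assumed layout (not a real Google page).
sample.html <- paste0(
  '<html><body>',
  '<h3 class="r"><a href="https://cran.r-project.org/">The Comprehensive R Archive Network</a></h3>',
  '<h3 class="r"><a href="https://cran.r-project.org/mirrors.html">CRAN - Mirrors</a></h3>',
  '</body></html>'
)

results <- parseGoogleResults(sample.html)
print(results)
```

With a live search, you would pass it the string returned by getURL(search.url, ...) instead of the sample.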
answered Sep 25 '22 22:09 by user3794498


I created this function to read in a list of company names and then get the top website result for each. It will get you started, and you can adjust it as needed.

# Libraries. URLencode() is part of base R (package utils), so it needs no library() call.
library(rvest)

# Load data.
d <- read.csv("P:\\needWebsites.csv")
companies <- as.character(d$Company.Name)

# Function for getting the website of one company.
getWebsite <- function(name)
{
    # reserved = TRUE also encodes characters such as "&" that appear in company names.
    url <- paste0("https://www.google.com/search?q=", URLencode(name, reserved = TRUE))

    page <- read_html(url)

    results <- page %>%
      html_nodes("cite") %>% # Get all nodes of type cite. You can change this to grab other node types.
      html_text()

    result <- results[1]

    return(as.character(result)) # Return results instead if you want to see them all.
}

# Apply the function to a list of company names.
websites <- data.frame(Website = sapply(companies, getWebsite))
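Because getWebsite() both downloads and parses, checking the cite selector means querying Google every time. A minimal sketch that splits out the parsing step lets you test the selector against a saved or hand-written page first; the firstCite() name and the sample markup are illustrative assumptions, not part of the answer.

```r
library(rvest)

# Hypothetical helper: return the text of the first <cite> element on an
# already-parsed page, or NA if the page has none. Always returning exactly
# one value keeps sapply() producing a plain character vector.
firstCite <- function(page) {
  cites <- page %>%
    html_nodes("cite") %>%
    html_text()
  if (length(cites) == 0) NA_character_ else cites[1]
}

# Offline check against hand-written markup (not a real results page).
sample.page <- read_html(
  "<html><body><cite>www.example.com</cite><cite>www.example.org</cite></body></html>"
)
firstCite(sample.page)  # returns "www.example.com"

# Inside getWebsite() the helper would be used as:
#   firstCite(read_html(url))
```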
answered Sep 24 '22 22:09 by Bryce Chamberlain