How to get Google search results

Tags: r, hyperlink, rcurl

I used the following code:

library(XML)
library(RCurl)

getGoogleURL <- function(search.term, domain = '.co.uk', quotes = TRUE)
{
    search.term <- gsub(' ', '%20', search.term)
    if (quotes) search.term <- paste('%22', search.term, '%22', sep = '')
    getGoogleURL <- paste('http://www.google', domain, '/search?q=',
                          search.term, sep = '')
}

getGoogleLinks <- function(google.url)
{
    doc <- getURL(google.url, httpheader = c("User-Agent" = "R(2.10.0)"))
    html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...){})
    nodes <- getNodeSet(html, "//a[@href][@class='l']")
    return(sapply(nodes, function(x) x <- xmlAttrs(x)[[1]]))
}

search.term <- "cran"
quotes <- "FALSE"
search.url <- getGoogleURL(search.term = search.term, quotes = quotes)

links <- getGoogleLinks(search.url)

I would like to get all the links returned by my search, but I get the following result:

> links
list()

How can I get the links? In addition, I would like to get the headline and summary of each Google result. How can I do that? And finally, is there a way to get the links that reside in the ChillingEffects.org results?

655

asked Oct 01 '15 13:10

Avi

2 Answers

If you look at the html variable, you can see that the search result links are all nested in <h3 class="r"> tags.

Try changing your getGoogleLinks function to:

getGoogleLinks <- function(google.url) {
   doc <- getURL(google.url, httpheader = c("User-Agent" = "R (2.10.0)"))
   html <- htmlTreeParse(doc, useInternalNodes = TRUE, error = function(...){})
   nodes <- getNodeSet(html, "//h3[@class='r']//a")
   return(sapply(nodes, function(x) xmlAttrs(x)[["href"]]))
}
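
The question also asks about headlines and summaries. As a rough sketch of how the same XPath idea could be extended, the snippet below parses a small hand-written HTML sample rather than a live page. The sample markup is hypothetical, and the `span` with class `st` used for the summary is an assumption; the markup Google actually serves may differ and changes over time, so inspect the html variable and adjust the expressions accordingly.

```r
library(XML)

# Hypothetical sample markup that mimics the <h3 class="r"> layout described
# above. It is NOT a real Google response; the "st" class is an assumption.
sample.html <- '
<html><body>
  <div class="g">
    <h3 class="r"><a href="https://cran.r-project.org/">The Comprehensive R Archive Network</a></h3>
    <span class="st">CRAN is a network of ftp and web servers.</span>
  </div>
  <div class="g">
    <h3 class="r"><a href="https://www.r-project.org/">The R Project</a></h3>
    <span class="st">R is a free software environment.</span>
  </div>
</body></html>'

# Parse the string into an internal document so XPath queries can be run on it.
doc <- htmlParse(sample.html, asText = TRUE)

# Link target and visible text of each result heading.
links     <- xpathSApply(doc, "//h3[@class='r']//a", xmlGetAttr, "href")
headlines <- xpathSApply(doc, "//h3[@class='r']//a", xmlValue)

# Summary text (assumed to live in <span class="st">).
summaries <- xpathSApply(doc, "//span[@class='st']", xmlValue)

# Combine into one table; this assumes every result has exactly one of each.
results <- data.frame(link = links, headline = headlines,
                      summary = summaries, stringsAsFactors = FALSE)
```

If some results lack a summary, the three vectors will have different lengths and the data.frame call will fail; in that case it is safer to loop over each result container and extract the fields per node.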

136

answered Sep 25 '22 22:09

user3794498

I created this function to read in a list of company names and then get the top website result for each. It should get you started; you can adjust it as needed.

# Libraries. URLencode() comes from the utils package, which ships with R,
# so it does not need a library() call of its own.
library(rvest)

# Load data.
d <- read.csv("P:\\needWebsites.csv")
companies <- as.character(d$Company.Name)

# Function for getting the top website result for one company name.
getWebsite <- function(name)
{
    url <- URLencode(paste0("https://www.google.com/search?q=", name))

    page <- read_html(url)

    results <- page %>%
      html_nodes("cite") %>% # Get all nodes of type cite. You can change this to grab other node types.
      html_text()

    result <- results[1]

    return(as.character(result)) # Return results instead if you want to see them all.
}

# Apply the function to the vector of company names.
websites <- data.frame(Website = sapply(companies, getWebsite))
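
One detail worth checking when building the search URL: by default, `URLencode()` leaves reserved characters such as `&` untouched, so a company name containing `&` would be split into a separate query parameter. A minimal sketch of one way around this, encoding only the search term with `reserved = TRUE`, is shown below. The helper name `buildSearchURL` is hypothetical and not part of the answer above.

```r
# Hypothetical helper: encode only the search term, including reserved
# characters such as "&", and append it to the fixed part of the URL.
buildSearchURL <- function(name) {
    paste0("https://www.google.com/search?q=",
           utils::URLencode(name, reserved = TRUE))
}

# A name with a space and an ampersand, both of which need escaping.
search.url <- buildSearchURL("Smith & Sons")
print(search.url)
```

Encoding the term on its own also keeps the `:`, `/` and `?` in the fixed part of the URL from being escaped by accident.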

answered Sep 24 '22 22:09

Bryce Chamberlain
