I want to use html_nodes() to scrape organizations' names from Google search results (I only need the first result, assuming it will be the best guess).
Right now, I am trying to target the first result by its XPath and pass that to html_nodes().
To find the XPath, I am using Google Chrome, which gives me
//*[@id="rso"]/div[1]/div/div[1]/div/div/h3/a
as the XPath for the title of the first result. However, when I pass it to html_nodes(), I get an empty node set:
page %>% html_nodes(xpath='//*[@id="rso"]/div[1]/div/div[1]/div/div/h3/a')
{xml_nodeset (0)}
I would expect the string "The A-Test 2017 Workshop" instead.
How can I get the content of that <a> tag, with either XPath or CSS?
When scraping websites, SelectorGadget is a great tool. Using it, I determined that all result headings on the Google results page can be selected with the CSS class selector .r
To scrape the results you could therefore use something like this:
library(rvest)
# searching for `rstudio`
page <- read_html("https://www.google.at/search?client=safari&rls=en&q=rstudio&ie=UTF-8&oe=UTF-8&gfe_rd=cr&ei=VpJsWe2oOqqk8wfT5KaQDQ")
page %>%
html_nodes(".r") %>%
html_text()
#> [1] "RStudio – Open source and enterprise-ready professional software ..."
#> [2] "Download"
#> [3] "Download RStudio Server"
#> [4] "RStudio Server"
#> [5] "Shiny"
#> [6] "RStudio – Wikipedia"
#> [7] "RStudio - Wikipedia"
#> [8] "Datenrettung | R-Studio 8.3 Deutsch | Software zur Datenrettung ..."
#> [9] "GitHub - rstudio/rstudio: RStudio is an integrated development ..."
#> [10] "RStudio · GitHub"
#> [11] "R-Studio"
#> [12] "Install RStudio with R Server on HDInsight - Azure | Microsoft Docs"
You can easily find the first one with subsetting:
page %>%
html_nodes(".r") %>%
html_text() %>%
.[1]
#> [1] "RStudio – Open source and enterprise-ready professional software ..."
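As an alternative to subsetting, rvest's html_node() (singular) returns only the first match, and it accepts either a CSS selector or an XPath expression. The sketch below runs on a small made-up HTML snippet rather than a live Google page, so the markup, class names, and URLs in it are assumptions for illustration only; the real results page may be structured differently.

```r
library(rvest)

# Hypothetical stand-in for a results page (not real Google markup)
html <- '<div id="rso">
  <div class="g"><h3 class="r"><a href="https://example.com/a">First result</a></h3></div>
  <div class="g"><h3 class="r"><a href="https://example.com/b">Second result</a></h3></div>
</div>'
page <- read_html(html)

# CSS: html_node() keeps only the first element matching ".r a"
first_title_css <- page %>%
  html_node(".r a") %>%
  html_text()

# XPath: the same element, selected by tag and class instead of a full path
first_title_xpath <- page %>%
  html_node(xpath = '//h3[@class="r"]/a') %>%
  html_text()

print(first_title_css)
print(first_title_xpath)
```

On a real page, the same two calls would be applied to the object returned by read_html() for the search URL.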
This blog post demonstrates the approach more thoroughly: https://blog.rstudio.com/2014/11/24/rvest-easy-web-scraping-with-r/
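One possible reason the XPath copied from Chrome matches nothing (this is an assumption; the question does not confirm it) is that the HTML rvest downloads is not identical to the DOM Chrome shows after rendering, so a long absolute path copied from the browser is brittle. The sketch below illustrates the idea on hypothetical markup: the copied path finds no nodes, while a shorter, looser XPath anchored on the same id still does.

```r
library(rvest)

# Hypothetical markup standing in for what the server might return to a script;
# it has fewer nested <div>s than the path copied from the browser expects.
served <- read_html(
  '<div id="rso"><div class="g"><h3 class="r">
     <a href="https://example.com">The A-Test 2017 Workshop</a>
   </h3></div></div>'
)

chrome_xpath <- '//*[@id="rso"]/div[1]/div/div[1]/div/div/h3/a'  # from the question
loose_xpath  <- '//*[@id="rso"]//h3/a'                           # assumed alternative

# Count how many nodes each expression matches
n_chrome <- length(html_nodes(served, xpath = chrome_xpath))
n_loose  <- length(html_nodes(served, xpath = loose_xpath))

# Extract the title through the looser expression
title <- served %>%
  html_node(xpath = loose_xpath) %>%
  html_text()

print(c(chrome = n_chrome, loose = n_loose))
print(title)
```

To check what was actually downloaded in a real session, it can help to save the parsed page with xml2::write_html() and inspect that file, rather than relying on the browser's element inspector.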