
R web scraping across multiple pages

I am working on a web scraping program to search for specific wines and return a list of local wines of that variety. The problem I am having is with results that span multiple pages. The code below is a basic example of what I am working with:

library(rvest)

url2 <- "http://www.winemag.com/?s=washington+merlot&search_type=reviews"
htmlpage2 <- read_html(url2)
names2 <- html_nodes(htmlpage2, ".review-listing .title")
Wines2 <- html_text(names2)

For this specific search there are 39 pages of results. I know the URL changes to http://www.winemag.com/?s=washington%20merlot&drink_type=wine&page=2 for page 2, but is there an easy way to make the code loop through all the returned pages and compile the results from all 39 pages into a single list? I know I could write out all the URLs manually, but that seems like overkill.

Asked Apr 17 '16 by Jamie Leigh


People also ask

Is R good for web scraping?

R is considered a good language for web scraping because of its rich package ecosystem, ease of use, and dynamic typing. The most commonly used web-scraping package for R is rvest.

Can you web scrape multiple websites?

The method goes as follows: write a loop that scrapes the href attributes (and so the URLs) from all the pages you want, clean the data and build a list of the URLs collected, then write a second loop that goes over that list of URLs and scrapes the information needed, as in the sketch below.
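A minimal sketch of that two-loop approach with rvest; the URL pattern and the CSS selectors here are hypothetical placeholders, not taken from the question:

library(rvest)

# Hypothetical paginated listing URL -- substitute the real pattern for your site.
listing_pages <- paste0("http://example.com/results?page=", 1:5)

# Loop 1: collect the href attributes (the detail-page URLs) from each listing page.
detail_urls <- unlist(lapply(listing_pages, function(url) {
  read_html(url) %>%
    html_nodes(".result a") %>%   # hypothetical selector for the result links
    html_attr("href")
}))

# Clean the collected URLs (e.g. drop duplicates).
detail_urls <- unique(detail_urls)

# Loop 2: go over the list of URLs and scrape the information needed from each page.
details <- lapply(detail_urls, function(url) {
  read_html(url) %>%
    html_nodes(".title") %>%      # hypothetical selector for the field of interest
    html_text()
})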


2 Answers

You can do something similar with purrr::map_df() as well if you want all the info as a data.frame:

library(rvest)
library(purrr)

url_base <- "http://www.winemag.com/?s=washington merlot&drink_type=wine&page=%d"

map_df(1:39, function(i) {

  # simple but effective progress indicator
  cat(".")

  pg <- read_html(sprintf(url_base, i))

  data.frame(wine=html_text(html_nodes(pg, ".review-listing .title")),
             excerpt=html_text(html_nodes(pg, "div.excerpt")),
             rating=gsub(" Points", "", html_text(html_nodes(pg, "span.rating"))),
             appellation=html_text(html_nodes(pg, "span.appellation")),
             price=gsub("\\$", "", html_text(html_nodes(pg, "span.price"))),
             stringsAsFactors=FALSE)

}) -> wines

dplyr::glimpse(wines)
## Observations: 1,170
## Variables: 5
## $ wine        (chr) "Charles Smith 2012 Royal City Syrah (Columbia Valley (WA)...
## $ excerpt     (chr) "Green olive, green stem and fresh herb aromas are at the ...
## $ rating      (chr) "96", "95", "94", "93", "93", "93", "93", "93", "93", "93"...
## $ appellation (chr) "Columbia Valley", "Columbia Valley", "Columbia Valley", "...
## $ price       (chr) "140", "70", "70", "20", "70", "40", "135", "50", "60", "3...
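Note that rating and price come back as character columns; if you want them numeric, a quick follow-up sketch with dplyr (not part of the scraping itself):

library(dplyr)

wines <- wines %>%
  mutate(rating = as.numeric(rating),
         price  = as.numeric(price))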
Answered Oct 06 '22 by hrbrmstr


You can lapply across a vector of the URLs, which you can make by pasting the base URL to a sequence:

library(rvest)

wines <- lapply(paste0('http://www.winemag.com/?s=washington%20merlot&drink_type=wine&page=', 1:39),
                function(url){
                    url %>% read_html() %>% 
                        html_nodes(".review-listing .title") %>% 
                        html_text()
                })

The result will be returned in a list with an element for each page.
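If you would rather have everything compiled into a single character vector of titles, a small sketch building on the lapply() result above:

# Flatten the per-page results into one character vector of wine titles.
all_wines <- unlist(wines)
length(all_wines)  # total number of titles across all 39 pages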

Answered Oct 05 '22 by alistaire