I am trying to scrape data from Yelp. One step is to extract the link for each restaurant. For example, I search for restaurants in NYC and get some results. Then I want to extract the links of all 10 restaurants Yelp recommends on page 1. Here is what I have tried:
library(rvest)
page=read_html("http://www.yelp.com/search?find_loc=New+York,+NY,+USA")
page %>% html_nodes(".biz-name span") %>% html_attr('href')
But the code always returns 'NA'. Can anyone help me with that? Thanks!
In general, web scraping in R (or any other language) boils down to the following three steps:

1. Get the HTML for the web page that you want to scrape.
2. Decide what part of the page you want to read, and work out what HTML/CSS selector you need to select it.
3. Select the HTML and analyze it in the way you need.
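A minimal sketch of those three steps, assuming a hypothetical page and selector (the URL and ".listing-title" are placeholders, not taken from Yelp):

library(rvest)

# Step 1: fetch and parse the page
page <- read_html("http://example.com/listings")

# Steps 2 and 3: select the nodes behind your chosen CSS selector and extract their text
titles <- page %>% html_nodes(".listing-title") %>% html_text()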
rvest: Easily Harvest (Scrape) Web Pages. Wrappers around the 'xml2' and 'httr' packages to make it easy to download, then manipulate, HTML and XML.
The read_html() command creates an R object, an xml_document, that stores the parsed content of the web page.
library(rvest)
page <- read_html("http://www.yelp.com/search?find_loc=New+York,+NY,+USA")
# ".biz-name" targets the <a> tags themselves; the original ".biz-name span"
# matched the inner <span> tags, which have no href attribute, hence the NAs
page %>% html_nodes(".biz-name") %>% html_attr('href')
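If you also want the restaurant names next to the links, here is a small extension of the same idea (a sketch, assuming each .biz-name anchor carries both the visible name and the href; the variable names are mine):

library(rvest)
page <- read_html("http://www.yelp.com/search?find_loc=New+York,+NY,+USA")
biz <- page %>% html_nodes(".biz-name")
data.frame(
  name = biz %>% html_text(),        # visible restaurant name
  link = biz %>% html_attr("href")   # relative URL of the listing page
)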
Hope this simplifies your problem.
I was also able to clean the results from above, which for me were quite noisy. First grab every link on the page:
links <- page %>% html_nodes("a") %>% html_attr("href")
and then filter them with a simple regex string match:
links <- links[which(regexpr('common-url-element', links) >= 1)]
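The same filter reads a bit more idiomatically with grepl(), which returns a logical vector directly ('common-url-element' is still just a placeholder for whatever pattern your target links share):

links <- links[grepl('common-url-element', links)]   # keep only matching links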