Web scraping in R: extract names from links by their `href` attribute

Tags:

r

web-scraping

This is my code:

library(rvest)
library(XML)
library(xml2)
url_imb <- 'https://www.imdb.com/search/title/?count=100&release_date=2016,2016&title_type=feature'
web_page<-read_html(url_imb)

I want to extract all director names from the links whose `href` contains `adv_li_dr_0`.

This is what I tried. CSS selector:

directors_0<-html_text(html_nodes(web_page,"p a"))

XPath selector:

directors_0<-html_attr(html_nodes(web_page,xpath='//p[@class=""]//a'),"href")

It is incomplete, of course. Can you help me? How do I extract elements based on a substring of their `href` attribute?

asked Sep 12 '19 15:09

Laura

2 Answers

Is this what you want?

library(rvest)
library(XML)
library(xml2)
url_imb <- 'https://www.imdb.com/search/title/?count=100&release_date=2016,2016&title_type=feature'
directors <- read_html(url_imb) %>% 
  html_nodes(xpath = "//p[contains(text(),'Director')]/a[contains(@href, '_dr')]") %>% 
  html_text()
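To see what the `contains(@href, ...)` predicate does without hitting the live site, here is a minimal sketch on an inline HTML fragment. The markup and the names in it are hypothetical, modeled only on the `adv_li_dr_` pattern from the question; the real page may be structured differently.

```r
library(rvest)

# Hypothetical fragment: one director link and one non-director link.
html <- '<p>Director: <a href="/name/nm001/?ref_=adv_li_dr_0">Alice Example</a> | Stars: <a href="/name/nm003/?ref_=adv_li_st_0">Carol Example</a></p>'
page <- read_html(html)

# Keep only <a> elements whose href contains the substring "adv_li_dr_".
director_links <- html_nodes(page, xpath = "//a[contains(@href, 'adv_li_dr_')]")

director_names <- html_text(director_links)          # visible link text
director_hrefs <- html_attr(director_links, "href")  # raw href values
```

`html_text()` returns the link text, while `html_attr(..., "href")` returns the attribute itself, so the same node set can be used for either.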

answered Oct 12 '22 23:10

Mislav

I would consider using a CSS attribute selector with the contains operator (`*=`) to require that the `href` attribute contain the substring `adv_li_dr_`. Note that I have dropped the `0` on the assumption that you want all directors. If you want only the first director for each film, add the `0` back at the end. This should be faster and less fragile than XPath.

library(rvest)
library(magrittr)

url_imb <- 'https://www.imdb.com/search/title/?count=100&release_date=2016,2016&title_type=feature'
directors <- read_html(url_imb) %>% 
  html_nodes('[href*=adv_li_dr_]') %>% 
  html_text()

Reading:

Attribute selectors.
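The difference between keeping and dropping the trailing `0` can be checked on a small inline fragment. This is only a sketch: the markup and names are hypothetical, built around the `adv_li_dr_` pattern from the question, not copied from the real page.

```r
library(rvest)

# Hypothetical fragment: a film with two directors and one star.
html <- '
<div>
  <p>Directors: <a href="/name/nm001/?ref_=adv_li_dr_0">Alice Example</a>,
     <a href="/name/nm002/?ref_=adv_li_dr_1">Bob Example</a>
     | Stars: <a href="/name/nm003/?ref_=adv_li_st_0">Carol Example</a></p>
</div>'
page <- read_html(html)

# href contains "adv_li_dr_": every director link.
all_directors <- page %>% html_nodes("[href*=adv_li_dr_]") %>% html_text()

# href contains "adv_li_dr_0": only the first director of each film.
first_directors <- page %>% html_nodes("[href*=adv_li_dr_0]") %>% html_text()
```

Here `all_directors` holds both director names and `first_directors` only the first one; the star link is excluded in both cases because its `href` uses `_st_` rather than `_dr_`.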

answered Oct 12 '22 21:10

QHarr
