This is my code:
library(rvest)
library(XML)
library(xml2)
url_imb <- 'https://www.imdb.com/search/title/?count=100&release_date=2016,2016&title_type=feature'
web_page<-read_html(url_imb)
I want to extract all the directors' names, which appear in the links whose href contains adv_li_dr_0.
This is what I did: CSS SELECTOR:
directors_0<-html_text(html_nodes(web_page,"p a"))
XPATH SELECTOR:
directors_0<-html_attr(html_nodes(web_page,xpath='//p[@class=""]//a'),"href")
It is incomplete, of course. Can you help me? How do I extract the elements of an a tag based on the contents of its href?
To extract text from a webpage of interest, we specify which HTML elements we want to select by using html_nodes(). For instance, if we want to scrape the primary heading of the Web Scraping Wikipedia webpage, we simply identify the <h1> node as the node we want to select.
The read_html() function parses the web page and returns an R object (an xml_document) that stores the parsed HTML.
In general, web scraping in R (or in any other language) boils down to the following three steps: Get the HTML for the web page that you want to scrape. Decide what part of the page you want to read and find out what HTML/CSS you need to select it. Select the HTML and analyze it in the way you need.
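The three steps above can be sketched as follows. This is only an illustration: the HTML string is made up and stands in for a downloaded page, so nothing here depends on a live website.

```r
library(rvest)

# Step 1: get the HTML. A literal string (an assumption for this sketch)
# is parsed instead of a URL so the example runs offline.
page <- read_html("<html><body><h1>Web scraping</h1><p>Intro text.</p></body></html>")

# Step 2: decide what to read; here the primary heading, i.e. the <h1> node.
heading_node <- html_nodes(page, "h1")

# Step 3: select it and pull out the text.
heading_text <- html_text(heading_node)
print(heading_text)
```

Replacing the string with a URL gives the usual workflow; the selection and extraction steps stay the same.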
Is this what you want?
library(rvest)
library(XML)
library(xml2)
url_imb <- 'https://www.imdb.com/search/title/?count=100&release_date=2016,2016&title_type=feature'
directors <- read_html(url_imb) %>%
  html_nodes(xpath = "//p[contains(text(),'Director')]/a[contains(@href, '_dr')]") %>%
  html_text()
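To see what that XPath expression selects without hitting IMDb, here is a small offline sketch. The markup is invented for illustration and only assumes that a director link sits inside a p whose first text is "Director:" and that its href contains _dr; the real page may differ.

```r
library(rvest)

# Hypothetical fragment imitating one search result (not copied from IMDb).
html <- '<div>
  <p class="">
    Director:
    <a href="/name/nm0000001/?ref_=adv_li_dr_0">Jane Doe</a>
    <span class="ghost">|</span>
    Stars:
    <a href="/name/nm0000002/?ref_=adv_li_st_0">John Roe</a>
  </p>
</div>'

# Keep only links that are direct children of a "Director" paragraph
# and whose href contains "_dr"; the star link is filtered out.
sample_directors <- read_html(html) %>%
  html_nodes(xpath = "//p[contains(text(),'Director')]/a[contains(@href, '_dr')]") %>%
  html_text()
print(sample_directors)
```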
I would consider using a CSS attribute = value selector with the contains operator (*=) to specify that the href attribute must contain the substring adv_li_dr_. Note I have dropped the 0 on the assumption you want all directors. If you want only the first director for each film, then append the 0 to the substring. Note this should be faster and less fragile than XPath.
library(rvest)
library(magrittr)
url_imb <- 'https://www.imdb.com/search/title/?count=100&release_date=2016,2016&title_type=feature'
directors <- read_html(url_imb) %>%
  html_nodes('[href*=adv_li_dr_]') %>%
  html_text()
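As a quick offline check of the difference between matching adv_li_dr_ and adv_li_dr_0, here is a sketch on made-up markup. The names and hrefs are hypothetical; only the adv_li_dr_ pattern comes from the question.

```r
library(rvest)

# Hypothetical fragment: two director links and one star link for a single film.
html <- '<p>
  <a href="/name/nm1/?ref_=adv_li_dr_0">First Director</a>
  <a href="/name/nm2/?ref_=adv_li_dr_1">Second Director</a>
  <a href="/name/nm3/?ref_=adv_li_st_0">Lead Actor</a>
</p>'
page <- read_html(html)

# href contains "adv_li_dr_": every director link.
all_directors <- page %>% html_nodes("[href*='adv_li_dr_']") %>% html_text()

# href contains "adv_li_dr_0": only the first director of each film.
first_directors <- page %>% html_nodes("[href*='adv_li_dr_0']") %>% html_text()

# html_attr() returns the link targets instead of the link text.
director_links <- page %>% html_nodes("[href*='adv_li_dr_']") %>% html_attr("href")

print(all_directors)
print(first_directors)
```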