I am trying to scrape a webpage:
library(RCurl)
webpage <- getURL("https://somewebpage.com")
webpage
<div class='CredibilityFacts'><span id='qZyoLu'><a class='answer_permalink'
action_mousedown='AnswerPermalinkClickthrough' href='/someurl/answer/my_id'
id ='__w2_yeSWotR_link'>
<a class='another_class' action_mousedown='AnswerPermalinkClickthrough'
href='/ignore_url/answer/some_id' id='__w2_ksTVShJ_link'>
<a class='answer_permalink' action_mousedown='AnswerPermalinkClickthrough'
href='/another_url/answer/new_id' id='__w2_ksTVShJ_link'>
class(webpage)
[1] "character"
I am trying to extract all the href values, but only from the <a> tags that have the answer_permalink class.
The expected output is:
[1] "/someurl/answer/my_id" "/another_url/answer/new_id"
/ignore_url/answer/some_id should be ignored because its <a> tag has the class another_class, not answer_permalink.
Right now I am thinking of a regex approach. I think a pattern like this could be used with stri_extract_all:
class='answer_permalink'.*href='
but this isn't exactly what I want.
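If you do stay with regex, here is a minimal sketch of a fix using stringi's stri_match_all_regex (a swapped-in function; the question mentions stri_extract_all). The HTML string is the excerpt from the question. Two details matter. By default `.` in stringi does not match a line break, and in the excerpt the href sits on the line after the class. Also, `.*` is greedy, so it can run on into a later tag. Replacing it with `[^>]*?` keeps the match inside a single tag, and the capture group returns only the href value. This stays brittle: it assumes class comes before href and that attributes use single quotes.

```r
library(stringi)

# The HTML excerpt from the question, as one character string
webpage <- "<div class='CredibilityFacts'><span id='qZyoLu'><a class='answer_permalink'
action_mousedown='AnswerPermalinkClickthrough' href='/someurl/answer/my_id'
id ='__w2_yeSWotR_link'>
<a class='another_class' action_mousedown='AnswerPermalinkClickthrough'
href='/ignore_url/answer/some_id' id='__w2_ksTVShJ_link'>
<a class='answer_permalink' action_mousedown='AnswerPermalinkClickthrough'
href='/another_url/answer/new_id' id='__w2_ksTVShJ_link'>"

# [^>]*? cannot cross a '>' (but can cross newlines), so each match stays
# inside one <a> tag. The group ([^']*) captures just the href value.
m <- stri_match_all_regex(
  webpage,
  "class='answer_permalink'[^>]*?href='([^']*)'"
)[[1]]

# Column 1 is the whole match, column 2 is the captured href
hrefs <- m[, 2]
hrefs
```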
How can I achieve this? Also, apart from regex, is there a function in R that extracts elements by class, as in JavaScript?
With dplyr and rvest we could do:
library(rvest)
library(dplyr)
"https://www.quora.com/profile/Ronak-Shah-96" %>%
read_html() %>%
html_nodes("[class='answer_permalink']") %>%
html_attr("href")
[1] "/How-can-we-adjust-in-engineering-if-we-are-not-in-IITs-or-NITs-How-can-we-enjoy-engineering-if-we-are-pursuing-it-from-a-local-private-college/answer/Ronak-Shah-96"
[2] "/Do-you-think-it-is-worth-it-to-change-my-career-path-For-the-past-2-years-I-was-pursuing-a-career-in-tax-advisory-in-a-BIG4-company-I-just-got-a-job-offer-that-will-allow-me-to-learn-coding-It-is-not-that-well-paid/answer/Ronak-Shah-96"
[3] "/Why-cant-India-opt-for-40-hours-work-a-week-for-all-professions-when-it-is-proved-and-working-well-in-terms-of-efficiency/answer/Ronak-Shah-96"
[4] "/Why-am-I-still-confused-and-thinking-about-my-career-after-working-more-than-one-year-in-software-engineering/answer/Ronak-Shah-96"
[5] "/Would-you-rather-be-a-jack-of-all-trades-or-the-master-of-one-trade/answer/Ronak-Shah-96"
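The same idea can be checked offline against the HTML from the question. read_html() also accepts a literal HTML string, so a character string such as the webpage returned by getURL can be parsed directly without downloading it again. This sketch uses a trimmed, well-formed version of the excerpt (extra attributes dropped, tags closed) and the CSS class selector a.answer_permalink. Unlike [class='answer_permalink'], which requires the class attribute to equal exactly that value, the class selector also matches tags that carry additional classes. In rvest 1.0 and later, html_elements() is the newer name for html_nodes(); both work.

```r
library(rvest)  # rvest re-exports the %>% pipe

# Trimmed, well-formed version of the question's excerpt
html <- "<div class='CredibilityFacts'><span id='qZyoLu'>
<a class='answer_permalink' href='/someurl/answer/my_id'>a</a>
<a class='another_class' href='/ignore_url/answer/some_id'>b</a>
<a class='answer_permalink' href='/another_url/answer/new_id'>c</a>
</span></div>"

# read_html() treats a string containing markup as literal HTML
doc <- read_html(html)

# "a.answer_permalink" selects <a> tags whose class list contains
# answer_permalink, then html_attr() pulls out each href
hrefs <- doc %>%
  html_nodes("a.answer_permalink") %>%
  html_attr("href")
hrefs
```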