This question is based on another that I saw closed which generated curiosity as I learned something new about Google Chrome's Inspect Element to create the HTML parsing path for XML::getNodeSet
. While this question was closed as I think it may have been too broad I'll ask a smaller more focused question that may get at the root of the problem.
I tried to help the poster by writing code I typically use for scraping but ran into a wall immediately as the poster wanted elements from Google Chrome's Inspect Element. This is not the same as the HTML from htmlTreeParse
as demonstrated here:
url <- "http://collegecost.ed.gov/scorecard/UniversityProfile.aspx?org=s&id=198969"
doc <- htmlTreeParse(url, useInternalNodes = TRUE)
m <- capture.output(doc)
any(grepl("258.12", m))
## FALSE
But here in Google Chrome's Inspect Element we can see that this information is provided (in yellow):
How can we get the information from Google Chrome's Inspect Element into R? The poster could obviously copy and paste the code into a text editor and parse that way but they are looking to scrape and thus that workflow does not scale. Once the poster can get this info into R they can then use typical HTML parsing techniques (XLM
and RCurl
-fu).
You can copy by inspect element and target the div you want to copy. Just press ctrl+c and then your div will be copy and paste in your code it will run easily.
You should be able to scrape the page using something like the following code for RSelenium. You need to have java installed and available on your path for the startServer()
line to work (and thus for you to be able to do anything).
library("RSelenium")
checkForServer()
startServer()
remDr <- remoteDriver(remoteServerAddr = "localhost",
port = 4444,
browserName = "firefox"
)
url <- "http://collegecost.ed.gov/scorecard/UniversityProfile.aspx?org=s&id=198969"
remDr$open()
remDr$navigate(url)
source <- remDr$getPageSource()[[1]]
Check to make sure it worked according to your test:
> grepl("258.12", source)
[1] TRUE
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With