I would like some help getting each verse of this Bible chapter from the following website as a row of strings in a data frame.
I am struggling to find the correct element, and I don't know how to use findElements() in conjunction with the browser's inspect-element tool. Any indication of how to do this more generally for other parts of the page, e.g. cross references and footnotes, would be great. (Note: the cross references can be seen by adjusting the 'page options', reached by clicking on the cog near the top of the page.)
Below is the code I have attempted.
chapter.url <- "https://www.biblegateway.com/passage/?search=Genesis+50&version=ESV"
library(RSelenium)
RSelenium:::startServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate(chapter.url)
webElem <- remDr$findElements('id','passage-text')
Normally I would target the relevant HTML. Inspecting the page with Firefox's Firebug or something similar, we see that the relevant HTML snippet is <div class="version-ESV result-text-style-normal text-html ">. So we could find the element with class version-ESV:
chapter.url <- "https://www.biblegateway.com/passage/?search=Genesis+50&version=ESV"
library(RSelenium)
RSelenium:::startServer()
remDr <- remoteDriver()
remDr$open()
remDr$navigate(chapter.url)
webElem <- remDr$findElement('class', 'version-ESV')
webElem$highlightElement() # check visually we have the right element
The highlightElement method gives us visual confirmation that we have the required block of HTML. Finally, we can get this snippet of HTML using the getElementAttribute method:
appData <- webElem$getElementAttribute("outerHTML")[[1]]
This HTML can then be parsed for the verses using the XML package.
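Before writing an XPath, it can help to list the ids of the spans in the parsed HTML and look for a pattern. The following is a minimal sketch, not the real page source: sampleHTML is a hypothetical stand-in for appData, and the ids in it are made up for illustration.

library(XML)

# Hypothetical stand-in for appData; ids and text are illustrative only
sampleHTML <- paste0(
  '<div class="version-ESV"><p>',
  '<span id="en-ESV-1508" class="text">Verse one.</span> ',
  '<span id="en-ESV-1509" class="text">Verse two.</span>',
  '</p></div>'
)

# Parse the string as HTML (asText = TRUE: the input is content, not a file name)
doc <- htmlParse(sampleHTML, asText = TRUE, encoding = "UTF-8")

# Collect the id attribute of every span that has one
spanIds <- xpathSApply(doc, "//span[@id]", xmlGetAttr, "id")
print(spanIds)

With the real appData in place of sampleHTML, the printed ids should show whatever common prefix the verse spans share.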
UPDATE:
The verses are each contained in a span with an id that starts with "en-ESV-". We can target these using '//span[contains(@id,"en-ESV-")]' as an XPath. However, within these spans we only want the child nodes that are text nodes. Once we find these text nodes, we paste them together, separating them with spaces:
library(XML)
appXPATH <- '//span[contains(@id,"en-ESV-")]'
# For one verse span, keep only the direct text-node children and join them
appFunc <- function(x){
  appChildren <- xmlChildren(x)
  out <- appChildren[names(appChildren) == "text"]
  paste(sapply(out, xmlValue), collapse = ' ')
}
doc <- htmlParse(appData, encoding = 'UTF8') # specify encoding
results <- xpathSApply(doc, appXPATH, appFunc)
with the following results:
> head(results)
[1] "Then Joseph fell on his father's face and wept over him and kissed him."
[2] "And Joseph commanded his servants the physicians to embalm his father. So the physicians embalmed Israel."
[3] "Forty days were required for it, for that is how many are required for embalming. And the Egyptians wept for him seventy days."
[4] "And when the days of weeping for him were past, Joseph spoke to the household of Pharaoh, saying, “If now I have found favor in your eyes, please speak in the ears of Pharaoh, saying,"
[5] "‘My father made me swear, saying, “I am about to die: in my tomb that I hewed out for myself in the land of Canaan, there shall you bury me.” Now therefore, let me please go up and bury my father. Then I will return.’”"
[6] "And Pharaoh answered, “Go up, and bury your father, as he made you swear.”"
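Since the question asks for one row per verse in a data frame, the same text-node extraction can be wrapped into a data.frame. This is a sketch under assumptions: sampleHTML is a hypothetical fragment that only mimics the structure described above (the ids, classes and shortened verse text are illustrative), and verseText and verses are names introduced here. starts-with() is used instead of contains() as a slightly stricter match on the id prefix.

library(XML)

# Hypothetical fragment mimicking the page structure, not the real source
sampleHTML <- paste0(
  '<div class="version-ESV"><p>',
  '<span id="en-ESV-1508" class="text">',
  '<sup class="versenum">1 </sup>Then Joseph fell on his father\'s face.</span> ',
  '<span id="en-ESV-1509" class="text">',
  '<sup class="versenum">2 </sup>And Joseph commanded his servants.</span>',
  '</p></div>'
)

doc <- htmlParse(sampleHTML, asText = TRUE, encoding = "UTF-8")
nodes <- getNodeSet(doc, '//span[starts-with(@id, "en-ESV-")]')

# Keep only direct text-node children, so the <sup> verse number is dropped
verseText <- function(x) {
  kids <- xmlChildren(x)
  txt <- kids[names(kids) == "text"]
  trimws(paste(sapply(txt, xmlValue), collapse = " "))
}

# One row per verse: the span id plus the verse text
verses <- data.frame(
  id = sapply(nodes, xmlGetAttr, "id"),
  text = sapply(nodes, verseText),
  stringsAsFactors = FALSE
)
print(verses)

With the real page, passing appData to htmlParse in place of sampleHTML should give one row per matched span, assuming the live markup follows the same pattern.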