I'd like to do exactly what xml2::xml_text() or rvest::html_text() do, but preserve the tags instead of replacing e.g. <br> with \n. The objective is to e.g. scrape a web page, extract the nodes I want, and store the plain HTML in a variable, much like write_html() would store it in a file.
How can I do this?
Ironically, it turns out that as.character() works just fine.
Therefore:
library(rvest)
html <- read_html("http://stackoverflow.com")
res <- html %>%
  html_node("h1") %>%
  as.character()
> res
[1] "<h1 class=\"-title\">Learn, Share, Build</h1>"
This is the desired output in my current use case.
For comparison, if you need to strip the tags instead, use html_text():
res <- html %>%
  html_node("h1") %>%
  html_text()
> res
[1] "Learn, Share, Build"
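To try both calls without depending on a live page, here is a minimal sketch that parses an inline HTML string. The markup and the variable names (page, paragraphs, markup, text) are made up for illustration and are not from the original example. It also uses html_nodes() to show that as.character() handles several nodes at once.

```r
library(rvest)

# Parse a literal HTML string (hypothetical markup, for illustration only).
page <- read_html("<div><p>First<br>line</p><p>Second</p></div>")

# Select every <p> node rather than just the first match.
paragraphs <- html_nodes(page, "p")

# as.character() serializes each node, keeping tags such as <br>.
markup <- as.character(paragraphs)

# html_text() returns only the text content, with the tags removed.
text <- html_text(paragraphs)

print(markup)
print(text)
```

The exact whitespace in the serialized strings may vary between xml2/libxml2 versions, so it is safer to test for the presence of tags than to compare against a fixed string.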