R {xml_node} to plain text while preserving the tags?

Question

I'd like to do exactly what xml2::xml_text() or rvest::html_text() do but preserve the tags instead of replacing e.g. <br> with \n. The objective is to e.g. scrape a web page, extract the nodes I want, and store the plain HTML in a variable, much like write_html() would store it in a file.

How can I do this?

Harold Cavendish · Accepted Answer

It turns out that as.character() already does this: called on an xml_node, it returns the node's markup, tags included, as a character string.

For example:

library(rvest)
html <- read_html("http://stackoverflow.com")

# Serialize the first <h1> node to a string, keeping its tags
res <- html %>%
         html_node("h1") %>%
         as.character()

> res
[1] "<h1 class=\"-title\">Learn, Share, Build</h1>"

This is the desired output in my current use case.

For comparison, if you need to strip the tags instead, use html_text():

res <- html %>%
         html_node("h1") %>%
         html_text()

> res
[1] "Learn, Share, Build"
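
The same comparison can be reproduced offline, and extended to several nodes at once. The following is a minimal sketch, not part of the original answer: the inline HTML string and the variable names are made up for illustration, and it assumes that as.character() applied to the node set returned by html_nodes() gives one string per node.

```r
library(rvest)  # re-exports read_html() from xml2

# Hypothetical inline page, used instead of a live URL so the sketch runs offline
page <- read_html(
  "<html><body><h1 class=\"title\">Learn, Share, Build</h1><p>First</p><p>Second</p></body></html>"
)

h1 <- html_node(page, "h1")

# Keep the tags: serialize the node to a character string
with_tags <- as.character(h1)

# Strip the tags: extract only the text content
without_tags <- html_text(h1)

# Several nodes: one serialized string per matched <p> element
paragraphs <- as.character(html_nodes(page, "p"))

print(with_tags)
print(without_tags)
print(paragraphs)
```

The resulting strings are ordinary character vectors, so they can be stored in a variable, written to a database, or concatenated like any other text.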

Tags: r, rvest, xml2
