What is the best way to cast an object from the {XML} package back to a "normal" R character vector?
For example:
require(XML)
doc <- htmlParse("http://cran.r-project.org/web/packages/XML/index.html")
class(doc)
# [1] "HTMLInternalDocument" "HTMLInternalDocument"
# "XMLInternalDocument" "XMLAbstractDocument"
Similar to this suggestion, I could do this:
doc.char <- capture.output(doc)
But this seems like a circuitous route. However, I didn't find any other appropriate method, and this has bugged me a few times already.
If you just want a character vector, then use readLines() instead of htmlParse(). But you likely have a more specific need, and then the answer is to use XPath to query doc; see ?getNodeSet (and the syntax doc["//path"]) and the examples on that help page.
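Here is a minimal sketch of the XPath route, assuming the XML package is installed. To keep it runnable offline, it parses a small made-up HTML string with htmlParse(..., asText = TRUE) instead of the CRAN page. The markup and the variable names are illustrative only. (For the readLines() route, calling readLines() on the URL returns one element per line of the raw page.)

```r
library(XML)

# Made-up inline markup standing in for the CRAN page, so this runs offline.
html <- '<html><body>
  <h2>Downloads</h2>
  <a href="XML_1.tar.gz">Package source</a>
  <a href="XML.pdf">Reference manual</a>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# getNodeSet() returns the matching nodes; doc["//a"] is the shorthand syntax.
nodes <- getNodeSet(doc, "//a")

# Turn each node into a plain string: its text content...
link_text <- sapply(nodes, xmlValue)

# ...or, in one step, query and extract an attribute for every match.
link_href <- xpathSApply(doc, "//a", xmlGetAttr, "href")

print(link_text)
print(link_href)
```

xpathSApply() applies the function to each matching node and simplifies the result, so it goes straight from an XPath query to an ordinary character vector.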
For your specific question, I ran
library(XML)
doc <- htmlParse("http://cran.r-project.org/web/packages/XML/index.html")
showMethods(class=class(doc), where=search())
and arrived at
as(doc, "character")
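Below is a small sketch of that coercion, again on made-up inline markup so it runs without network access. As far as I can tell, the coercion serialises the document the same way saveXML() does, which is why it yields the full markup as text. Treat that equivalence as an assumption rather than documented behaviour.

```r
library(XML)

# Made-up markup (illustrative only) so the example runs offline.
doc <- htmlParse("<html><body><p>hello</p></body></html>", asText = TRUE)

# The coercion found via showMethods(): the whole document as text.
as_chr <- as(doc, "character")

# saveXML() with no file argument also returns the serialised markup.
saved <- saveXML(doc)

str(as_chr)
```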
I think you can achieve this with do.call(paste, as.list(capture.output(doc))). (I had a similar issue, and I think you can also do it with sapply, as @flodel suggested to me here for converting a NodeSet to character.)
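A sketch of both ideas on a made-up document follows. capture.output() yields one element per printed line, and paste(..., collapse = "\n") is the usual idiom for joining them, whereas do.call(paste, ...) joins them with spaces. I don't know exactly what @flodel proposed. Applying saveXML() or xmlValue() to each node with sapply() is my assumption about what such a call would look like.

```r
library(XML)

# Made-up markup (illustrative only) so the example runs offline.
doc <- htmlParse(
  "<html><body><p>first</p><p>second</p></body></html>",
  asText = TRUE
)

# The printed document, captured with one element per printed line.
printed <- capture.output(doc)

# Join into a single string; collapse = "\n" keeps the original line breaks.
one_string <- paste(printed, collapse = "\n")

# For a NodeSet, convert each node individually with sapply().
paras <- getNodeSet(doc, "//p")
para_markup <- sapply(paras, saveXML)  # each <p> node serialised as markup
para_text   <- sapply(paras, xmlValue) # just the text content of each node
```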