I've parsed an XML document with R, e.g.:
library(XML)
f = system.file("exampleData", "mtcars.xml", package="XML")
doc = xmlParse(f)
Using XPath expressions, I can select specific nodes in the document:
> getNodeSet(doc, "//record[@id='Mazda RX4']/text()")
[[1]]
21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
attr(,"class")
[1] "XMLNodeSet"
But I can't figure out how to turn the result into an R character vector:
> as.character(getNodeSet(doc, "//record[@id='Mazda RX4']/text()"))
[1] "<pointer: 0x000000000e6a7fe0>"
How do I get text from an internal pointer to a C object?
Use xmlValue. Here's an extension of your example to help you see what the classes are:
v <- getNodeSet(doc, "//record[@id='Mazda RX4']/text()")
str(v)
#List of 1
#$ :Classes 'XMLInternalTextNode', 'XMLInternalNode', 'XMLAbstractNode' <externalptr>
#- attr(*, "class")= chr "XMLNodeSet"
v2 <- sapply(v, xmlValue) #this is the code chunk of interest to you
v2
#[1] " 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4"
str(v2)
#chr " 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4"
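If you only need the text, the two steps can probably be collapsed with xpathSApply(), which applies a function to each matched node and simplifies the result to a vector where it can. A minimal sketch, assuming the XML package is installed; the inline document and the names xml_text, doc2 and txt are made up for illustration and only mimic the shape of mtcars.xml:

```r
library(XML)

# Hypothetical inline document shaped like mtcars.xml (not the real file)
xml_text <- paste0(
  "<dataset>",
  "<record id='Mazda RX4'> 21.0 6 160.0</record>",
  "<record id='Datsun 710'> 22.8 4 108.0</record>",
  "</dataset>"
)
doc2 <- xmlParse(xml_text, asText = TRUE)

# Apply xmlValue to every matching text node and simplify to a character vector
txt <- xpathSApply(doc2, "//record[@id='Mazda RX4']/text()", xmlValue)
txt
```

Dropping the [@id=...] predicate should give one string per record, which is usually what you want before any further parsing.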
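The following will also work. Instead of calling getNodeSet() and then sapply(v, xmlValue), you can call xpathApply() and pass xmlValue as the function argument. Without a function it returns the node set, just like getNodeSet(); with xmlValue it returns the text: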
doc = xmlParse(f)
xpathApply(doc,"//record[@id='Mazda RX4']/text()")
[[1]]
21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
attr(,"class")
[1] "XMLNodeSet"
xpathApply(doc,"//record[@id='Mazda RX4']/text()",xmlValue)
[[1]]
[1] " 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4"
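This is a character string inside a list. You can turn it into a numeric vector by unlisting, trimming the surrounding whitespace, splitting on runs of one or more spaces, unlisting again and calling as.numeric(). Without the trimws() step, the leading space produces an empty first element, which as.numeric() turns into NA:
v <- xpathApply(doc, "//record[@id='Mazda RX4']/text()", xmlValue)
as.numeric(unlist(strsplit(trimws(unlist(v)), " +")))
[1] 21.00 6.00 160.00 110.00 3.90 2.62 16.46 0.00 1.00 4.00 4.00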
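An alternative to splitting by hand is base R's scan(), which reads whitespace-separated numbers from a string and ignores leading blanks. A small sketch, assuming the string has the same shape as the xmlValue output shown earlier; the names x and nums are only illustrative:

```r
# Text as returned by xmlValue for the Mazda RX4 record (copied from the output above)
x <- " 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4"

# scan() splits on whitespace and parses each field as a double;
# quiet = TRUE suppresses the "Read N items" message
nums <- scan(text = x, quiet = TRUE)
nums
```

This avoids the regex entirely, but it will stop with an error if any field is not numeric, so strsplit() may still be the safer choice for messy records.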