Hi, I'm working with XML in RStudio. The goal is to convert an XML file to an R data frame, and I'm trying it on the sample file tides.xml that ships in the XML package folder.
tides = system.file("exampleData", "tides.xml", package = "XML")
tides.str = xmlParse(tides)
The values in the first few fields are constant for every record, something like this:
origin
NOAA/NOS/CO-OPS
NOAA/NOS/CO-OPS
NOAA/NOS/CO-OPS
NOAA/NOS/CO-OPS
NOAA/NOS/CO-OPS
NOAA/NOS/CO-OPS
NOAA/NOS/CO-OPS
So when I call
xmlToDataFrame(xmlRoot(tides.str))
it returns an error:
Error in `[<-.data.frame`(`*tmp*`, i, names(nodes[[i]]), value = c("2010/11/13Sat06:08 AM4.74H", :
duplicate subscripts for columns
I know I can do something like this:
xmlToDataFrame(nodes = xmlChildren(xmlRoot(tides.str)[["data"]]))
to produce a data frame, but that covers only a subset of the file, and I would have to insert the first few columns manually.
So my question is: can I avoid the error just by changing some of the arguments to xmlToDataFrame(), while still using the whole XML document?
Thanks in advance.
I'm not sure if it's possible with xmlToDataFrame alone. But you can extract all the non-data nodes and turn them into a data.frame yourself without too much trouble.
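library(XML)
tides <- system.file("exampleData", "tides.xml", package = "XML")
tides.str <- xmlParse(tides)
# Detail rows: one data frame row per <item> under <data>
detaildf <- xmlToDataFrame(nodes = getNodeSet(tides.str, "/datainfo/data/item"))
# Header: every child of <datainfo> except <data>
header <- getNodeSet(tides.str, "/datainfo/*[not(self::data)]")
# One-row data frame: node names become column names, node text the values
headerdf <- as.data.frame(as.list(setNames(xmlSApply(header, xmlValue),
                                           xmlSApply(header, xmlName))))
# No shared columns, so the header row is repeated for every detail row
merge(headerdf, detaildf)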
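And then at the end we just "merge" the two parts. Because the two data frames have no column names in common, merge() returns their Cartesian product, which repeats the single header row for each line in the detail.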
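To see why the merge step repeats the header, here is a minimal sketch with two small hand-made data frames in base R. The names header, detail and combined, the station column, and the dates are made up for illustration and are not read from tides.xml.

```r
# Hypothetical stand-ins for the one-row header and the multi-row detail.
header <- data.frame(origin = "NOAA/NOS/CO-OPS", station = "demo")
detail <- data.frame(date    = c("2010/11/13", "2010/11/14", "2010/11/15"),
                     highlow = c("H", "L", "H"))

# The two frames share no column names, so merge() has nothing to join on
# and returns the Cartesian product: 1 header row x 3 detail rows = 3 rows.
combined <- merge(header, detail)

nrow(combined)            # number of detail rows
names(combined)           # header columns followed by detail columns
unique(combined$origin)   # the single header value, repeated on every row
```

Note that this relies on the header and detail having no column names in common. If they did share a name, merge() would join on that column instead, so it may be safer to check intersect(names(header), names(detail)) first.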