
Parsing XML to R

Tags: r, xml, xpath

I want to extract exchange rates from the ECB website to convert my local-currency data. However, I am struggling with XPath (although this helped me a lot).

library(XML)

fileURL <- "https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml?93aad09b8f8b7bdb69cd1574b5b2665f"
download.file(fileURL, destfile=tf <- tempfile(fileext=".xml"))

xml_file <- xmlParse(tf)
xml_data <- xmlRoot(xml_file)

currency <- xml_data[["number(//Cube/@currency)"]]
rate <- xml_data[["number(//Cube/@rate)"]]

Then I just want to create a simple data frame:

df <- data.frame(currency, rate)

An economist asked Feb 07 '23 20:02


1 Answer

1) xpathSApply. The following line gives a character matrix m with currency and rate columns:

m <- t(xpathSApply(xml_data, "//*[@rate]", xmlAttrs))

If a data frame with a character column and a numeric column is needed, add this:

read.table(text = paste(m[, 1], m[, 2]), as.is = TRUE, col.names = c("currency", "rate"))
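
As a minimal sketch of option 1 from start to finish, the snippet below builds the data frame directly from the attribute matrix rather than going through read.table. The inline XML is a made-up sample that only mimics the ECB layout (the rates are illustrative, not real data), and sample_xml, doc, root and df are hypothetical names chosen for this example.

```r
library(XML)

# Hypothetical sample in the style of the ECB file; values are illustrative only
sample_xml <- "<gesmes:Envelope xmlns:gesmes='http://www.gesmes.org/xml/2002-08-01' xmlns='http://www.ecb.int/vocabulary/2002-08-01/eurofxref'>
  <Cube>
    <Cube time='2023-02-07'>
      <Cube currency='USD' rate='1.0700'/>
      <Cube currency='JPY' rate='141.50'/>
    </Cube>
  </Cube>
</gesmes:Envelope>"

doc  <- xmlParse(sample_xml, asText = TRUE)
root <- xmlRoot(doc)

# One row per node that carries a rate attribute; columns are the attribute names
m <- t(xpathSApply(root, "//*[@rate]", xmlAttrs))

# Convert the character matrix into a typed data frame
df <- data.frame(currency = m[, "currency"],
                 rate     = as.numeric(m[, "rate"]),
                 stringsAsFactors = FALSE)
print(df)
```

With the real file, replacing root with the xml_data object from the question should work the same way, assuming the feed keeps its current attribute names.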

Note: Using * in the XPath expression avoids having to deal with namespaces. To refer to Cube explicitly, as in the question, supply a prefix for the document's default namespace, like this:

m <- t(xpathSApply(xml_data, "//x:Cube[@rate]", xmlAttrs, namespaces = "x"))
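
To make the namespace handling a little more concrete, here is a hedged sketch that compares the short form above with an explicit prefix-to-URI mapping. The inline document and the names sample_xml, root, ns, m1 and m2 are hypothetical, and the namespace URI is an assumption about what the ECB file declares as its default namespace; check the root element of the real file before relying on it.

```r
library(XML)

# Hypothetical sample with a default namespace; values are illustrative only
sample_xml <- "<Envelope xmlns='http://www.ecb.int/vocabulary/2002-08-01/eurofxref'>
  <Cube><Cube time='2023-02-07'>
    <Cube currency='USD' rate='1.0700'/>
    <Cube currency='GBP' rate='0.8900'/>
  </Cube></Cube>
</Envelope>"
root <- xmlRoot(xmlParse(sample_xml, asText = TRUE))

# Short form: a single unnamed string is taken as the prefix for the default namespace
m1 <- t(xpathSApply(root, "//x:Cube[@rate]", xmlAttrs, namespaces = "x"))

# Explicit form: a named vector mapping the prefix to the namespace URI
ns <- c(x = "http://www.ecb.int/vocabulary/2002-08-01/eurofxref")
m2 <- t(xpathSApply(root, "//x:Cube[@rate]", xmlAttrs, namespaces = ns))

print(identical(m1, m2))
```

The prefix x is arbitrary; it only has to match between the XPath expression and the namespaces argument.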

2) read.pattern. An alternative is to treat the XML file as plain text and extract the fields with read.pattern from the gsubfn package. (This does not use the XML package.)

library(gsubfn)
read.pattern(tf, pattern = "'(...)' rate='([0-9.]+)'", col.names = c("currency", "rate"))
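
To see what that pattern captures, the following base R sketch applies the same regular expression with regexec and regmatches instead of read.pattern. This is a swapped-in illustration, not how gsubfn works internally; the sample lines are made up in the style of the ECB file, and lines, hits and df are hypothetical names.

```r
# Hypothetical lines in the style of the ECB file; values are illustrative only
lines <- c("<Cube time='2023-02-07'>",
           "<Cube currency='USD' rate='1.0700'/>",
           "<Cube currency='JPY' rate='141.50'/>")

# Same pattern as above: a three-character code, then the rate digits
pattern <- "'(...)' rate='([0-9.]+)'"

# regexec gives the full match plus each capture group; non-matching lines give character(0)
hits <- regmatches(lines, regexec(pattern, lines))
hits <- hits[lengths(hits) == 3]

# Element 2 is the currency group, element 3 is the rate group
df <- data.frame(currency = vapply(hits, `[`, character(1), 2),
                 rate     = as.numeric(vapply(hits, `[`, character(1), 3)),
                 stringsAsFactors = FALSE)
print(df)
```

Note that the pattern depends on the file quoting attributes with single quotes and placing rate immediately after currency, so it is more fragile than the XPath approach if the layout changes.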

G. Grothendieck answered Feb 16 '23 02:02