I'm trying to get values from xml that looks like this:
<data>
<result name="r">
<item>
<str name="id">123</str>
<str name="xxx">aaa</str>
</item>
<item>
<str name="id">456</str>
<str name="xxx">aaa</str>
</item>
</result>
</data>
So far, I can get the id value in the following way:
library(XML)
xmlfile <- xmlParse(url)
data <- xmlRoot(xmlfile)
result <- data[["result"]]
#loop over every item (1..n), not just the last index
for (i in seq_len(xmlSize(result))) {
  print(xmlValue(result[[i]][[1]]))
}
This seems highly inefficient and only works if "id" is stored in the first child element. So, is there a way to get the value of an element (123, 456) by searching for the attribute (name) and value (id)?
The xml2 package is very good for solving this type of problem.
library(xml2)
page<-read_xml('<data>
<result name="r">
<item>
<str name="id">123</str>
<str name="xxx">aaa</str>
</item>
<item>
<str name="id">456</str>
<str name="xxx">aaa</str>
</item>
</result>
</data>')
#find all str nodes
nodes<-xml_find_all(page, ".//str")
#keep only the nodes whose name attribute is "id"
nodes<-nodes[xml_attr(nodes, "name")=="id"]
#get values (as character strings)
xml_text(nodes)
Update
Using an XPath predicate, everything can be accomplished in one line:
#R version >= 4.1 (needed for the native pipe |>)
xml_find_all(page, ".//str[@name='id']") |> xml_text()
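The same XPath expression should also work with the XML package used in the question, so switching packages is not strictly required. The following is a minimal sketch, assuming the XML is available as a string (the variable names xml_string, doc and ids are only illustrative):

```r
library(XML)

# Sample document from the question, held in a string (illustrative name)
xml_string <- '<data>
<result name="r">
<item>
<str name="id">123</str>
<str name="xxx">aaa</str>
</item>
<item>
<str name="id">456</str>
<str name="xxx">aaa</str>
</item>
</result>
</data>'

# asText = TRUE tells the parser the input is XML content, not a file name or URL
doc <- xmlParse(xml_string, asText = TRUE)

# Select every <str> whose name attribute is "id" and extract its text
ids <- xpathSApply(doc, "//str[@name='id']", xmlValue)
print(ids)
```

Because the predicate matches on the attribute, this does not depend on "id" being the first child of each item.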
Here is a link to a handy XPath cheat sheet: https://www.red-gate.com/simple-talk/development/dotnet-development/xpath-css-dom-and-selenium-the-rosetta-stone/
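If the other fields are needed alongside the id, one possible approach is to apply the attribute predicate once per item, so that values from the same item stay on the same row. This is only a sketch under the assumption that each item has at most one str per name; the names items and df are illustrative:

```r
library(xml2)

page <- read_xml('<data>
<result name="r">
<item>
<str name="id">123</str>
<str name="xxx">aaa</str>
</item>
<item>
<str name="id">456</str>
<str name="xxx">aaa</str>
</item>
</result>
</data>')

# One node per <item>
items <- xml_find_all(page, ".//item")

# xml_find_first() returns one result per item (NA text if no match),
# which keeps the columns aligned
df <- data.frame(
  id  = xml_text(xml_find_first(items, "./str[@name='id']")),
  xxx = xml_text(xml_find_first(items, "./str[@name='xxx']")),
  stringsAsFactors = FALSE
)
print(df)
```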