I'm trying to get values from XML that looks like this:
<data>
<result name="r">
<item>
<str name="id">123</str>
<str name="xxx">aaa</str>
</item>
<item>
<str name="id">456</str>
<str name="xxx">aaa</str>
</item>
</result>
</data>
So far, I can get the id value in the following way:
library(XML)
xmlfile <- xmlParse(url)
data <- xmlRoot(xmlfile)
result <- data[["result"]]
# Loop over every <item>; the first child of each is assumed to hold the id
for (i in seq_len(xmlSize(result))) {
  print(xmlValue(result[[i]][[1]]))
}
This seems highly inefficient and only works if "id" is stored in the first child element. So, is there a way to get the value of an element (123, 456) by searching for the attribute (name) and value (id)?
The xml2 package is very good for solving this type of problem.
library(xml2)
page <- read_xml('<data>
<result name="r">
<item>
<str name="id">123</str>
<str name="xxx">aaa</str>
</item>
<item>
<str name="id">456</str>
<str name="xxx">aaa</str>
</item>
</result>
</data>')
# find all str nodes
nodes <- xml_find_all(page, ".//str")
# keep only the nodes whose "name" attribute equals "id"
nodes <- nodes[xml_attr(nodes, "name") == "id"]
# get values (as character strings)
xml_text(nodes)
Update
Using an XPath predicate, everything can be accomplished in one line:
# R version >= 4.1 (needed for the native pipe |>)
xml_find_all(page, ".//str[@name='id']") |> xml_text()
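If each id has to stay paired with the other fields of its item, one option is to query per item instead of across the whole document. The following is only a minimal sketch, assuming each item holds at most one str per name; the variable names (items, ids, xxx, df) are illustrative and not from the original post.

```r
library(xml2)

# Same sample document as in the answer above
page <- read_xml('<data>
<result name="r">
<item>
<str name="id">123</str>
<str name="xxx">aaa</str>
</item>
<item>
<str name="id">456</str>
<str name="xxx">aaa</str>
</item>
</result>
</data>')

# One node per <item>
items <- xml_find_all(page, ".//item")

# xml_find_first() yields exactly one result per item (NA text when an item
# has no matching child), so the vectors stay aligned with the items
ids <- xml_text(xml_find_first(items, "./str[@name='id']"))
xxx <- xml_text(xml_find_first(items, "./str[@name='xxx']"))

df <- data.frame(id = ids, xxx = xxx, stringsAsFactors = FALSE)
df
```

Because the lookup is relative to each item (note the leading "./"), an item that lacks an id should show up as NA rather than silently shifting the remaining values.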
Here is a link to a handy XPath cheat sheet: https://www.red-gate.com/simple-talk/development/dotnet-development/xpath-css-dom-and-selenium-the-rosetta-stone/
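For anyone who prefers to stay with the XML package used in the question, the same XPath expression should work there too. This is a minimal sketch, assuming the document is available as a string; xml_string, doc and ids are illustrative names, and in the question's setup the first two steps would be replaced by xmlParse(url).

```r
library(XML)

# Sample document from the question, held in a string for this sketch
xml_string <- '<data>
<result name="r">
<item>
<str name="id">123</str>
<str name="xxx">aaa</str>
</item>
<item>
<str name="id">456</str>
<str name="xxx">aaa</str>
</item>
</result>
</data>'

# asText = TRUE tells xmlParse() that the input is XML content, not a file name
doc <- xmlParse(xml_string, asText = TRUE)

# Select every str element whose name attribute is "id" and extract its text
ids <- xpathSApply(doc, "//str[@name='id']", xmlValue)
ids
```

This avoids both the explicit loop and the assumption that "id" is always the first child element.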