Parsing XML to a list in R: how to consistently access nodes when the XML structure varies?

Tags: r, xml, settings

Background

I have an XML settings file that can look like this:

<level1>
 <level2>
   <level3>
    <level4name>bob</level4name>
   </level3>
 </level2>
</level1>

But there can be multiple instances of level3:

<level1>
 <level2>
   <level3>
    <level4name>bob</level4name> 
   </level3>
   <level3>
    <level4name>jack</level4name> 
   </level3>
   <level3>
    <level4name>jill</level4name> 
   </level3>
 </level2>
</level1>

There can also be multiple types of level4 nodes within each level3:

   <level3>
    <level4name>bob</level4name> 
    <level4dir>/home/bob/ </level4dir> 
    <level4logical>TRUE</level4logical> 
   </level3>

In R, I load this file using

library(XML)  # provides xmlTreeParse() and xmlToList()
settings.xml <- xmlTreeParse(settings.file)
settings <- xmlToList(settings.xml)

I want to write a script that collects the values of all level4name nodes into a vector of unique values, but I am stumped trying to do this in a way that works for all of the above cases.

One of the problems is that the structure returned by xmlToList is not consistent: depending on the case, the result is a list or a matrix, as the output below shows.

> xmlToList(xmlTreeParse('case1.xml'))
$level2.level3.level4name
[1] "bob"
> xmlToList(xmlTreeParse('case2.xml'))
                  level2
level3.level4name "bob" 
level3.level4name "jack"
level3.level4name "jill"
> xmlToList(xmlTreeParse('case3.xml'))
       level2
level3 List,3
level3 List,1
level3 List,1

Questions

I have two questions:

  1. How can I extract a vector of the unique values of level4name?

  2. Is there a better way to do this?

David LeBauer asked Mar 24 '11 20:03


1 Answer

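Try using the internal node representation of the XML together with the XPath language, which is very powerful. The expression "//level4name" matches every level4name node anywhere in the document, so it does not matter how many level3 nodes there are.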

> xml = xmlTreeParse("case2.xml", useInternalNodes=TRUE)
> xpathApply(xml, "//level4name", xmlValue)
[[1]]
[1] "bob"

[[2]]
[1] "jack"

[[3]]
[1] "jill"
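
Building on this, here is a minimal sketch of how the unique values might be collected for all three cases from the question. The inline XML strings are stand-ins for the settings files (the duplicate "bob" in case3 is made up to show de-duplication), and unique_names() is a hypothetical helper, not something provided by the XML package.

```r
library(XML)

# Hypothetical inline stand-ins for the three settings-file variants
case1 <- "<level1><level2>
  <level3><level4name>bob</level4name></level3>
</level2></level1>"

case2 <- "<level1><level2>
  <level3><level4name>bob</level4name></level3>
  <level3><level4name>jack</level4name></level3>
  <level3><level4name>jill</level4name></level3>
</level2></level1>"

case3 <- "<level1><level2>
  <level3>
    <level4name>bob</level4name>
    <level4dir>/home/bob/</level4dir>
    <level4logical>TRUE</level4logical>
  </level3>
  <level3><level4name>jack</level4name></level3>
  <level3><level4name>bob</level4name></level3>
</level2></level1>"

# unique_names() is an illustrative helper, not part of the XML package.
# It parses the text into internal nodes, selects every level4name node
# with XPath, and returns the distinct text values as a character vector.
unique_names <- function(xml_text) {
  doc <- xmlParse(xml_text, asText = TRUE)    # internal-node document
  nodes <- getNodeSet(doc, "//level4name")    # list of matching nodes
  values <- vapply(nodes, xmlValue, character(1))
  unique(values)
}

unique_names(case1)
unique_names(case2)
unique_names(case3)
```

Using vapply() over getNodeSet() rather than xpathSApply() is a design choice here: it should give a character vector even when no node matches. For a file on disk, xmlParse(settings.file) would take the place of the asText call.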
Martin Morgan answered Oct 13 '22 17:10
