
XPath in R: return NA if node is missing

Tags: r, xpath

I'm trying to search for nodes in an HTML document using XPath in R. In the code below, I would like to know how to return NULL or NA when a node is missing:

library(XML)
b <- '
<bookstore specialty="novel">
<book style="autobiography">
<author>
<first-name>Joe</first-name>
<last-name>Bob</last-name>
</author>
</book>
<book style="textbook">
<author>
<first-name>Mary</first-name>
<last-name>Bob</last-name>
</author>
<author>
<first-name>Britney</first-name>
<last-name>Bob</last-name>
</author>
<price>55</price>
</book>
<book style="novel" id="myfave">
<author>
<first-name>Toni</first-name>
<last-name>Bob</last-name>
</author>
</book>
</bookstore>
'
doc2 <- htmlTreeParse(b, useInternalNodes = TRUE)
xpathApply(doc2, "//author/first-name", xmlValue)

For instance, running xpathApply() with "//author/first-name" returns 4 results. If I delete one of the <first-name> nodes, I want xpathApply() to return NULL or NA in its place rather than skip that author. For example, if I delete <first-name>Mary</first-name>, I want the result to look like this:

Joe
NA
Britney
Toni

jmich738 asked Sep 30 '14 13:09


2 Answers

You can query each <author> node and check whether it has a first-name child, something like this:

xpathApply(doc2, "//author",
           function(x){
             if("first-name" %in% names(x))
               xmlValue(x[["first-name"]])
             else NA})

[[1]]
[1] "Joe"

[[2]]
[1] NA

[[3]]
[1] "Britney"

[[4]]
[1] "Toni"

agstudy answered Sep 16 '22 22:09


Alternate method:

extractFirstName <- function(node) {
  val <- unlist(xpathApply(node, "first-name", xmlValue))
  if (is.null(val)) { val <- NA }
  val
}

xpathApply(doc2, "//author", extractFirstName)

## [[1]]
## [1] "Joe"
## 
## [[2]]
## [1] NA
## 
## [[3]]
## [1] "Britney"
## 
## [[4]]
## [1] "Toni"
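
If a plain character vector is more convenient than a list, the same per-author lookup can be wrapped in vapply(). This is only a sketch of that idea, not part of either answer: the sample document is the one from the question with Mary's <first-name> removed, and the variable names authors and first_names are made up for illustration.

```r
library(XML)

# Sample from the question, with <first-name>Mary</first-name> deleted
b <- '
<bookstore specialty="novel">
<book style="autobiography">
<author><first-name>Joe</first-name><last-name>Bob</last-name></author>
</book>
<book style="textbook">
<author><last-name>Bob</last-name></author>
<author><first-name>Britney</first-name><last-name>Bob</last-name></author>
<price>55</price>
</book>
<book style="novel" id="myfave">
<author><first-name>Toni</first-name><last-name>Bob</last-name></author>
</book>
</bookstore>
'
doc2 <- htmlTreeParse(b, useInternalNodes = TRUE)

# One entry per <author>, so a missing child cannot shift the results
authors <- getNodeSet(doc2, "//author")

first_names <- vapply(authors, function(node) {
  # Relative XPath: only looks at first-name children of this author
  val <- unlist(xpathApply(node, "first-name", xmlValue))
  # unlist() of an empty result is NULL, which we map to a character NA
  if (is.null(val)) NA_character_ else val[1]
}, character(1))

first_names
```

Using NA_character_ instead of a bare NA keeps every element the same type, which is what lets vapply() check the result against character(1).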

hrbrmstr answered Sep 18 '22 22:09