
Write XML using pipe operator with xml2

Tags:

r

xml2

The xml2 package allows users to create XML documents. I'm trying to create a document using the pipe operator %>% to add various combinations of child and sibling nodes. I cannot figure out how to create a child node within a child node that is then followed by the original child's sibling (see example below).

Is it possible to "rise" up a level to then create more nodes or must they be created outside of the chained commands?

What I want

library(xml2)
x1 <- read_xml("<parent><child>1</child><child><grandchild>2</grandchild></child><child>3</child><child>4</child></parent>")
message(x1)
#> <?xml version="1.0" encoding="UTF-8"?>
#> <parent>
#>  <child>1</child>
#>  <child>
#>    <grandchild>2</grandchild>
#>  </child>
#>  <child>3</child>
#>  <child>4</child>
#> </parent>

What I'm creating that's wrong

library(magrittr)
library(xml2)
x2 <- xml_new_document()
x2 %>% 
  xml_add_child("parent") %>%
  xml_add_child("child", 1) %>%
  xml_add_sibling("child", 4, .where="after") %>%
  xml_add_sibling("child", 3) %>%
  xml_add_sibling("child", .where="before") %>%
  xml_add_child("grandchild", 2)
message(x2)
#> <?xml version="1.0" encoding="UTF-8"?>
#> <parent>
#>  <child>1</child>
#>  <child>4</child>
#>  <child>
#>    <grandchild>2</grandchild>
#>  </child>
#>  <child>3</child>
#> </parent>

Solution using XML package

This is actually fairly straightforward if done using the XML package.

library(XML)
x2 <- newXMLNode("parent")
invisible(newXMLNode("child", 1, parent=x2))
invisible(newXMLNode("child", newXMLNode("grandchild", 2), parent=x2))
invisible(newXMLNode("child", 3, parent=x2))
invisible(newXMLNode("child", 4, parent=x2))
x2
#> <?xml version="1.0" encoding="UTF-8"?>
#> <parent>
#>  <child>1</child>
#>  <child>
#>    <grandchild>2</grandchild>
#>  </child>
#>  <child>3</child>
#>  <child>4</child>
#> </parent>
Steven M. Mortimer asked Mar 06 '23 15:03


1 Answer

I'm going to start by saying that I think this is generally a bad idea. xml2 works using pointers, which means that it has reference semantics ("pass by reference"), which is not the typical behavior in R. Functions such as xml_add_child() work by modifying the XML tree in place as a side effect; they do not return a modified copy of the tree the way typical R functions do ("pass by value").

This means that piping is basically the wrong principle. You just need a series of steps that modify the object in the correct order.
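A minimal sketch of what reference semantics means here, assuming a fresh document created with xml_new_root() (the names doc, alias and added are illustrative only):

```r
library(xml2)

doc <- xml_new_root("parent")
alias <- doc                               # no copy of the tree: both names share one pointer

n_before <- length(xml_children(doc))      # parent has no children yet
added <- xml_add_child(alias, "child", 1)  # modifies the shared tree in place
n_after <- length(xml_children(doc))       # doc sees the change made through alias

# The return value is the newly added node, not an updated copy of the tree
xml_name(added)
```

Nothing is assigned back to doc, yet its child count changes, which is why the order of the calls matters more than how their return values are chained.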

That said, you can do:

library("magrittr")
library("xml2")
x2 <- xml_new_document()
x2 %>% 
  xml_add_child(., "parent") %>%
{
  # .where is omitted: for xml_add_child() it is a numeric position, and the
  # default already appends after the last existing child
  xml_add_child(., "child", 1)
  (xml_add_child(., "child") %>% xml_add_child("grandchild", 2))
  xml_add_child(., "child", 3)
  xml_add_child(., "child", 4)
}
message(x2)
## <?xml version="1.0" encoding="UTF-8"?>
## <parent>
##   <child>1</child>
##   <child>
##     <grandchild>2</grandchild>
##   </child>
##   <child>3</child>
##   <child>4</child>
## </parent>

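Inside the braces, . stands for the "parent" node passed along by the previous step, so each xml_add_child() call in the block attaches a new child to it. The ()-bracketed expression in the middle works because xml_add_child() returns the node it has just added: it first creates an empty "child" node and then pipes that node into a second xml_add_child() call to attach the grandchild.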

Another option, if you really want to use pipes throughout, is to use the %T>% pipe instead of the %>% pipe (or rather, a mix of the two). The difference between the two is the following:

> 1:3 %>% mean() %>% str()
 num 2
> 1:3 %T>% mean() %>% str()
 int [1:3] 1 2 3

The %T>% ("tee") pipe passes the value of the left-hand side into the right-hand side expression, discards whatever that expression returns, and hands the original left-hand side value on to the next step. This means you can call functions in the middle of a pipeline for their side effects and continue to pass the earlier object reference forward in the pipeline.
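As a small illustration with xml2 itself (a sketch only; the names root, res_tee and res_pipe are made up for this example):

```r
library(magrittr)
library(xml2)

root <- xml_new_root("parent")

# %T>% runs xml_add_child() for its side effect and returns the left-hand side
res_tee <- root %T>% xml_add_child("child", 1)

# %>% returns whatever xml_add_child() returns, i.e. the node just added
res_pipe <- root %>% xml_add_child("child", 2)

xml_name(res_tee)            # name of the node carried forward by %T>%
xml_name(res_pipe)           # name of the node carried forward by %>%
length(xml_children(root))   # both calls modified the same tree
```

Both calls add a child to root; they differ only in which node is handed to the next step of a pipeline.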

This is what you're trying to do when you say "rise up a level" - that is, back up to a previous value in the pipeline and work from there. So you need to just %T>% pipe until you get to a point where you want to %>% pipe (e.g., to create the grandchild) and then return to %T>% piping to continue carrying the parent object reference forward. An example:

x3 <- xml_new_document()
x3 %>% 
  xml_add_child("parent") %T>%
  xml_add_child("child", 1) %T>%
  {xml_add_child(., "child") %>% xml_add_child("grandchild", 2)} %T>%
  xml_add_child("child", 3) %>%
  xml_add_child("child", 4)
message(x3)
## <?xml version="1.0" encoding="UTF-8"?>
## <parent>
##   <child>1</child>
##   <child>
##     <grandchild>2</grandchild>
##   </child>
##   <child>3</child>
##   <child>4</child>
## </parent>

Note the final %>% instead of %T>%. If you used %T>% for that last step as well, the value of the whole pipeline would be the "parent" node tree:

{xml_document}
<parent>
[1] <child>1</child>
[2] <child>\n  <grandchild>2</grandchild>\n</child>
[3] <child>3</child>
[4] <child>4</child>

(Which - again - ultimately doesn't really matter because we're actually building x3 using side effects, but it will print the parent node tree to the console, which is probably confusing.)

Again, I'd suggest not using the pipe at all given the awkwardness, but it's up to you. A better way is just to preserve each object you want to attach a child to and then refer to it again each time. Like in the first example, save the parent node as p, skip all the pipes, and just refer to p everywhere that . is used in the example code.
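A minimal sketch of that pipe-free approach, assuming the parent is stored as p as suggested above and the intermediate node in a hypothetical variable child2:

```r
library(xml2)

doc <- xml_new_document()
p <- xml_add_child(doc, "parent")        # keep a handle on the parent

xml_add_child(p, "child", 1)
child2 <- xml_add_child(p, "child")      # keep a handle on the second child
xml_add_child(child2, "grandchild", 2)   # attach the grandchild to it
xml_add_child(p, "child", 3)
xml_add_child(p, "child", 4)

# Text content of each <child>, in document order
child_text <- xml_text(xml_children(p))
cat(as.character(doc))
```

Each statement names the node it attaches to, so there is no need to "rise" back up a level: you simply refer to p again.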

Thomas answered Mar 20 '23 05:03