How to read an XML input file, manipulate some nodes (remove and rename some) and write the output to a new XML output file?

Question

I need to read an XML file from the internet and reshape it. Here is the XML file and the code I have so far.

library(XML)
url='http://ClinicalTrials.gov/show/NCT00001400?displayxml=true'  
doc = xmlParse(url, useInternalNodes=TRUE)

I was able to use some functions from the XML package with success (e.g., getNodeSet), and there are some examples on the internet, but I could not crack this problem myself. I know some XPath, but I last used it 4 years ago, and I am not an expert on sapply and similar functions.

But my goal is this:

I need to remove a whole set of XML child branches about location, for example: <location> ... anything </location>. There can be multiple nodes with location data. I simply don't need that detail in the output. The XML file above always conforms to an XSD schema. The root node is called <clinical_study>.
The resulting simplified file should be written to a new XML file called "data-changed.xml".
I also need to rename and move one branch from its old nested place:

<eligibility> <criteria> <textblock> Inclusion criteria are xyz </textblock>...
In the new output ("data-changed.xml") the content should sit in a differently named XML node directly under the root node:

<eligibility_criteria> Inclusion criteria are xyz </eligibility_criteria>

So I need to:

read the XML into memory
manipulate the tree (prune it somewhere)
move some XML nodes to a new place and under a new name and
write the resulting XML output file.

Any ideas are greatly appreciated.

Also, if you know of a good (recent!) tutorial on XML parsing in R, or a book chapter that covers it, please share the reference. (I read the vignettes by Duncan, but they are too advanced and too concise for me.)

Angie Lambarri · Accepted Answer

Code to remove all location nodes:

r <- xmlRoot(doc)
removeNodes(r[names(r) == "location"])
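
The snippet above only covers the pruning step. The remaining steps from the question (moving the criteria text under the root with a new name, then writing the file) could be sketched roughly as follows. This is only a sketch under some assumptions: the inline XML string is a made-up stand-in for the real ClinicalTrials.gov document so the example runs without network access, the XPath expressions assume the nesting described in the question, and `xml_text` and `criteria_text` are illustrative names.

```r
library(XML)

# Hypothetical stand-in for the downloaded document (not real trial data)
xml_text <- '<clinical_study>
  <brief_title>Example</brief_title>
  <eligibility>
    <criteria>
      <textblock>Inclusion criteria are xyz</textblock>
    </criteria>
    <gender>Both</gender>
  </eligibility>
  <location><facility>A</facility></location>
  <location><facility>B</facility></location>
</clinical_study>'

# 1. Read the XML into memory as an internal (C-level) document
doc <- xmlParse(xml_text, asText = TRUE)
r <- xmlRoot(doc)

# 2. Prune: remove every <location> child of the root
removeNodes(getNodeSet(doc, "/clinical_study/location"))

# 3. Move and rename: copy the textblock content into a new node under the
#    root, then drop the old nested <criteria> branch
tb <- getNodeSet(doc, "/clinical_study/eligibility/criteria/textblock")
if (length(tb) > 0) {
  criteria_text <- trimws(xmlValue(tb[[1]]))
  newXMLNode("eligibility_criteria", criteria_text, parent = r)
  removeNodes(xmlParent(tb[[1]]))
}

# 4. Write the modified tree to a new file
saveXML(doc, file = "data-changed.xml")
```

For the real file, replacing the `xmlParse(xml_text, asText = TRUE)` call with `xmlParse(url)` should be enough, assuming the document follows the structure described in the question. The `length(tb) > 0` guard is there because a study without a criteria textblock would otherwise cause an error at `tb[[1]]`.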

Tags:

r

xml

userJT