How to transform XML data into a data.frame?

Tags: dataframe, r, xml

I'm trying to learn R's XML package, and as an exercise I'm trying to create a data.frame from the books.xml sample XML file. Here's what I have so far:

```r
library(XML)
books <- "http://www.w3schools.com/XQuery/books.xml"
doc <- xmlTreeParse(books, useInternalNodes = TRUE)
doc

xpathApply(doc, "//book", function(x) do.call(paste, as.list(xmlValue(x))))
xpathSApply(doc, "//book", function(x) strsplit(xmlValue(x), " "))
xpathSApply(doc, "//book/child::*", xmlValue)
```

None of these xpathApply()/xpathSApply() calls gets me anywhere close to what I want. How should I go about building a well-formed data.frame?


asked Jan 14 '10 19:01

larus

1 Answer

Ordinarily, I would suggest trying the xmlToDataFrame() function, but I believe that will be fairly tricky here because the data isn't well structured for it to begin with.

I would recommend working with this function:

```r
xmlToList(books)
```
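To see why xmlToList() is a convenient starting point, here is a minimal sketch that parses a single book record inlined in the shape of the w3schools books.xml file (so it runs without network access) and inspects the resulting nested list. The inline string and the names `xml`, `doc` and `lst` are illustrative assumptions, not part of the original answer.

```r
library(XML)

# A single <book> record, inlined so the example runs offline
xml <- '<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
</bookstore>'

doc <- xmlParse(xml, asText = TRUE)
lst <- xmlToList(doc)

# One list element per <book>; a book's attributes land in a ".attrs" entry,
# and an element that carries attributes (like <title>) becomes list(text, .attrs)
str(lst)
```

That nesting is what produces the `title.text` and `title..attrs` columns once each book is passed to data.frame().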

One complication is that some books have several authors, so you will need to decide how to handle that when you structure your data frame.
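One possible way to handle the multiple authors, assuming a single text column is acceptable, is to collapse them into one string per book. This is a sketch on two abridged, inlined records; the `"; "` separator and the variable names are arbitrary choices rather than anything from the original answer.

```r
library(XML)

# Two abridged records: one with two authors, one with a single author
xml <- '<bookstore>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>'

books_list <- xmlToList(xmlParse(xml, asText = TRUE))

# Repeated <author> children all keep the name "author" in each book's list,
# so select them by name and paste them into a single string
authors <- vapply(books_list, function(b) {
  paste(unlist(b[names(b) == "author"]), collapse = "; ")
}, character(1))
```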

Once you have decided what to do about the multiple authors, it's fairly straightforward to turn your list of books into a data frame with the ldply() function in plyr (or just use lapply and combine the resulting data frames with do.call("rbind", ...)).
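For the base R route (lapply plus do.call("rbind", ...)), here is a sketch that assumes the authors are collapsed into one semicolon-separated string and that the attributes are pulled out into named columns. The column names `title`, `lang`, `authors` and `category` are hypothetical choices, and the records are abridged and inlined so the example runs offline.

```r
library(XML)

xml <- '<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
</bookstore>'

books_list <- xmlToList(xmlParse(xml, asText = TRUE))

# Build one single-row data.frame per <book>. Every row has the same columns,
# so base rbind() can stack them without complaint.
rows <- lapply(books_list, function(b) {
  data.frame(
    title    = b$title$text,
    lang     = b$title$.attrs[["lang"]],
    authors  = paste(unlist(b[names(b) == "author"]), collapse = "; "),
    year     = as.integer(b$year),
    price    = as.numeric(b$price),
    category = b$.attrs[["category"]],
    stringsAsFactors = FALSE
  )
})

books_df <- do.call(rbind, unname(rows))
str(books_df)
```

Because the columns are fixed up front, this version also converts `year` and `price` to numbers, which the list-based output otherwise leaves as text.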

Here's a complete example (excluding author):

```r
library(XML)
library(plyr)

books <-  "w3schools.com/xsl/books.xml"

ldply(xmlToList(books), function(x) {
  data.frame(x[!names(x) == "author"])
})

   .id        title.text title..attrs year price   .attrs
1 book  Everyday Italian           en 2005 30.00  COOKING
2 book      Harry Potter           en 2005 29.99 CHILDREN
3 book XQuery Kick Start           en 2003 49.99      WEB
4 book      Learning XML           en 2003 39.95      WEB
```

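Here's what it looks like with the authors included. Because books have different numbers of authors, each book yields a data frame with a different set of author columns, so the list is "jagged": calling do.call("rbind", ...) on the output of lapply would fail, since rbind requires every data frame to have the same columns. ldply takes care of this by filling the missing columns with NA. (Alternatively, you can use lapply with rbind.fill, also courtesy of Hadley, but why bother when ldply does it for you automatically?) With ldply it's a one-liner: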

```r
ldply(xmlToList(books), data.frame)

   .id        title.text title..attrs              author year price   .attrs
1 book  Everyday Italian           en Giada De Laurentiis 2005 30.00  COOKING
2 book      Harry Potter           en        J K. Rowling 2005 29.99 CHILDREN
3 book XQuery Kick Start           en      James McGovern 2003 49.99      WEB
4 book      Learning XML           en         Erik T. Ray 2003 39.95      WEB
     author.1   author.2   author.3               author.4
1        <NA>       <NA>       <NA>                   <NA>
2        <NA>       <NA>       <NA>                   <NA>
3 Per Bothner Kurt Cagle James Linn Vaidyanathan Nagarajan
4        <NA>       <NA>       <NA>                   <NA>
```
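For comparison, the lapply() plus rbind.fill() alternative can be sketched like this on two abridged, inlined records. rbind.fill() (from plyr) takes the union of the columns and pads the `author.N` columns that a given book lacks with NA, which is the same padding ldply() produces above. The variable names are illustrative.

```r
library(XML)
library(plyr)  # for rbind.fill()

xml <- '<bookstore>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>'

books_list <- xmlToList(xmlParse(xml, asText = TRUE))

# data.frame() turns each book's list into one row; duplicate "author" names
# are made unique as author, author.1, ..., so the rows have different columns
rows <- lapply(books_list, data.frame, stringsAsFactors = FALSE)

# do.call(rbind, rows) would fail here; rbind.fill() unions the columns
# and fills the missing ones with NA
books_df <- rbind.fill(rows)
```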

answered Sep 22 '22 11:09

Shane

Related questions

XML and JSON tags for a Golang struct?
How do you create a PDF from XML in Java?
What is 'Push Approach' and 'Pull Approach' to parsing?
How to set attribute in XML using XSLT?
How to change the value of XML Element attribute using PowerShell?
Use XML Literals in C#?
How to query xml column in tsql
How can I include an ampersand (&) character in an XML document?
How can I select the first element using XSLT?
how to parse xml to java object? [closed]
XML Document SelectSingleNode returns null
Illegal characters in path error while parsing XML in C#
maven : Failed to install metadata project Could not parse metadata maven-metadata-local.xml: only whitespace content allowed before start tag
How do I generate a comma-separated list with XSLT/XPath?
Remove namespace and prefix from xml in python using lxml
CardView not showing shadow elevation
When using an Android Library Project how do you reference xml resources properly?
Cross-browser XPath implementation in JavaScript
Documenting overloaded methods with the same XML comments
Validate an XSD Schema?
