In R, how to extract two values from an XML file, loop over 5603 files and write them to a table

Tags:

loops

parsing

r

xml

As I am rather new to R, I am trying to learn how to extract two values from an XML file and loop over 5603 other small (<2 KB) XML files in my working directory.

I have been reading a lot of topics on looping, but I find them rather confusing, especially because looping over XML files seems to be different from looping over other files. Is that correct?

I am using online data in XML format.

For each XML file I want to write the "ZipCode" and "AwardAmount" to a table.

Running the following code, I did retrieve the ZipCode and AwardAmount, but only from the very first file. How can I write a proper loop and write the results to a table?

xmlfiles=list.files(pattern="*.xml")
for (i in 1:length(xmlfiles)){
    doc= xmlTreeParse("xmlfiles[i]", useInternal=TRUE)
    zipcode<-xmlValue(doc[["//ZipCode"]])
    amount<-xmlValue(doc[["//AwardAmount"]])
}

Does anyone have any suggestions?

wake_wake asked Dec 08 '25 03:12


1 Answer

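This might work for you. I got rid of the for loop and went with sapply. Note also that xmlTreeParse("xmlfiles[i]", ...) in your loop passes the literal string "xmlfiles[i]" rather than the i-th file name, so the quotes have to go.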

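library(XML)

# list.files() takes a regular expression, not a glob: match names ending in ".xml"
xmlfiles <- list.files(pattern = "\\.xml$")
# Swap only the trailing extension, so "xml" elsewhere in a name is left alone
txtfiles <- sub("\\.xml$", ".txt", xmlfiles)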

txtfiles holds one output file name per input file (for example, a.xml becomes a.txt), so each run writes to its own file.

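# seq_along() also handles the one-file case, where seq(xmlfiles) would fail;
# invisible() stops the console from printing the list of NULLs returned
invisible(sapply(seq_along(xmlfiles), function(i) {
  doc <- xmlTreeParse(xmlfiles[i], useInternalNodes = TRUE)
  # doc[["//Tag"]] returns the first node matching the XPath expression
  zipcode <- xmlValue(doc[["//ZipCode"]])
  amount <- xmlValue(doc[["//AwardAmount"]])
  DF <- data.frame(zip = zipcode, amount = amount)
  write.table(DF, quote = FALSE, row.names = FALSE, file = txtfiles[i])
}))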

Please let me know if there are issues when you run it.
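The code above writes one .txt file per XML file. If you would rather have a single table with one row per file, as the question asks, the asker's for loop also works once it stores each result in a preallocated vector instead of overwriting zipcode and amount on every pass. This is a minimal sketch, assuming the XML package is installed and all files sit in one directory; collect_awards is a hypothetical helper name, and the test tag names mirror the question rather than any confirmed schema.

```r
library(XML)

# Hypothetical helper: read ZipCode and AwardAmount from every .xml file in `dir`
# and return them as one data frame, one row per file.
collect_awards <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.xml$", full.names = TRUE)
  # Preallocate one slot per file so each iteration keeps its own result
  zip <- rep(NA_character_, length(files))
  amount <- rep(NA_character_, length(files))
  for (i in seq_along(files)) {
    doc <- xmlParse(files[i])  # files[i], not the string "files[i]"
    z <- xpathSApply(doc, "//ZipCode", xmlValue)
    a <- xpathSApply(doc, "//AwardAmount", xmlValue)
    # Keep the first match, like doc[["//ZipCode"]]; leave NA if the tag is missing
    if (length(z) > 0) zip[i] <- z[[1]]
    if (length(a) > 0) amount[i] <- a[[1]]
  }
  data.frame(file = basename(files), zip = zip,
             amount = as.numeric(amount), stringsAsFactors = FALSE)
}
```

Typical use would be awards <- collect_awards() followed by write.csv(awards, "awards.csv", row.names = FALSE). Looping over XML files is no different from looping over any other files: the loop runs over file names, and only the body that parses each file is XML-specific.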

Rich Scriven answered Dec 10 '25 19:12


