I am rather new to R and am trying to learn how to extract two values from an XML file, and then repeat that for the 5603 other (small, <2 KB) XML files in my working directory.
I have been reading a lot of topics on 'looping', but find this rather confusing - especially because it seems that looping over XML files is different from looping over other files, correct?
I am using online data in XML structure.
For each XML file I want to write the "ZipCode" and "AwardAmount" to a table.
Running the following code, I retrieved the ZipCode and AwardAmount, but only from the very first file. How can I write a proper loop and write the results to a table?
xmlfiles=list.files(pattern="*.xml")
for (i in 1:length(xmlfiles)){
doc= xmlTreeParse("xmlfiles[i]", useInternal=TRUE)
zipcode<-xmlValue(doc[["//ZipCode"]])
amount<-xmlValue(doc[["//AwardAmount"]])
}
Does anyone have any suggestions?
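This might work for you. Two things go wrong in your loop: the quotes in "xmlfiles[i]" pass that literal string to xmlTreeParse instead of the i-th file name, and zipcode and amount are overwritten on every pass, so nothing is saved. I got rid of the for loop and went with sapply.
library(XML)
# list.files() takes a regular expression, not a glob, so anchor on the extension
xmlfiles <- list.files(pattern = "\\.xml$")
# swap only the trailing .xml extension for .txt
txtfiles <- sub("\\.xml$", ".txt", xmlfiles)
txtfiles holds the output file name for each input file, one per run.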
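# invisible() suppresses the list of NULLs that sapply would otherwise print
invisible(sapply(seq_along(xmlfiles), function(i) {
  doc <- xmlTreeParse(xmlfiles[i], useInternalNodes = TRUE)
  # doc[["//..."]] returns the first node matching the XPath expression
  zipcode <- xmlValue(doc[["//ZipCode"]])
  amount <- xmlValue(doc[["//AwardAmount"]])
  DF <- data.frame(zip = zipcode, amount = amount)
  # write one small text file per XML file
  write.table(DF, quote = FALSE, row.names = FALSE, file = txtfiles[i])
}))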
Please let me know if you run into any issues when you run it.
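If you would rather end up with one table than with thousands of small text files, a minimal sketch along these lines may help. It is not the code from the answer above: extract_award and the output name awards.csv are hypothetical names chosen for illustration, and it assumes each file has at most one ZipCode and one AwardAmount of interest (only the first match is kept) and that AwardAmount is numeric.

```r
library(XML)

# Hypothetical helper: returns a one-row data frame for a single XML file.
extract_award <- function(path) {
  doc <- xmlParse(path)
  on.exit(free(doc))  # release the parsed document when the function exits
  zip <- xpathSApply(doc, "//ZipCode", xmlValue)
  amount <- xpathSApply(doc, "//AwardAmount", xmlValue)
  data.frame(
    file = basename(path),
    # keep the first match, or NA when the element is missing
    zip = if (length(zip) > 0) zip[1] else NA_character_,
    # assumption: AwardAmount holds a plain number
    amount = if (length(amount) > 0) as.numeric(amount[1]) else NA_real_,
    stringsAsFactors = FALSE
  )
}

xmlfiles <- list.files(pattern = "\\.xml$")
if (length(xmlfiles) > 0) {
  # one row per file, stacked into a single data frame
  awards <- do.call(rbind, lapply(xmlfiles, extract_award))
  write.csv(awards, "awards.csv", row.names = FALSE)
}
```

Building the rows with lapply and binding them once at the end avoids growing a data frame inside a loop, which should matter little for 5603 small files but keeps the code short.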