I have a XML Document file. The part of the file looks like this:
-<attr>
<attrlabl>COUNTY</attrlabl>
<attrdef>County abbreviation</attrdef>
<attrtype>Text</attrtype>
<attwidth>1</attwidth>
<atnumdec>0</atnumdec>
-<attrdomv>
-<edom>
<edomv>C</edomv>
<edomvd>Clackamas County</edomvd>
<edomvds/>
</edom>
-<edom>
<edomv>M</edomv>
<edomvd>Multnomah County</edomvd>
<edomvds/>
</edom>
-<edom>
<edomv>W</edomv>
<edomvd>Washington County</edomvd>
<edomvds/>
</edom>
</attrdomv>
</attr>
From this XML file, I want to create an R data frame with the columns of attrlabl, attrdef, attrtype, and attrdomv. Please note that the attrdomv column should include all of the levels for the category variable. The data frame should look like this:
attrlabl attrdef attrtype attrdomv
COUNTY County abbreviation Text C Clackamas County; M Multnomah County; W Washington County
I have an incomplete code like this:
doc <- xmlParse("taxlots.shp.xml")
dataDictionary <- xmlToDataFrame(getNodeSet(doc,"//attrlabl"))
Could you please complete my R code? I appreciate any help!
Assuming this is the correct taxlots.shp.xml
file:
<attr>
<attrlabl>COUNTY</attrlabl>
<attrdef>County abbreviation</attrdef>
<attrtype>Text</attrtype>
<attwidth>1</attwidth>
<atnumdec>0</atnumdec>
<attrdomv>
<edom>
<edomv>C</edomv>
<edomvd>Clackamas County</edomvd>
<edomvds/>
</edom>
<edom>
<edomv>M</edomv>
<edomvd>Multnomah County</edomvd>
<edomvds/>
</edom>
<edom>
<edomv>W</edomv>
<edomvd>Washington County</edomvd>
<edomvds/>
</edom>
</attrdomv>
</attr>
You were almost there:
doc <- xmlParse("taxlots.shp.xml")
xmlToDataFrame(nodes=getNodeSet(doc1,"//attr"))[c("attrlabl","attrdef","attrtype","attrdomv")]
attrlabl attrdef attrtype attrdomv
1 COUNTY County abbreviation Text CClackamas CountyMMultnomah CountyWWashington County
But the last field has not the format you wanted. To do so, require some additional steps:
step1 <- xmlToDataFrame(nodes=getNodeSet(doc1,"//attrdomv/edom"))
step1
edomv edomvd edomvds
1 C Clackamas County
2 M Multnomah County
3 W Washington County
step2 <- paste(paste(step1$edomv, step1$edomvd, sep=" "), collapse="; ")
step2
[1] "C Clackamas County; M Multnomah County; W Washington County"
cbind(xmlToDataFrame(nodes= getNodeSet(doc1, "//attr"))[c("attrlabl", "attrdef", "attrtype")],
attrdomv= step2)
attrlabl attrdef attrtype attrdomv
1 COUNTY County abbreviation Text C Clackamas County; M Multnomah County; W Washington County
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With