EDIT - I've figured out the solution to my problem and posted a Q&A here.
I'm looking to process XML conforming to the Library of Congress EAD standard (found here). Unfortunately, the standard is very loose regarding the structure of the XML.
For example, the <bioghist> tag can exist within the <archdesc> tag, within a <descgrp> tag, nested within another <bioghist> tag, in some combination of the above, or it can be left out entirely. I've found it very difficult to select just the <bioghist> tag I'm looking for without also selecting others.
Below are a few different possible EAD XML documents my XSLT might have to process:
First example
<ead>
  <eadheader>
    <archdesc>
      <bioghist>one</bioghist>
      <dsc>
        <c01>
          <descgrp>
            <bioghist>two</bioghist>
          </descgrp>
          <c02>
            <descgrp>
              <bioghist>
                <bioghist>three</bioghist>
              </bioghist>
            </descgrp>
          </c02>
        </c01>
      </dsc>
    </archdesc>
  </eadheader>
</ead>
Second example
<ead>
  <eadheader>
    <archdesc>
      <descgrp>
        <bioghist>
          <bioghist>one</bioghist>
        </bioghist>
      </descgrp>
      <dsc>
        <c01>
          <c02>
            <descgrp>
              <bioghist>three</bioghist>
            </descgrp>
          </c02>
          <bioghist>two</bioghist>
        </c01>
      </dsc>
    </archdesc>
  </eadheader>
</ead>
Third example
<ead>
  <eadheader>
    <archdesc>
      <descgrp>
        <bioghist>one</bioghist>
      </descgrp>
      <dsc>
        <c01>
          <c02>
            <bioghist>three</bioghist>
          </c02>
        </c01>
      </dsc>
    </archdesc>
  </eadheader>
</ead>
As you can see, an EAD XML file might have a <bioghist> tag almost anywhere. The actual output I'm supposed to produce is too complicated to post here. A simplified example of the output for the three EAD examples above might look like this:
Output for First example
<records>
  <primary_record>
    <biography_history>first</biography_history>
  </primary_record>
  <child_record>
    <biography_history>second</biography_history>
  </child_record>
  <grandchild_record>
    <biography_history>third</biography_history>
  </grandchild_record>
</records>
Output for Second example
<records>
  <primary_record>
    <biography_history>first</biography_history>
  </primary_record>
  <child_record>
    <biography_history>second</biography_history>
  </child_record>
  <grandchild_record>
    <biography_history>third</biography_history>
  </grandchild_record>
</records>
Output for Third example
<records>
  <primary_record>
    <biography_history>first</biography_history>
  </primary_record>
  <child_record>
    <biography_history></biography_history>
  </child_record>
  <grandchild_record>
    <biography_history>third</biography_history>
  </grandchild_record>
</records>
If I want to pull the "first" bioghist value and put it in the <primary_record>, I can't simply use <xsl:apply-templates select="/ead/eadheader/archdesc/bioghist"/>, because that tag might not be a direct child of the <archdesc> tag. It might be wrapped in a <descgrp> or a <bioghist>, or a combination of the two. And I can't use select="//bioghist", because that will pull all the <bioghist> tags. I can't even use select="//bioghist[1]", because there might not actually be a <bioghist> tag at that level, and then I'll be pulling the value below the <c01>, which is "second" and should be processed later.
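In XPath, //bioghist[1] means "every <bioghist> that is the first <bioghist> child of its parent", not "the first <bioghist> in the document"; the latter is written (//bioghist)[1]. Python's standard xml.etree.ElementTree implements a small XPath subset with the same positional meaning for [1], so the difference can be checked on a condensed copy of the First example. The NO_TOP document is a hypothetical variant without an <archdesc>-level <bioghist>, showing the concern above: the first match in document order is then the <c01> value.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the question's "First example".
FIRST = """<ead><eadheader><archdesc>
  <bioghist>one</bioghist>
  <dsc><c01>
    <descgrp><bioghist>two</bioghist></descgrp>
    <c02><descgrp><bioghist><bioghist>three</bioghist></bioghist></descgrp></c02>
  </c01></dsc>
</archdesc></eadheader></ead>"""

# Hypothetical variant with no <bioghist> at the <archdesc> level.
NO_TOP = """<ead><eadheader><archdesc>
  <dsc><c01>
    <descgrp><bioghist>two</bioghist></descgrp>
  </c01></dsc>
</archdesc></eadheader></ead>"""


def text(elem):
    """All descendant text, roughly like XPath string()."""
    return "".join(elem.itertext()).strip()


def first_per_parent(xml):
    """Like //bioghist[1]: the first <bioghist> child of *every* parent."""
    root = ET.fromstring(xml)
    return [text(e) for e in root.findall(".//bioghist[1]")]


def first_in_document(xml):
    """Like (//bioghist)[1]: the first <bioghist> in document order, or None."""
    found = ET.fromstring(xml).find(".//bioghist")
    return None if found is None else text(found)


print(first_per_parent(FIRST))    # ['one', 'two', 'three', 'three']
print(first_in_document(FIRST))   # one
print(first_in_document(NO_TOP))  # two
```

So //bioghist[1] returns four nodes here (the outer and inner "three" wrappers are both first children of their parents), while (//bioghist)[1] returns one node that may belong to the wrong level.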
This is already a long post, but one other wrinkle is that there can be an unlimited number of <cxx> nodes, nested up to twelve levels deep. I'm currently processing them recursively. I've tried saving the node I'm currently processing (<c01>, for example) in a variable called 'RN', then running <xsl:apply-templates select=".//bioghist[name(..)=name($RN) or name(../..)=name($RN)]"/>. This works for some forms of EAD, where the <bioghist> tag isn't nested too deeply, but it will fail as soon as it has to process an EAD file created by someone who loves wrapping tags in other tags (which is perfectly valid according to the EAD standard).
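To see exactly where the name(..)/name(../..) test stops working, here is a rough Python re-creation of it using only xml.etree.ElementTree. ElementTree has no parent axis, so a child-to-parent dictionary stands in for .. and ../..; the function name asker_predicate and both fragments are hypothetical. The doubled <descgrp> in DEEP only stands in for "extra wrapping" and has not been checked against the EAD schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: <bioghist> two levels below <c02> (its parent is descgrp).
SHALLOW = """<c01><c02>
  <descgrp><bioghist>three</bioghist></descgrp>
</c02></c01>"""

# Hypothetical fragment with one extra wrapper; the doubled <descgrp> only
# stands in for "extra wrapping" and is not validated against EAD.
DEEP = """<c01><c02>
  <descgrp><descgrp><bioghist>three</bioghist></descgrp></descgrp>
</c02></c01>"""


def asker_predicate(rn):
    """Mimic .//bioghist[name(..)=name($RN) or name(../..)=name($RN)] for node rn."""
    parent = {child: p for p in rn.iter() for child in p}  # child -> parent map
    hits = []
    for b in rn.iter("bioghist"):
        up1 = parent.get(b)     # ..
        up2 = parent.get(up1)   # ../.. (None when up1 is None)
        names = {getattr(up1, "tag", None), getattr(up2, "tag", None)}
        if rn.tag in names:
            hits.append("".join(b.itertext()).strip())
    return hits


print(asker_predicate(ET.fromstring(SHALLOW).find("c02")))  # ['three']
print(asker_predicate(ET.fromstring(DEEP).find("c02")))     # []
```

The test only ever looks two steps up, so any <bioghist> that sits three or more levels below the component is silently skipped.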
What I'd love is some way of saying "any <bioghist> tag anywhere below the current node, but not below another <c??> tag."
I hope that I've made the situation clear. Please let me know if I've left anything ambiguous. Any assistance you can provide would be greatly appreciated. Thanks.
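The rule "any <bioghist> below the current node, but not below a nested <c??>" amounts to a tree walk that stops descending whenever it reaches a component. In XSLT 1.0 terms it corresponds to keeping a <bioghist> only if its nearest <c??> ancestor is the current component (or, at the <archdesc> level, does not exist), for example by comparing generate-id() values. The sketch below expresses the same walk in Python with the standard library so it can be run on the three examples. Assumptions, all hypothetical: components are named c followed by two digits; only the outermost <bioghist> of a nested pair is kept (its text already includes the inner one); several <bioghist> in one scope are joined with a space; levels below grandchild get a made-up level{n}_record name; and the result is a list of tuples rather than XML.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: components are named c01 ... c12 (the question's <cxx>).
COMPONENT = re.compile(r"c\d\d$")
# Names from the question; deeper levels get a hypothetical placeholder name.
NAMES = ["primary_record", "child_record", "grandchild_record"]


def scan(node):
    """Collect, in node's own scope, the outermost <bioghist> elements and the
    nearest nested components. Nothing below a nested component is visited."""
    bios, comps = [], []

    def walk(elem):
        for child in elem:
            if COMPONENT.match(child.tag):
                comps.append(child)   # belongs to the next level down; stop here
            elif child.tag == "bioghist":
                bios.append(child)    # don't descend; nested bioghist is in its text
            else:
                walk(child)           # descend through descgrp, dsc, ...

    walk(node)
    return bios, comps


def records(node, depth=0):
    """Return (record_name, bioghist_text) pairs, one per level, depth first."""
    bios, comps = scan(node)
    name = NAMES[depth] if depth < len(NAMES) else f"level{depth}_record"  # hypothetical
    # Several <bioghist> in one scope are joined with a space (an arbitrary choice).
    text = " ".join("".join(b.itertext()).strip() for b in bios)
    result = [(name, text)]
    for comp in comps:
        result.extend(records(comp, depth + 1))
    return result


# Condensed copies of the question's second and third examples.
SECOND = """<ead><eadheader><archdesc>
  <descgrp><bioghist><bioghist>one</bioghist></bioghist></descgrp>
  <dsc><c01>
    <c02><descgrp><bioghist>three</bioghist></descgrp></c02>
    <bioghist>two</bioghist>
  </c01></dsc>
</archdesc></eadheader></ead>"""

THIRD = """<ead><eadheader><archdesc>
  <descgrp><bioghist>one</bioghist></descgrp>
  <dsc><c01>
    <c02><bioghist>three</bioghist></c02>
  </c01></dsc>
</archdesc></eadheader></ead>"""

for doc in (SECOND, THIRD):
    print(records(ET.fromstring(doc).find(".//archdesc")))
# [('primary_record', 'one'), ('child_record', 'two'), ('grandchild_record', 'three')]
# [('primary_record', 'one'), ('child_record', ''), ('grandchild_record', 'three')]
```

Because the walk never enters a nested component, the depth of wrapping inside a level no longer matters, and an empty scope (the <c01> in the third example) naturally yields an empty record.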
As the requirements are rather vague, any answer only reflects the guesses its author has made.
Here is mine:
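<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="my:my" exclude-result-prefixes="my">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <!-- Remove whitespace-only text nodes, so following-sibling::node()[1] is the next element -->
  <xsl:strip-space elements="*"/>
  <!-- Result element names, looked up by the position of each selected bioghist -->
  <my:names>
    <n>primary_record</n>
    <n>child_record</n>
    <n>grandchild_record</n>
  </my:names>
  <xsl:variable name="vNames" select="document('')/*/my:names/*"/>
  <xsl:template match="/">
    <!-- Guess: the wanted bioghist at each level is the one immediately followed by a descgrp -->
    <xsl:apply-templates select=
      "//bioghist[following-sibling::node()[1]
                                  [self::descgrp]
                 ]"/>
  </xsl:template>
  <xsl:template match="bioghist">
    <!-- position() is this node's position in the node-set selected above -->
    <xsl:variable name="vPos" select="position()"/>
    <xsl:element name="{$vNames[position() = $vPos]}">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>
  <!-- Output nothing for text nodes reached through the built-in templates -->
  <xsl:template match="text()"/>
</xsl:stylesheet>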
When this transformation is applied to the following XML document:
<ead>
  <eadheader>
    <archdesc>
      <bioghist>first</bioghist>
      <descgrp>
        <bioghist>first</bioghist>
        <bioghist>
          <bioghist>first</bioghist>
        </bioghist>
      </descgrp>
      <dsc>
        <c01>
          <bioghist>second</bioghist>
          <descgrp>
            <bioghist>second</bioghist>
            <bioghist>
              <bioghist>second</bioghist>
            </bioghist>
          </descgrp>
          <c02>
            <bioghist>third</bioghist>
            <descgrp>
              <bioghist>third</bioghist>
              <bioghist>
                <bioghist>third</bioghist>
              </bioghist>
            </descgrp>
          </c02>
        </c01>
      </dsc>
    </archdesc>
  </eadheader>
</ead>
the wanted result is produced:
<primary_record>first</primary_record>
<child_record>second</child_record>
<grandchild_record>third</grandchild_record>
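The select expression above keys on layout: a <bioghist> is picked only if the node right after it (once xsl:strip-space has removed whitespace-only text) is a <descgrp>. A rough Python re-creation of that test, using only the standard library, makes it easy to try the heuristic on other inputs; answer_selection is a hypothetical name, and comments and processing instructions are ignored because ElementTree drops them by default. On the input above it picks first/second/third in order; on the question's First example, where no <bioghist> is immediately followed by a <descgrp>, it picks nothing, so the stylesheet would emit no records for that document.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the input used in the answer.
ANSWER_INPUT = """<ead><eadheader><archdesc>
  <bioghist>first</bioghist>
  <descgrp><bioghist>first</bioghist><bioghist><bioghist>first</bioghist></bioghist></descgrp>
  <dsc><c01>
    <bioghist>second</bioghist>
    <descgrp><bioghist>second</bioghist><bioghist><bioghist>second</bioghist></bioghist></descgrp>
    <c02>
      <bioghist>third</bioghist>
      <descgrp><bioghist>third</bioghist><bioghist><bioghist>third</bioghist></bioghist></descgrp>
    </c02>
  </c01></dsc>
</archdesc></eadheader></ead>"""

# Condensed copy of the question's "First example".
FIRST = """<ead><eadheader><archdesc>
  <bioghist>one</bioghist>
  <dsc><c01>
    <descgrp><bioghist>two</bioghist></descgrp>
    <c02><descgrp><bioghist><bioghist>three</bioghist></bioghist></descgrp></c02>
  </c01></dsc>
</archdesc></eadheader></ead>"""


def answer_selection(xml):
    """Rough equivalent of //bioghist[following-sibling::node()[1][self::descgrp]]
    after xsl:strip-space has removed whitespace-only text nodes."""
    root = ET.fromstring(xml)
    parent = {child: p for p in root.iter() for child in p}
    picked = []
    for b in root.iter("bioghist"):
        p = parent.get(b)
        if p is None:
            continue
        siblings = list(p)
        i = siblings.index(b)
        # Non-whitespace text right after </bioghist> would itself be the next node.
        text_follows = bool(b.tail and b.tail.strip())
        if not text_follows and i + 1 < len(siblings) and siblings[i + 1].tag == "descgrp":
            picked.append("".join(b.itertext()).strip())
    return picked


print(answer_selection(ANSWER_INPUT))  # ['first', 'second', 'third']
print(answer_selection(FIRST))         # []
```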