I'm checking to see if anyone has an XSLT lying around that transforms HTML tables to CALS. I've found a lot of material on going the other way (CALS to HTML), but nothing for HTML to CALS. I thought somebody might have done this before, so that I don't have to reinvent the wheel. I'm not looking for a complete solution, just a starting point.
If I get far enough on my own, I'll post it for future reference.
I've come up with a much simpler solution than what @Flack linked to:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- tbody becomes tgroup; @cols is the widest row, counting colspans -->
  <xsl:template match="tbody">
    <xsl:variable name="maxColumns">
      <!-- sort rows by column count (ascending) and keep the last value -->
      <xsl:for-each select="tr">
        <xsl:sort select="sum(td/@colspan) + count(td[not(@colspan)])" data-type="number"/>
        <xsl:if test="position() = last()">
          <xsl:value-of select="sum(td/@colspan) + count(td[not(@colspan)])"/>
        </xsl:if>
      </xsl:for-each>
    </xsl:variable>
    <tgroup>
      <xsl:attribute name="cols">
        <xsl:value-of select="$maxColumns"/>
      </xsl:attribute>
      <xsl:apply-templates select="@*|node()"/>
    </tgroup>
  </xsl:template>

  <!-- spanning cell: colspan becomes a start column and an end column -->
  <xsl:template match="td[@colspan > 1]">
    <entry>
      <xsl:attribute name="namest">
        <xsl:value-of select="sum(preceding-sibling::td/@colspan) + count(preceding-sibling::td[not(@colspan)]) + 1"/>
      </xsl:attribute>
      <xsl:attribute name="nameend">
        <xsl:value-of select="sum(preceding-sibling::td/@colspan) + count(preceding-sibling::td[not(@colspan)]) + @colspan"/>
      </xsl:attribute>
      <xsl:apply-templates select="@*[name() != 'colspan']|node()"/>
    </entry>
  </xsl:template>

  <!-- tr becomes row -->
  <xsl:template match="tr">
    <row>
      <xsl:apply-templates select="@*|node()"/>
    </row>
  </xsl:template>

  <!-- ordinary cell: td becomes entry -->
  <xsl:template match="td">
    <entry>
      <xsl:apply-templates select="@*|node()"/>
    </entry>
  </xsl:template>

  <!-- rowspan="n" becomes morerows="n - 1" -->
  <xsl:template match="td/@rowspan">
    <xsl:attribute name="morerows">
      <xsl:value-of select=". - 1"/>
    </xsl:attribute>
  </xsl:template>

  <!-- fallback rule: copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
There are two tricky points. First, a CALS table needs a tgroup/@cols attribute containing the number of columns. So we need to find the maximum number of cells in one row in the XHTML table - but we must heed colspan declarations so that a cell with colspan > 1 creates the right number of columns! The first template in my stylesheet does just that, based on @Tim C's answer to the max cells per row problem.
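To make the column count concrete, here is a small made-up input (the cell contents are placeholders, not from a real document), annotated with the value the sort expression yields for each row:

```xml
<!-- Hypothetical input, for illustration only -->
<table>
  <tbody>
    <!-- colspan sum of 3 + 1 cell without colspan = 4 columns -->
    <tr><td>A</td><td colspan="3">B</td></tr>
    <!-- colspan sum of 0 + 3 cells without colspan = 3 columns -->
    <tr><td>C</td><td>D</td><td>E</td></tr>
  </tbody>
</table>
```

The rows sort as 3, 4, so the last one in sorted order supplies the maximum and the tbody template should emit a tgroup with cols="4".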
Another problem is that for multi-column cells XHTML says "this cell is 3 columns wide" (colspan="3") while CALS will say "this cell starts in column 2 and ends in column 4" (namest="2" nameend="4"). That transformation is done in the second template in the stylesheet.
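One caveat: as far as I know, namest and nameend in CALS refer to column names declared on colspec elements, so the numeric values only resolve if matching colspec elements exist in the tgroup. A minimal sketch of how that could be added, assuming the columns are simply named after their position (the make-colspecs template and its parameters are hypothetical names of my own):

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- Hypothetical helper: emits colspec elements named 1..$count (XSLT 1.0 recursion) -->
  <xsl:template name="make-colspecs">
    <xsl:param name="count"/>
    <xsl:param name="index" select="1"/>
    <xsl:if test="$index &lt;= $count">
      <colspec colnum="{$index}" colname="{$index}"/>
      <xsl:call-template name="make-colspecs">
        <xsl:with-param name="count" select="$count"/>
        <xsl:with-param name="index" select="$index + 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <!-- Variant of the tbody template that declares the columns before the rows -->
  <xsl:template match="tbody">
    <xsl:variable name="maxColumns">
      <xsl:for-each select="tr">
        <xsl:sort select="sum(td/@colspan) + count(td[not(@colspan)])" data-type="number"/>
        <xsl:if test="position() = last()">
          <xsl:value-of select="sum(td/@colspan) + count(td[not(@colspan)])"/>
        </xsl:if>
      </xsl:for-each>
    </xsl:variable>
    <tgroup cols="{$maxColumns}">
      <!-- attributes must be written before any child elements -->
      <xsl:apply-templates select="@*"/>
      <xsl:call-template name="make-colspecs">
        <xsl:with-param name="count" select="number($maxColumns)"/>
      </xsl:call-template>
      <xsl:apply-templates select="node()"/>
    </tgroup>
  </xsl:template>

</xsl:stylesheet>
```

These two templates are meant to replace the tbody template in the stylesheet; the other templates stay as they are. With the colspec elements in place, namest="2" nameend="4" points at columns that are actually declared.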
The rest is indeed fairly straightforward. The stylesheet doesn't deal with details like changing style="width: 50%" into width="50%" etc. but those are relatively common problems, I believe.
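For the width case, something along these lines might do as a starting point. It is only a sketch: it assumes the style attribute contains a plain width declaration, it drops any other declarations in the same style attribute, and it would also fire on properties such as max-width, so adjust it to your data.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- style="width: 50%" becomes width="50%" (assumption: a simple width declaration) -->
  <xsl:template match="td/@style[contains(., 'width:')]">
    <xsl:attribute name="width">
      <!-- take the text after 'width:' up to the next ';' (or the end) and trim it -->
      <xsl:value-of select="normalize-space(substring-before(concat(substring-after(., 'width:'), ';'), ';'))"/>
    </xsl:attribute>
  </xsl:template>

  <!-- fallback rule, same as in the main stylesheet -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The first template can be dropped into the main stylesheet as is, since the td templates already apply templates to the cell's attributes.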
I know it's 4 years later, but I'm posting this for anyone who comes across this question:
ISOSTS XHTML table to CALS conversion