I have a 200-page site and would like to implement link canonicalization.
I use my FTP client to download the site into a local directory, and I would like to add the canonical link tag right under the <head> tag of each page.
So, for page 1, I would like to transform
<head>
into
<head>
<link rel="canonical" href="http://www.site.com/page1.htm" />
and use sed to do this for every page in the local directory (page1.htm, page2.htm ... page200.htm). Thank you.
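sed and awk are not designed to parse HTML; see "RegEx match open tags except XHTML self-contained tags". A tool that understands HTML, such as xmlstarlet with an XSLT stylesheet, is safer (this first version needs bash for the process substitution):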
cd /where/HTML_pages/exists
for file in *.htm; do xmlstarlet transform --html <(cat <<EOF
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" >
<xsl:output method="html" encoding="utf-8"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="head">
<xsl:copy>
<xsl:apply-templates select="@*" />
<xsl:if test="not(link[@rel='canonical'])">
<link rel="canonical" href="http://www.site.com/$file" />
</xsl:if>
<xsl:apply-templates select="node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
EOF
) > "/tmp/$file" "$file" && mv "/tmp/$file" "$file"
done
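The `> "/tmp/$file" && mv` step above is a write-then-replace idiom: the page is only replaced when xmlstarlet succeeds. A slightly more defensive variant is sketched below as a hypothetical POSIX sh function (`replace_with_output` is not part of xmlstarlet). It writes to a unique `mktemp` file instead of a predictable /tmp name, removes that file if the transform fails, and copies the result back so each page keeps its existing permissions:

```shell
#!/bin/sh
# Hypothetical helper: replace_with_output FILE COMMAND [ARGS...]
# Runs COMMAND (which must print the new content on stdout) into a temporary
# file and copies the result over FILE only if COMMAND succeeded.
replace_with_output() {
  target=$1
  shift
  # mktemp picks a unique name, so runs cannot collide in /tmp
  tmp=$(mktemp "${TMPDIR:-/tmp}/canon.XXXXXX") || return 1
  if "$@" > "$tmp"; then
    # "cat >" rewrites the existing file in place, keeping its permissions
    # (mv would install the owner-only temp file instead)
    cat "$tmp" > "$target" && rm -f "$tmp"
  else
    # transform failed: leave the original page untouched
    rm -f "$tmp"
    return 1
  fi
}
```

For example, with the stylesheet saved to a file such as the xsl.xslt used in the next solution: `for file in *.htm; do replace_with_output "$file" xmlstarlet transform --html xsl.xslt -s "link=http://www.site.com/$file" "$file"; done`.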
An even better, more proper pure XSLT solution, still using xmlstarlet, but bash is no longer required:
File xsl.xslt:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
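<xsl:output method="html" encoding="utf-8" />
<!-- the canonical URL, passed in from the shell with -s link=... -->
<xsl:param name="link" />
<!-- we are not building the HTML from scratch,
so copy everything that already exists (identity template) -->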
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<!-- looking for the "head" tag -->
<xsl:template match="head">
<xsl:copy>
<!-- keep the attributes of <head> itself -->
<xsl:apply-templates select="@*" />
<!-- if no canonical "link" tag exists yet... -->
<xsl:if test="not(link[@rel='canonical'])">
<!-- ...we add the new "link" tag right under <head>... -->
<link>
<xsl:attribute name="rel">
<!-- with a fixed string attribute... -->
<xsl:text>canonical</xsl:text>
</xsl:attribute>
<xsl:attribute name="href">
<!-- and a dynamic string attribute ("link" parameter) -->
<xsl:value-of select="$link" />
</xsl:attribute>
</link>
</xsl:if>
<!-- then copy the existing children of <head> -->
<xsl:apply-templates select="node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Shell code:
cd /where/HTML_pages/exists
for file in *.htm; do
xmlstarlet transform \
--html \
xsl.xslt \
-s "link=http://www.site.com/$file" "$file" > "/tmp/$file" &&
mv "/tmp/$file" "$file"
done
This adds the canonical <link> element to the <head> of each page, with the current page's URL passed in as the link parameter.
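To spot-check the pages before uploading them again by FTP, here is a minimal sketch. The `check_canonical` function is hypothetical, and it assumes the http://www.site.com/ base URL from the question and that the generated tag begins with `<link rel="canonical" href="` (rel before href, the order both stylesheets create them in). It lists every page whose canonical link is missing or points somewhere else:

```shell
#!/bin/sh
# Hypothetical helper: check_canonical FILE...
# Prints every file whose canonical <link> is missing or does not point at
# http://www.site.com/<file name>, and returns 1 if any such file was found.
check_canonical() {
  base="http://www.site.com/"   # base URL assumed from the question
  status=0
  for file in "$@"; do
    # -F: match a fixed string, so "." is not treated as a regex wildcard
    if ! grep -qF "<link rel=\"canonical\" href=\"$base$file\"" "$file"; then
      echo "missing or wrong canonical: $file"
      status=1
    fi
  done
  return "$status"
}
```

Run it from the same directory as `check_canonical *.htm`; it prints nothing and returns 0 when every page points at its own URL.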