I want to extract around 20 element types from some SVG documents to form a new SVG. rect, circle, polygon, text, polyline: basically, a set of visual parts is on the white list.
JavaScript, comments, animations and external links need to go.
Three methods come to mind:
If XSLT is the right tool for the job, what xsl:stylesheet do I need? Otherwise, which approach would you use?
Example input:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:cc="http://creativecommons.org/ns#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:svg="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" version="1.1" width="512" height="512" id="svg2">
<title>Mostly harmless</title>
<metadata id="metadata7">Some metadata</metadata>
<script type="text/ecmascript">
<![CDATA[
alert('Hax!');
]]>
</script>
<style type="text/css">
<![CDATA[ svg{display:none} ]]>
</style>
<defs id="defs4">
<circle id="my_circle" cx="100" cy="50" r="40" fill="red"/>
</defs>
<g id="layer1">
<a xlink:href="www.hax.ru">
<use xlink:href="#my_circle" x="20" y="20"/>
<use xlink:href="#my_circle" x="100" y="50"/>
</a>
</g>
<text>
<tspan>It was the best of times</tspan>
<tspan dx="-140" dy="15">It was the worst of times.</tspan>
</text>
</svg>
Example output. Displays exactly the same image:
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="512" height="512">
<defs>
<circle id="my_circle" cx="100" cy="50" r="40" fill="red"/>
</defs>
<g id="layer1">
<use xlink:href="#my_circle" x="20" y="20"/>
<use xlink:href="#my_circle" x="100" y="50"/>
</g>
<text>
<tspan>It was the best of times</tspan>
<tspan dx="-140" dy="15">It was the worst of times.</tspan>
</text>
</svg>
The approximate list of keeper elements is: g, rect, circle, ellipse, line, polyline, polygon, path, text, tspan, tref, textPath, linearGradient+stop, radialGradient, defs, clipPath.
If not specifically SVG tiny, then certainly SVG lite.
This transformation:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:s="http://www.w3.org/2000/svg">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="*">
    <xsl:element name="{name()}" namespace="{namespace-uri()}">
      <xsl:copy-of select="namespace::xlink"/>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:attribute name="{name()}" namespace="{namespace-uri()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

  <xsl:template match="s:a">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match=
    "s:title|s:metadata|s:script|s:style|
     s:svg/@version|s:svg/@id"/>
</xsl:stylesheet>
when applied on the provided XML document:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns="http://www.w3.org/2000/svg" version="1.1"
width="512" height="512" id="svg2">
<title>Mostly harmless</title>
<metadata id="metadata7">Some metadata</metadata>
<script type="text/ecmascript"><![CDATA[ alert('Hax!'); ]]></script>
<style type="text/css"><![CDATA[ svg{display:none} ]]></style>
<defs id="defs4">
<circle id="my_circle" cx="100" cy="50" r="40" fill="red"/>
</defs>
<g id="layer1">
<a xlink:href="www.hax.ru">
<use xlink:href="#my_circle" x="20" y="20"/>
<use xlink:href="#my_circle" x="100" y="50"/>
</a>
</g>
<text>
<tspan>It was the best of times</tspan>
<tspan dx="-140" dy="15">It was the worst of times.</tspan>
</text>
</svg>
produces the wanted, correct result:
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="512" height="512">
<defs id="defs4">
<circle id="my_circle" cx="100" cy="50" r="40" fill="red"/>
</defs>
<g id="layer1">
<use xlink:href="#my_circle" x="20" y="20"/>
<use xlink:href="#my_circle" x="100" y="50"/>
</g>
<text>
<tspan>It was the best of times</tspan>
<tspan dx="-140" dy="15">It was the worst of times.</tspan>
</text>
</svg>
Explanation:
Two templates, whose combined effect is similar to the identity rule, match all "white-listed" nodes and essentially copy them (eliminating only the unwanted namespace nodes).
A template with no body matches all "black-listed" nodes (elements and some attributes). These are effectively deleted.
There must be templates that match specific "grey-listed" nodes (the template matching s:a in our case). A "grey-listed" node is not deleted completely: it may be renamed or otherwise modified, or at least its contents may still be included in the output.
As your understanding of the problem becomes clearer, the three lists will likely keep growing, so the match pattern of the black-list deleting template will have to be modified to accommodate newly discovered black-listed elements. Newly discovered white-listed nodes require no work at all. Only new grey-listed elements (if any are found) require a little more work.
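As an illustration of growing the black list, here is a sketch of the same stylesheet with the empty template extended to cover the animations mentioned in the question. The added element names (animate, animateColor, animateMotion, animateTransform, set) are the SVG 1.1 animation elements. Treating every attribute whose name starts with "on" as an event handler is an assumption made for this example, not something the original answer states.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:s="http://www.w3.org/2000/svg">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- White list: rebuild each element, keeping only the xlink namespace node -->
  <xsl:template match="*">
    <xsl:element name="{name()}" namespace="{namespace-uri()}">
      <xsl:copy-of select="namespace::xlink"/>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:element>
  </xsl:template>

  <!-- White list: re-create each attribute with its name and namespace -->
  <xsl:template match="@*">
    <xsl:attribute name="{name()}" namespace="{namespace-uri()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

  <!-- Grey list: drop the link wrapper but keep its children -->
  <xsl:template match="s:a">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Black list, extended (assumption): SVG 1.1 animation elements
       and any attribute whose name starts with "on" -->
  <xsl:template match=
    "s:title|s:metadata|s:script|s:style|
     s:animate|s:animateColor|s:animateMotion|s:animateTransform|s:set|
     s:svg/@version|s:svg/@id|
     @*[starts-with(name(), 'on')]"/>
</xsl:stylesheet>
```

The empty template wins over the generic `*` and `@*` templates because its patterns have a higher default priority, so only the match pattern changes as the black list grows.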
svgfig is a good tool for this job. You can load SVG files and pick out parts you like to make a new document. Or you can just remove parts you don't like and re-save.
Dimitre Novatchev's solution is more "clean" and elegant, but if you need a "whitelist" solution (because you can't predict what content users may input that you would need to "blacklist"), then you would need to fully flesh out the "whitelist".
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:svg="http://www.w3.org/2000/svg">
  <xsl:output indent="yes" />
  <!--The "whitelist" template that will copy matched nodes forward, copy their
      attributes, and apply-templates for any child nodes.
      Element names are case-sensitive (textPath, clipPath). -->
  <xsl:template match="svg:svg
      | svg:defs | svg:defs/text()
      | svg:g | svg:g/text()
      | svg:a | svg:a/text()
      | svg:use | svg:use/text()
      | svg:rect | svg:rect/text()
      | svg:circle | svg:circle/text()
      | svg:ellipse | svg:ellipse/text()
      | svg:line | svg:line/text()
      | svg:polyline | svg:polyline/text()
      | svg:polygon | svg:polygon/text()
      | svg:path | svg:path/text()
      | svg:text | svg:text/text()
      | svg:tspan | svg:tspan/text()
      | svg:tref | svg:tref/text()
      | svg:textPath | svg:textPath/text()
      | svg:linearGradient | svg:linearGradient/text()
      | svg:radialGradient | svg:radialGradient/text()
      | svg:stop
      | svg:clipPath | svg:clipPath/text()">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates select="node()" />
    </xsl:copy>
  </xsl:template>
  <!--The "blacklist" template, which does nothing except apply templates for the
      matched node's attributes and child nodes -->
  <xsl:template match="@* | node()">
    <xsl:apply-templates select="@* | node()" />
  </xsl:template>
</xsl:stylesheet>
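Note that the whitelist stylesheet copies every attribute of a kept element, so event handlers and external link targets would pass through. A hedged sketch of one way to narrow this follows, using a shortened element list for illustration only. The two filtering rules (drop attributes whose name starts with "on", and drop any href that does not point to a local "#fragment") are assumptions made for this example, not part of the original answer.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:svg="http://www.w3.org/2000/svg">
  <xsl:output indent="yes"/>

  <!-- Shortened element whitelist, for illustration only -->
  <xsl:template match="svg:svg | svg:defs | svg:g | svg:use | svg:circle
                       | svg:text | svg:text/text()
                       | svg:tspan | svg:tspan/text()">
    <xsl:copy>
      <!-- Assumed rules: skip on* attributes; keep href only for local #fragments -->
      <xsl:copy-of select="@*[not(starts-with(local-name(), 'on'))]
                             [not(local-name() = 'href') or starts-with(., '#')]"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Everything else is not copied; only its children are visited -->
  <xsl:template match="@* | node()">
    <xsl:apply-templates select="@* | node()"/>
  </xsl:template>
</xsl:stylesheet>
```

Because svg:a is left out of this shortened list, a link element falls through to the catch-all template: the wrapper disappears while its whitelisted children (the use elements in the sample input) are still copied.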