Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sanitizing an SVG document by whitelisting elements

I want to extract around 20 element types from some SVG documents to form a new SVG. rect, circle, polygon, text, polyline, basically a set of visual parts are in the white list. JavaScript, comments, animations and external links need to go.

Three methods come to mind:

  1. Regex: I'm completely familiar with, and would rather not go there obviously.
  2. PHP DOM: Used once perhaps a year ago.
  3. XSLT: Took my first look just now.

If XSLT is the right tool for the job, what xsl:stylesheet do I need? Otherwise, which approach would you use?

Example input:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:cc="http://creativecommons.org/ns#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:svg="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.w3.org/2000/svg" version="1.1" width="512" height="512" id="svg2">
<title>Mostly harmless</title>
  <metadata id="metadata7">Some metadata</metadata>

<script type="text/ecmascript">
<![CDATA[
alert('Hax!');
]]>
</script>
<style type="text/css">
<![CDATA[ svg{display:none} ]]>
</style>

  <defs id="defs4">
    <circle id="my_circle" cx="100" cy="50" r="40" fill="red"/> 
  </defs>

  <g id="layer1">
  <a xlink:href="www.hax.ru">
    <use xlink:href="#my_circle" x="20" y="20"/>
    <use xlink:href="#my_circle" x="100" y="50"/>
  </a>
  </g>
  <text>
    <tspan>It was the best of times</tspan>
    <tspan dx="-140" dy="15">It was the worst of times.</tspan>
  </text>
</svg>

Example output. Displays exactly the same image:

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="512" height="512">
  <defs>
    <circle id="my_circle" cx="100" cy="50" r="40" fill="red"/> 
  </defs>
  <g id="layer1">
    <use xlink:href="#my_circle" x="20" y="20"/>
    <use xlink:href="#my_circle" x="100" y="50"/>
  </g>
  <text>
    <tspan>It was the best of times</tspan>
    <tspan dx="-140" dy="15">It was the worst of times.</tspan>
  </text>
</svg>

The approximate list of keeper elements is: g, rect, circle, ellipse, line, polyline, polygon, path, text, tspan, tref, textpath, linearGradient+stop, radialGradient, defs, clippath, path.

If not specifically SVG tiny, then certainly SVG lite.

like image 566
SamG Avatar asked Feb 02 '11 23:02

SamG


3 Answers

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:s="http://www.w3.org/2000/svg"
 >
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="*">
  <xsl:element name="{name()}" namespace="{namespace-uri()}">
   <xsl:copy-of select="namespace::xlink"/>

   <xsl:apply-templates select="node()|@*"/>
  </xsl:element>
 </xsl:template>

 <xsl:template match="@*">
  <xsl:attribute name="{name()}"
                 namespace="{namespace-uri()}">
   <xsl:value-of select="."/>
  </xsl:attribute>
 </xsl:template>

 <xsl:template match="s:a">
  <xsl:apply-templates/>
 </xsl:template>

 <xsl:template match=
 "s:title|s:metadata|s:script|s:style|
  s:svg/@version|s:svg/@id"/>
</xsl:stylesheet>

when applied on the provided XML document:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->
<svg xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:cc="http://creativecommons.org/ns#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:svg="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     xmlns="http://www.w3.org/2000/svg" version="1.1"
     width="512" height="512" id="svg2">
    <title>Mostly harmless</title>
    <metadata id="metadata7">Some metadata</metadata>
    <script type="text/ecmascript"><![CDATA[ alert('Hax!'); ]]></script>
    <style type="text/css"><![CDATA[ svg{display:none} ]]></style>
    <defs id="defs4">
        <circle id="my_circle" cx="100" cy="50" r="40" fill="red"/>
    </defs>
    <g id="layer1">
        <a xlink:href="www.hax.ru">
            <use xlink:href="#my_circle" x="20" y="20"/>
            <use xlink:href="#my_circle" x="100" y="50"/>
        </a>
    </g>
    <text>
        <tspan>It was the best of times</tspan>
        <tspan dx="-140" dy="15">It was the worst of times.</tspan>
    </text>
</svg>

produces the wanted, correct result:

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="512" height="512">
   <defs id="defs4">
      <circle id="my_circle" cx="100" cy="50" r="40" fill="red"/>
   </defs>
   <g id="layer1">
      <use xlink:href="#my_circle" x="20" y="20"/>
      <use xlink:href="#my_circle" x="100" y="50"/>
   </g>
   <text>
      <tspan>It was the best of times</tspan>
      <tspan dx="-140" dy="15">It was the worst of times.</tspan>
   </text>
</svg>

Explanation:

  1. Two templates, having combined effect that is similar to the identity rule, match all "white-listed nodes and essentially copy them (only eliminating unwanted namespace nodes).

  2. A template with no body matches all "black-listed" nodes (elements and some attributes). These are effectively deleted.

  3. There must be templates that match specific "grey-listed" nodes (the template matching s:a in our case). A "grey-listed node will not be deleted completely -- it may be renamed or otherwize modified, or at least its contents may still be included in the output.

  4. It is likely that with your understanding of the problem becoming more and more clear, the three lists will continuously grow, so the match pattern for the black-list deleting template will be modified to accomodate the newly discovered black-listed elements. Newly-discovered white-listed nodes require no work at all. Only treating new grey-listed elements (if such are found at all) will require a little bit more work.

like image 189
Dimitre Novatchev Avatar answered Sep 18 '22 11:09

Dimitre Novatchev


svgfig is a good tool for this job. You can load SVG files and pick out parts you like to make a new document. Or you can just remove parts you don't like and re-save.

like image 27
Ben Jackson Avatar answered Sep 21 '22 11:09

Ben Jackson


Dimitre Novatchev's solution is more "clean" and elegant, but if you need a "whitelist" solution (because you can't predict what content users may input that you would need to "blacklist"), then you would need to fully flesh out the "whitelist".

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:svg="http://www.w3.org/2000/svg">
    <xsl:output indent="yes" />

    <!--The "whitelist" template that will copy matched nodes forward and apply-templates
        for any attributes or child nodes -->
    <xsl:template match="svg:svg 
        | svg:defs  | svg:defs/text()
        | svg:g  | svg:g/text()
        | svg:a  | svg:a/text()
        | svg:use   | svg:use/text()
        | svg:rect  | svg:rect/text()
        | svg:circle  | svg:circle/text()
        | svg:ellipse  | svg:ellipse/text()
        | svg:line  | svg:line/text()
        | svg:polyline  | svg:polyline/text()
        | svg:polygon  | svg:polygon/text()
        | svg:path  | svg:path/text()
        | svg:text  | svg:text/text()
        | svg:tspan  | svg:tspan/text()
        | svg:tref  | svg:tref/text()
        | svg:textpath  | svg:textpath/text()
        | svg:linearGradient  | svg:linearGradient/text()
        | svg:radialGradient  | svg:radialGradient/text()
        | svg:clippath  | svg:clippath/text()
        | svg:text | svg:text/text()">
        <xsl:copy>
            <xsl:copy-of select="@*" />
            <xsl:apply-templates select="node()" />
        </xsl:copy>
    </xsl:template>

    <!--The "blacklist" template, which does nothing except apply templates for the 
        matched node's attributes and child nodes -->
    <xsl:template match="@* | node()">
        <xsl:apply-templates select="@* | node()" />
    </xsl:template>

</xsl:stylesheet>
like image 33
Mads Hansen Avatar answered Sep 20 '22 11:09

Mads Hansen