Sanitizing SVG using PHP

I am creating charts on the fly as SVGs using d3.js. These charts are dynamically generated based on the selections of authenticated users. Once these charts are generated, the user has the option to download the generated SVG as a PNG or PDF.

The current workflow is the following:

// JavaScript
// get the element containing generated SVG
var svg = document.getElementById("chart-container");

// Extract the data as SVG text string
var svg_xml = (new XMLSerializer).serializeToString(svg);

// Submit the <FORM> to the server.
var form = document.getElementById("svgform");
form['output_format'].value = output_format;  // can be either "pdf" or "png"
form['data'].value = svg_xml ;
form.submit();

The FORM element is a hidden form, used to POST the data:

<form id="svgform" method="post" action="conversion.php">
  <input type="hidden" id="output_format" name="output_format" value="">
  <input type="hidden" id="data" name="data" value="">
</form>

The PHP file saves the provided SVG data as a temporary file:

// check for valid session, etc - omitted for brevity 

$xmldat = $_POST['data'];  // serialized XML representing the SVG element
if(simplexml_load_string($xmldat)===FALSE) { die; } // reject invalid XML  

$fileformat = $_POST['output_format'];  // chosen format for output;  PNG or PDF
if ($fileformat != "pdf" && $fileformat != "png" ){ die; } // limited options for format
$fileformat = escapeshellarg($fileformat); // escape shell arguments that might have snuck in

// generate temporary file names with tempnam() - omitted for brevity

$handle = fopen($infile, "w");
fwrite($handle, $xmldat);
fclose($handle);

A conversion utility is run which reads the temporary file ($infile) and creates a new file ($outfile) in the specified $fileformat (PDF or PNG). The resulting new file is then returned to the browser, and the temporary files are deleted:

// headers etc generated - omitted for brevity
readfile($outfile);

unlink($infile);  // delete temporary infile  
unlink($outfile);  // delete temporary outfile  

I have investigated converting the SVG to a PNG using JavaScript (canvg(), then toDataURL, then document.write), and may use this for generating the PNGs, but it doesn't allow for conversion to PDF.

So: How can I best sanitize or filter the SVG data which is provided to conversion.php, before it's written to a file? What's the current state of SVG sanitization? What's available within PHP? Should I go with a whitelist-based approach to sanitizing the SVG data provided to conversion.php, or is there a better way?

(I do not know XSLT, though I could try to learn it; I hope to keep the sanitization within PHP as much as possible. I am using Windows Server 2008, so any solutions that use external tools would need to be available within that ecosystem.)

Ale Exc asked Dec 20 '12 16:12


3 Answers

You can use the svg-sanitize package: https://packagist.org/packages/enshrined/svg-sanitize

It had 500k installs at the time this answer was written.

require 'vendor/autoload.php'; // assumes the package was installed with Composer

use enshrined\svgSanitize\Sanitizer;

// Create a new sanitizer instance
$sanitizer = new Sanitizer();

// Load the dirty svg
$dirtySVG = file_get_contents('filthy.svg');

// Pass it to the sanitizer and get it back clean
$cleanSVG = $sanitizer->sanitize($dirtySVG);

// Now do what you want with your clean SVG/XML data

Lucas Bustamante answered Nov 12 '22 18:11


I work with XML and PHP, but I am not at all sure this answers your question. Please take it as an idea or suggestion, nothing more.

SimpleXML uses libxml to load the XML content. http://www.php.net/manual/en/simplexml.requirements.php

You can disable the loading of external entities using:

libxml_disable_entity_loader (TRUE)

http://www.php.net/manual/en/function.libxml-disable-entity-loader.php

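before loading your file with SimpleXML. (This function is deprecated as of PHP 8.0; with libxml 2.9.0 or later, external entity loading is disabled by default.)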
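
A minimal sketch of the same idea, using DOMDocument rather than SimpleXML (a swapped-in API, not what this answer names). The function name `load_untrusted_svg` is hypothetical, and rejecting every DOCTYPE is an assumption that suits generated charts but may be too strict for other SVG sources.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Parses untrusted SVG text with DOMDocument and refuses anything that
// declares a DOCTYPE, since entities can only be declared there.
function load_untrusted_svg(string $xml): ?DOMDocument
{
    // Cheap pre-check; loadXML() does not accept an empty string.
    if ($xml === '' || stripos($xml, '<!DOCTYPE') !== false) {
        return null;
    }

    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // LIBXML_NONET forbids network access during parsing.
    // LIBXML_NOENT is deliberately not passed: that flag enables entity substitution.
    $ok = $doc->loadXML($xml, LIBXML_NONET);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Re-check the parsed document in case a DOCTYPE escaped the string test
    // (for example because of an unusual encoding).
    if (!$ok || $doc->doctype !== null || $doc->documentElement === null) {
        return null;
    }
    // Only accept documents whose root element is <svg>.
    return $doc->documentElement->localName === 'svg' ? $doc : null;
}
```

In the question's conversion.php, a call such as `if (load_untrusted_svg($xmldat) === null) { die; }` could stand in for the `simplexml_load_string` check.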

Then you could validate against SVG schema

http://us3.php.net/manual/en/domdocument.schemavalidate.php or http://us3.php.net/manual/en/domdocument.validate.php

The only concern I see is that the SVG could contain a script element. http://www.w3.org/TR/SVG/script.html#ScriptElement

There is information on the SVG 1.1 DTD here: http://www.w3.org/Graphics/SVG/1.1/DTD/svg-framework.mod http://www.w3.org/TR/2003/REC-SVG11-20030114/REC-SVG11-20030114.pdf

You might provide an SVG DTD with a modified version of the script element, or loop through the elements and remove any script element that is present.

It won't be perfect, but at least better than nothing.
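
The "loop through elements" option could be sketched roughly as follows. The function name `remove_script_elements` is hypothetical, and the sketch only removes script elements: event-handler attributes such as `onload` and other vectors are left untouched.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Removes every <script> element from an SVG string and returns the
// serialized result, or null if the input cannot be parsed.
function remove_script_elements(string $svg): ?string
{
    if ($svg === '') {
        return null; // loadXML() does not accept an empty string
    }

    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($svg, LIBXML_NONET); // no network access while parsing
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$ok) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    // local-name() matches "script" whatever namespace prefix is used.
    // Copy the result into an array so nodes can be removed while looping.
    $scripts = iterator_to_array($xpath->query('//*[local-name()="script"]'), false);
    foreach ($scripts as $node) {
        $node->parentNode->removeChild($node);
    }

    // If the root itself was a <script>, nothing is left to return.
    if ($doc->documentElement === null) {
        return null;
    }
    return $doc->saveXML($doc->documentElement);
}
```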

Bertrand answered Nov 12 '22 20:11


You need to sanitize SVG using an XML parser plus a whitelist.

Because SVG already has multiple ways to execute code and future extensions may add additional methods, you simply cannot blacklist "known dangerous" constructs. Whitelisting safe elements and attributes does work as long as you correctly handle all the XML corner cases (e.g. XSLT stylesheets, entity expansions, external entity references).
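
A rough sketch of the parser-plus-whitelist idea with PHP's DOM extension. The function name and both lists are illustrative assumptions, not a vetted allowlist. Attribute values are not inspected here, so a production version would also need to check things like `url(...)` references.

```php
<?php
// Illustrative allowlists only (assumptions, not a vetted set).
const SVG_NS = 'http://www.w3.org/2000/svg';
const ALLOWED_ELEMENTS = ['svg', 'g', 'path', 'rect', 'circle', 'line', 'text', 'title'];
const ALLOWED_ATTRIBUTES = ['viewBox', 'width', 'height', 'x', 'y', 'x1', 'y1', 'x2', 'y2',
    'cx', 'cy', 'r', 'd', 'fill', 'stroke', 'stroke-width', 'transform', 'class'];

// Hypothetical helper: keeps only allowlisted SVG elements and attributes.
function sanitize_svg_whitelist(string $svg): ?string
{
    // Refuse empty input and any DOCTYPE (entity declarations live there).
    if ($svg === '' || stripos($svg, '<!DOCTYPE') !== false) {
        return null;
    }
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($svg, LIBXML_NONET); // no network access, no entity substitution
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$ok || $doc->doctype !== null) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    // Drop processing instructions (such as xml-stylesheet) and comments.
    $extras = $xpath->query('//processing-instruction() | //comment()');
    foreach (iterator_to_array($extras, false) as $node) {
        $node->parentNode->removeChild($node);
    }

    foreach (iterator_to_array($xpath->query('//*'), false) as $el) {
        // Remove any element that is not an allowlisted SVG element,
        // together with its whole subtree.
        if ($el->namespaceURI !== SVG_NS || !in_array($el->localName, ALLOWED_ELEMENTS, true)) {
            $el->parentNode->removeChild($el);
            continue;
        }
        // Remove namespaced attributes (e.g. xlink:href) and anything not listed.
        foreach (iterator_to_array($el->attributes, false) as $attr) {
            if ($attr->namespaceURI !== null || !in_array($attr->name, ALLOWED_ATTRIBUTES, true)) {
                $el->removeAttributeNode($attr);
            }
        }
    }

    // The root element itself may have been rejected.
    if ($doc->documentElement === null) {
        return null;
    }
    return $doc->saveXML($doc->documentElement);
}
```

Unknown elements are dropped together with their children rather than unwrapped, which is the more conservative choice when the input is generated by your own d3.js code and should never contain them.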

Example implementations: https://github.com/alnorris/SVG-Sanitizer/blob/master/SvgSanitizer.php (MIT license) or https://github.com/darylldoyle/svg-sanitizer (GPL v2 license)

More information about attack vectors that you have to consider while selecting which features you want to support:

  • https://phabricator.wikimedia.org/T85850 (base64 encoded parts)
  • https://www.slideshare.net/x00mario/the-image-that-called-me (different ways to execute code)
  • https://www.blackhat.com/docs/us-14/materials/us-14-DeGraaf-SVG-Exploiting-Browsers-Without-Image-Parsing-Bugs.pdf (embedding HTML inside SVG; SVG can do pretty much anything that any XML or HTML file can do; using SVG inside <object> allows JS inside the SVG to execute in the parent document)
  • https://bjornjohansen.no/svg-in-wordpress (filtering SVG is hard enough that even WordPress still does not have a good solution for user submitted SVG files)
  • http://html5sec.org/?svg (list of some known SVG attacks by misusing different APIs)
  • https://security.stackexchange.com/questions/26264
  • https://blobfolio.com/2017/03/when-a-stranger-calls-sanitizing-svgs/ (different ways to encode stuff, clever use of whitespace to avoid detection, xml tricks)

Mikko Rantalainen answered Nov 12 '22 19:11