I am creating charts on the fly as SVGs using d3.js. These charts are dynamically generated based on the selections of authenticated users. Once these charts are generated, the user has the option to download the generated SVG as a PNG or PDF.
The current workflow is the following:
// JavaScript
// get the element containing generated SVG
var svg = document.getElementById("chart-container");
// Extract the data as SVG text string
var svg_xml = (new XMLSerializer).serializeToString(svg);
// Submit the <FORM> to the server.
var form = document.getElementById("svgform");
form['output_format'].value = output_format; // can be either "pdf" or "png"
form['data'].value = svg_xml ;
form.submit();
The FORM element is a hidden form, used to POST the data:
<form id="svgform" method="post" action="conversion.php">
<input type="hidden" id="output_format" name="output_format" value="">
<input type="hidden" id="data" name="data" value="">
</form>
The PHP file saves the provided SVG data as a temporary file:
// check for valid session, etc - omitted for brevity
$xmldat = $_POST['data']; // serialized XML representing the SVG element
if(simplexml_load_string($xmldat)===FALSE) { die; } // reject invalid XML
$fileformat = $_POST['output_format']; // chosen format for output; PNG or PDF
if ($fileformat != "pdf" && $fileformat != "png" ){ die; } // limited options for format
$fileformat = escapeshellarg($fileformat); // escape shell arguments that might have snuck in
// generate temporary file names with tempnam() - omitted for brevity
$handle = fopen($infile, "w");
fwrite($handle, $xmldat);
fclose($handle);
A conversion utility is run which reads the temporary file ($infile) and creates a new file ($outfile) in the specified $fileformat (PDF or PNG). The resulting new file is then returned to the browser, and the temporary files are deleted:
// headers etc generated - omitted for brevity
readfile($outfile);
unlink($infile); // delete temporary infile
unlink($outfile); // delete temporary outfile
I have investigated converting the SVG to a PNG using JavaScript (canvg(), then toDataURL, then document.write), and may use this for generating the PNGs, but it doesn't allow for conversion to PDF.
So: How can I best sanitize or filter the SVG data which is provided to conversion.php, before it's written to a file? What's the current state of SVG sanitization? What's available within PHP? Should I go with a whitelist-based approach to sanitizing the SVG data provided to conversion.php, or is there a better way?
(I do not know XSLT, though I could try to learn it; I hope to keep the sanitization within PHP as much as possible. Using Windows Server 2008, so any solutions that use external tools would need to be available within that ecosystem.)
You can use the SVG Sanitize package: https://packagist.org/packages/enshrined/svg-sanitize
It had 500k installs at the time this answer was written.
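use enshrined\svgSanitize\Sanitizer;
// Load Composer's autoloader (assumes the package was installed with Composer)
require __DIR__ . '/vendor/autoload.php';
// Create a new sanitizer instance
$sanitizer = new Sanitizer();
// Optionally strip attributes that reference remote resources
$sanitizer->removeRemoteReferences(true);
// Load the dirty svg
$dirtySVG = file_get_contents('filthy.svg');
// Pass it to the sanitizer and get it back clean
$cleanSVG = $sanitizer->sanitize($dirtySVG);
// sanitize() returns false if the input could not be parsed as XML
if ($cleanSVG === false) { die; }
// Now do what you want with your clean SVG/XML data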
I work with XML and PHP, but I am not entirely sure this answers your question, so please take it as an idea/suggestion, nothing more.
SimpleXML uses libxml to load the XML content: http://www.php.net/manual/en/simplexml.requirements.php
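You can disable loading of external entities by calling libxml_disable_entity_loader(true) (http://www.php.net/manual/en/function.libxml-disable-entity-loader.php) before loading your file with SimpleXML. Note that this function is deprecated as of PHP 8.0, because PHP 8 requires libxml 2.9 or later, which does not load external entities by default.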
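A minimal sketch of the same idea using DOMDocument, assuming the posted string is the serialized <svg> element itself rather than a wrapping <div>. Instead of relying on the entity loader switch, it refuses any DOCTYPE outright (every entity declaration needs one), passes LIBXML_NONET so libxml never fetches anything over the network, and checks that the root is an <svg> element in the SVG namespace. The helper name load_untrusted_svg() is hypothetical.

```php
<?php
// Parse untrusted SVG text defensively with DOMDocument.
// load_untrusted_svg() is a hypothetical helper name, for illustration only.

function load_untrusted_svg(string $xml): ?DOMDocument
{
    if ($xml === '') {
        return null; // loadXML() refuses an empty string (PHP 8 throws a ValueError)
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    // LIBXML_NONET: never fetch anything over the network while parsing.
    // LIBXML_NOENT is deliberately NOT passed, so entities are not substituted.
    $ok = $doc->loadXML($xml, LIBXML_NONET);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        return null; // not well-formed XML
    }
    // Entity tricks (XXE, "billion laughs") need a DOCTYPE, and a d3-generated
    // chart never needs one, so refuse any document that has one.
    if ($doc->doctype !== null) {
        return null;
    }
    $root = $doc->documentElement;
    if ($root === null
        || $root->localName !== 'svg'
        || $root->namespaceURI !== 'http://www.w3.org/2000/svg') {
        return null; // the root must be <svg> in the SVG namespace
    }
    return $doc;
}
```

In conversion.php this could stand in for the simplexml_load_string() check on $_POST['data']. It only makes parsing safe; it does not remove scripts or other active content.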
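Then you could validate the document against the SVG schema with DOMDocument::schemaValidate() (http://us3.php.net/manual/en/domdocument.schemavalidate.php) or against a DTD with DOMDocument::validate() (http://us3.php.net/manual/en/domdocument.validate.php).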
The only concern I see is that SVG can contain a script element: http://www.w3.org/TR/SVG/script.html#ScriptElement
There is information on the SVG 1.1 DTD here: http://www.w3.org/Graphics/SVG/1.1/DTD/svg-framework.mod and http://www.w3.org/TR/2003/REC-SVG11-20030114/REC-SVG11-20030114.pdf
You could provide an SVG DTD with a modified version of the script element, or loop through the elements and remove any script element that is present.
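The "loop through the elements" option could look roughly like this sketch: it deletes every script element (in any namespace) and every attribute whose name starts with "on", such as onload or onclick. Being a blacklist, it misses other vectors such as javascript: links or <foreignObject>, so treat it as a first layer only. strip_scripts() is a hypothetical name, and it expects an already-parsed DOMDocument.

```php
<?php
// Blacklist sketch: remove <script> elements and on* event-handler attributes.
// NOT complete: javascript: URLs, <foreignObject>, external references, etc.
// are left untouched. strip_scripts() is a hypothetical helper name.

function strip_scripts(DOMDocument $doc): void
{
    $xpath = new DOMXPath($doc);

    // XPath results are a snapshot, so removing nodes while looping is safe.
    // local-name() matches <script> whatever namespace it is in.
    foreach ($xpath->query('//*[local-name()="script"]') as $script) {
        $script->parentNode->removeChild($script);
    }

    // Event-handler attributes: any attribute name starting with "on",
    // compared case-insensitively via translate().
    $handlers = $xpath->query('//@*[starts-with(translate(local-name(), "ON", "on"), "on")]');
    foreach ($handlers as $attr) {
        $attr->ownerElement->removeAttributeNode($attr);
    }
}
```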
It won't be perfect, but at least better than nothing.
You need to sanitize SVG using an XML parser plus a whitelist.
Because SVG already has multiple ways to execute code and future extensions may add additional methods, you simply cannot blacklist "known dangerous" constructs. Whitelisting safe elements and attributes does work as long as you correctly handle all the XML corner cases (e.g. XSLT stylesheets, entity expansions, external entity references).
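A compact sketch of the parser-plus-whitelist approach with PHP's DOM extension. The allowed element and attribute lists are assumptions sized for simple d3.js charts, not a vetted set. Anything outside the SVG namespace, any namespaced attribute (such as xlink:href), and any url(...) value that is not a local #fragment reference is dropped. The function names are hypothetical, and the input should already have been parsed safely (no DOCTYPE, no entity substitution).

```php
<?php
// Whitelist sketch: keep only the SVG elements and attributes listed below
// and drop everything else. The lists are illustrative assumptions for simple
// d3.js charts, not a complete or vetted set. Function names are hypothetical.

const SVG_NS = 'http://www.w3.org/2000/svg';

const ALLOWED_ELEMENTS = [
    'svg', 'g', 'path', 'rect', 'circle', 'ellipse', 'line',
    'polyline', 'polygon', 'text', 'tspan', 'title', 'desc',
];

const ALLOWED_ATTRIBUTES = [
    'width', 'height', 'viewBox', 'x', 'y', 'x1', 'y1', 'x2', 'y2',
    'cx', 'cy', 'r', 'rx', 'ry', 'd', 'points', 'transform',
    'fill', 'stroke', 'stroke-width', 'opacity',
    'font-size', 'font-family', 'text-anchor', 'dx', 'dy', 'class',
];

// Remove every non-whitelisted attribute from one element.
function clean_attributes(DOMElement $el): void
{
    $remove = [];
    foreach ($el->attributes as $attr) {
        $notAllowed = $attr->namespaceURI !== null            // e.g. xlink:href
            || !in_array($attr->localName, ALLOWED_ATTRIBUTES, true);
        // url(...) is only kept for local references such as url(#grad).
        $externalUrl = preg_match('/url\s*\(\s*[^#\s]/i', $attr->value) === 1;
        if ($notAllowed || $externalUrl) {
            $remove[] = $attr; // collect first; the attribute map is live
        }
    }
    foreach ($remove as $attr) {
        $el->removeAttributeNode($attr);
    }
}

// Recursively drop disallowed children of $node.
function clean_children(DOMNode $node): void
{
    // Copy the live childNodes list before modifying it.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMElement) {
            if ($child->namespaceURI !== SVG_NS
                || !in_array($child->localName, ALLOWED_ELEMENTS, true)) {
                $node->removeChild($child); // drops the whole subtree
                continue;
            }
            clean_attributes($child);
            clean_children($child);
        } elseif (!($child instanceof DOMText)) {
            // Comments, processing instructions, entity references, ...
            $node->removeChild($child);
        }
    }
}

// Returns false if the root is not an <svg> element in the SVG namespace.
function sanitize_svg_whitelist(DOMDocument $doc): bool
{
    $root = $doc->documentElement;
    if ($root === null || $root->namespaceURI !== SVG_NS || $root->localName !== 'svg') {
        return false;
    }
    clean_attributes($root);
    clean_children($root);
    return true;
}
```

Because style is not whitelisted, inline styles written by d3's .style() would be lost; allowing it would mean also sanitizing CSS, which can itself contain url() references. For production use, the maintained implementations linked below are a safer starting point than a hand-rolled list.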
Example implementations: https://github.com/alnorris/SVG-Sanitizer/blob/master/SvgSanitizer.php (MIT license) or https://github.com/darylldoyle/svg-sanitizer (GPL v2 license)
More information about attack vectors that you have to consider while selecting which features you want to support:
<object> (allows JS from inside the SVG to execute in the parent document)