An SVG file is basically an XML file, so I could use the string <?xml (or its hex representation, '3c 3f 78 6d 6c') as a magic number. However, there are a few reasons not to do that; for example, extra whitespace could break this check.
The other images I need to check are all binary formats with magic numbers. How can I quickly check whether a file is an SVG without relying on its extension, ideally using Python?
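To make the concern concrete, here is a minimal sketch of the prefix check described above (the helper `starts_like_xml` and the sample byte strings are hypothetical). It rejects an SVG that starts with a UTF-8 byte-order mark or omits the optional `<?xml` declaration, and it accepts any XML document whether or not it is SVG.

```python
# Hypothetical helper: the naive "magic number" test described in the question.
XML_PREFIX = b"<?xml"  # hex: 3c 3f 78 6d 6c


def starts_like_xml(data: bytes) -> bool:
    """Return True if the bytes begin exactly with the XML declaration prefix."""
    return data.startswith(XML_PREFIX)


# Hypothetical sample inputs.
svg_no_prolog = b'<svg xmlns="http://www.w3.org/2000/svg"/>'
svg_with_bom = b'\xef\xbb\xbf<?xml version="1.0"?><svg xmlns="http://www.w3.org/2000/svg"/>'
rss_feed = b'<?xml version="1.0"?><rss version="2.0"/>'

print(starts_like_xml(svg_no_prolog))  # False: a valid SVG is missed
print(starts_like_xml(svg_with_bom))   # False: the BOM precedes the declaration
print(starts_like_xml(rss_feed))       # True: XML, but not SVG
```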
Scalable Vector Graphics (SVG)
SVG allows three types of graphic objects: vector graphic shapes (such as paths consisting of straight lines and curves), bitmap images, and text. Graphical objects can be grouped, styled, transformed and composited into previously rendered objects.
SVG (Scalable Vector Graphics) is a graphics file format based on XML text. This means the format uses text to describe the lines, curves, colors, and other visual attributes of an image.
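As an illustration of the XML-based format, here is a tiny hand-written SVG (a single red circle; the drawing itself is arbitrary) parsed with the standard library's `xml.etree.ElementTree`. Because SVG elements live in the `http://www.w3.org/2000/svg` namespace, ElementTree reports the root tag in `{namespace}local-name` form, which is the string that the detection code in the answer compares against.

```python
import xml.etree.ElementTree as ET

# An arbitrary minimal SVG document: one circle described entirely as text.
SVG_TEXT = """<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">
  <circle cx="10" cy="10" r="8" fill="red"/>
</svg>"""

root = ET.fromstring(SVG_TEXT)
print(root.tag)  # {http://www.w3.org/2000/svg}svg
for child in root:
    # Attributes such as cx, cy, r and fill are plain strings in the XML.
    print(child.tag, child.attrib)
```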
XML is not required to start with the <?xml preamble, so testing for that prefix is not a good detection technique, not to mention that it would identify every XML document as SVG. A decent detection that is really easy to implement is to use a real XML parser to test that the file is well-formed XML whose top-level element is svg:
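# cElementTree was removed in Python 3.9; since Python 3.3, ElementTree
# uses the C accelerator automatically.
import xml.etree.ElementTree as et

def is_svg(filename):
    tag = None
    # Binary mode lets the XML parser detect the encoding itself; in text mode
    # a binary image (PNG, JPEG, ...) can raise UnicodeDecodeError instead.
    with open(filename, "rb") as f:
        try:
            # Stop at the first 'start' event, i.e. the root element.
            for event, el in et.iterparse(f, ('start',)):
                tag = el.tag
                break
        except et.ParseError:
            pass
    return tag == '{http://www.w3.org/2000/svg}svg'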
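Since the other formats in the question are identified by magic numbers, it may be convenient to run the SVG test on the same leading bytes already read for those checks. A minimal sketch of that variation, assuming the header is a `bytes` object of a few kilobytes (the name `looks_like_svg` and the sample headers are hypothetical): `xml.etree.ElementTree.XMLPullParser` accepts incremental input, so a truncated document is fine as long as the root start tag fits in the header.

```python
import xml.etree.ElementTree as ET

SVG_ROOT = "{http://www.w3.org/2000/svg}svg"


def looks_like_svg(header: bytes) -> bool:
    """Hypothetical helper: True if the first element in `header` is an SVG root.

    `header` may be a truncated prefix of the file; only the root start tag
    has to fit inside it.
    """
    parser = ET.XMLPullParser(events=("start",))
    try:
        # XMLPullParser may report a parse error from read_events() rather
        # than from feed(), so both calls sit inside the try.
        parser.feed(header)
        for _event, elem in parser.read_events():
            return elem.tag == SVG_ROOT  # decide on the very first element
    except ET.ParseError:
        return False  # not well-formed XML, e.g. a binary image
    return False  # the root start tag did not fit in the header


# Hypothetical sample headers, as a fixed-size read might return them.
png_header = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"
svg_header = b'<svg xmlns="http://www.w3.org/2000/svg" width="5" height="5"><rect wid'
rss_header = b'<?xml version="1.0"?><rss version="2.0"><channel>'

print(looks_like_svg(svg_header))  # True
print(looks_like_svg(png_header))  # False
print(looks_like_svg(rss_header))  # False
```

In practice the header could come from, for example, `f.read(4096)` on a file opened in `"rb"` mode. The trade-off is that an SVG whose root start tag sits after a long comment or DOCTYPE extending past the header is reported as not SVG, whereas `is_svg` keeps reading until it reaches the root element.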
Using the C implementation of ElementTree (imported as cElementTree in older Python versions; since Python 3.3 xml.etree.ElementTree uses it automatically, and cElementTree was removed in 3.9) keeps the detection efficient because the parsing is done by expat. A timeit run showed an SVG file detected as such in ~200 μs and a non-SVG file in 35 μs. The iterparse API lets the parser forgo building the whole element tree (the module's name notwithstanding) and read only the initial portion of the document, regardless of total file size.
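One way to check that iterparse stops early is to wrap the input in a reader that counts the bytes requested from it. In the sketch below, the class `CountingReader` and the function `is_svg_stream` are hypothetical; `is_svg_stream` is the same check adapted to take an open binary stream. A roughly 1 MB SVG is detected after reading only a small prefix. The exact chunk size iterparse reads is a CPython implementation detail, so the test only asserts that far less than the whole input was consumed.

```python
import io
import xml.etree.ElementTree as ET

SVG_ROOT = "{http://www.w3.org/2000/svg}svg"


class CountingReader:
    """Hypothetical file-like wrapper that records how many bytes were read."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = self._buf.read(size)
        self.bytes_read += len(chunk)
        return chunk


def is_svg_stream(stream) -> bool:
    """Same logic as is_svg(), but takes an already-open binary stream."""
    try:
        for _event, elem in ET.iterparse(stream, events=("start",)):
            return elem.tag == SVG_ROOT  # stop at the first start tag
    except ET.ParseError:
        return False
    return False


# Build a large SVG: the root element followed by about 1 MB of <rect/> children.
body = b'<rect width="1" height="1"/>' * 40_000
big_svg = b'<svg xmlns="http://www.w3.org/2000/svg">' + body + b"</svg>"

reader = CountingReader(big_svg)
print(is_svg_stream(reader))  # True
print(reader.bytes_read, "of", len(big_svg), "bytes read")
```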