
SVG parsing and data type

I'm writing an SVG parser, mainly as an exercise for learning how to use Parsec. Currently I'm using the following data type to represent my SVG file:

data SVG = Element String [Attribute] [SVG]
         | SelfClosingTag [Attribute]
         | Body String
         | Comment String
         | XMLDecl String

This works quite well; however, I'm not sure about the Element String [Attribute] [SVG] part of my data type. Since SVG has only a limited number of possible tags, I was thinking about using a dedicated type to represent the tag name instead of a String. Something like this:

data SVG = Element TagName [Attribute] [SVG]
         | ...

data TagName = A
             | AltGlyph
             | AltGlyphDef
             ...
             | View
             | Vkern

Is it a good idea? What would be the benefits of doing this if there are any? Is there a more elegant solution?

Elie Génard asked Jan 18 '16 17:01


2 Answers

I personally prefer the approach of enumerating all possible TagNames, because the compiler can then give you errors and warnings when you make a careless mistake. For example, if you write a function that has to cover every kind of Element, and every kind is enumerated in an ADT, the compiler can warn you about non-exhaustive pattern matches. If the tag is a String, this is not possible. Additionally, if you want to match one specific kind of Element and accidentally misspell the TagName, the compiler will catch it. A third reason, which probably doesn't apply here but is worth noting in general: if you later add or remove a variant of TagName, the compiler will point you to the places that need to be modified (as long as your matches don't hide behind a wildcard pattern). I doubt this will happen for SVG tag names, but it is something to keep in mind.
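A minimal sketch of the difference, assuming a hypothetical three-tag subset of TagName (the real list is much longer) and two made-up helper functions, isBasicShape and isCircleStringly:

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}

-- Hypothetical three-tag subset; the real SVG tag list is much longer.
data TagName = Circle | Rect | Path
  deriving (Show, Eq)

-- Total over TagName: deleting any clause below makes GHC report a
-- non-exhaustive pattern match at compile time.
isBasicShape :: TagName -> Bool
isBasicShape Circle = True
isBasicShape Rect   = True
isBasicShape Path   = False

-- With the ADT, a misspelled constructor such as `Circel` does not compile.
-- With a String, the typo below compiles fine and silently never matches.
isCircleStringly :: String -> Bool
isCircleStringly "circel" = True
isCircleStringly _        = False
```

Loading this in GHCi and evaluating isCircleStringly "circle" shows the typo going unnoticed; there is no way to write the equivalent mistake against TagName.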

Matt answered Oct 09 '22 10:10


To answer your question:

You can do this either way depending on what you are going to do with your parse tree after you make it.

If all you care to do with your SVG parser is describe the shape of the SVG data, you are just fine with a String.

On the other hand, if you want to transform that SVG data into something like a graphic (that is, you anticipate evaluating your AST), you will find it best to represent all semantic information in the type system. It will make the next steps much easier.

The question in my mind is whether the parsing pass is exactly the place to make that happen. (Full disclosure: I have only a passing familiarity with SVG.) I suspect that, rather than a flat list of tags, you would be better off with Elements that each have their own set of required and optional attributes. If this transformation happens later in the program, there is no need to create a TagName data type: you can catch all the type errors at the same time you merge the attributes into the Elements.

On the other hand, a good argument could be made for parsing straight into a complete Element tree, in which case I would drop the generic [Attribute] and [SVG] fields of the Element constructor and instead give each TagName constructor the fields appropriate to it.
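As a rough sketch of what per-element fields could look like, here is a hypothetical two-element tree together with a made-up fromRaw function that merges raw attributes into it. The Attribute shape (a name/value pair), the helper names, and the choice of which attributes count as required are all assumptions for illustration, not taken from the question or the SVG spec:

```haskell
import Text.Read (readMaybe)

-- Assumed shape of a raw attribute: (name, unparsed value).
type Attribute = (String, String)

-- Each constructor carries only the fields that make sense for it.
data Element
  = Circle Double Double Double   -- cx, cy, r
  | Group [Element]               -- children only, no geometry
  deriving (Show, Eq)

-- Look up a numeric attribute that must be present.
-- Plain numbers only; values with units such as "5px" are rejected.
required :: String -> [Attribute] -> Either String Double
required name attrs = case lookup name attrs of
  Nothing  -> Left ("missing attribute: " ++ name)
  Just raw -> maybe (Left ("not a number: " ++ raw)) Right (readMaybe raw)

-- Look up a numeric attribute, falling back to a default when absent.
withDefault :: Double -> String -> [Attribute] -> Either String Double
withDefault def name attrs = case lookup name attrs of
  Nothing -> Right def
  Just _  -> required name attrs

-- Merge a tag name and its raw attributes into a typed Element,
-- reporting unknown tags and bad attributes in one place.
fromRaw :: String -> [Attribute] -> [Element] -> Either String Element
fromRaw "circle" attrs _ =
  Circle <$> withDefault 0 "cx" attrs
         <*> withDefault 0 "cy" attrs
         <*> required "r" attrs
fromRaw "g" _ children = Right (Group children)
fromRaw tag _ _        = Left ("unsupported tag: " ++ tag)
```

The same function works for either approach: call it from inside the parser to build the typed tree directly, or run it as a second pass over a generic Element String [Attribute] [SVG] tree.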


Another answer to the question you didn't ask:

Put source code locations into your parse tree early. From personal experience, I can tell you that adding them gets harder the larger your program gets.
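With Parsec, one way to do this is to capture getPosition before each node is parsed. The sketch below is only an illustration: the Located wrapper, the located combinator, and the toy tagName parser are hypothetical names, not part of Parsec or of the question's code.

```haskell
import Text.Parsec
  ( SourcePos, getPosition, letter, many1, parse
  , sourceColumn, sourceLine, spaces )
import Text.Parsec.String (Parser)

-- Hypothetical wrapper pairing a parsed node with where it started.
data Located a = Located SourcePos a
  deriving Show

-- Record the current position, then run the wrapped parser.
located :: Parser a -> Parser (Located a)
located p = Located <$> getPosition <*> p

-- Toy parser standing in for a real tag-name parser:
-- skip leading whitespace, then read a run of letters.
tagName :: Parser (Located String)
tagName = spaces *> located (many1 letter)

-- Line and column at which the tag name starts.
positionOf :: String -> Either String (Int, Int)
positionOf input = case parse tagName "<input>" input of
  Left err              -> Left (show err)
  Right (Located pos _) -> Right (sourceLine pos, sourceColumn pos)
```

For example, positionOf "\n  circle" evaluates to Right (2,3), since the name starts on the second line after two spaces. Wrapping every constructor's payload this way from the start means later passes can report errors against the original file.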

John F. Miller answered Oct 09 '22 11:10