How do I create/write a simple XML parser from scratch? Rather than code samples, I want to know the simplified, basic steps in plain English. How is a good parser designed? I understand that regex should not be used as the parser itself, but how much of a role does regex play in parsing XML? What is the recommended data structure to use? Should I use linked lists to store and retrieve nodes, attributes, and values? I want to learn how to create an XML parser so that I can write one in the D programming language.

How to create/write a simple XML parser from scratch?

2 Answers

If you don't know how to write a parser, then you need to do some reading. Get hold of any book on compiler writing (many of the best ones were written 30 or 40 years ago, e.g. Aho and Ullman) and study the chapters on lexical analysis and syntax analysis. XML is essentially no different, except that the lexical and grammar phases are not as clearly isolated from each other as in some languages.
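As a rough illustration of what the lexical-analysis phase could look like for XML, here is a minimal sketch in D. The names `TokenKind`, `Token` and `tokenize` are hypothetical, chosen only for this example, and the sketch assumes well-behaved input: comments, CDATA sections, processing instructions and DOCTYPE declarations are not handled.

```d
import std.string : indexOf;

// Hypothetical token categories for this sketch.
enum TokenKind { text, openTag, closeTag, selfClosingTag }

struct Token {
    TokenKind kind;
    string value; // text content, or the tag body without '<', '>' and the '/' marker
}

/// Splits `input` into text and tag tokens (assumption: no comments, CDATA, PIs or DOCTYPE).
Token[] tokenize(string input) {
    Token[] tokens;
    size_t pos = 0;
    while (pos < input.length) {
        if (input[pos] != '<') {
            // Text runs until the next '<' or the end of the input.
            auto lt = input[pos .. $].indexOf('<');
            size_t end = lt < 0 ? input.length : pos + lt;
            tokens ~= Token(TokenKind.text, input[pos .. end]);
            pos = end;
            continue;
        }
        // A tag runs from '<' to the next '>'.
        auto gt = input[pos .. $].indexOf('>');
        if (gt < 0) throw new Exception("unterminated tag");
        string inner = input[pos + 1 .. pos + gt];
        auto kind = TokenKind.openTag;
        if (inner.length > 0 && inner[0] == '/') {
            kind = TokenKind.closeTag;
            inner = inner[1 .. $];
        } else if (inner.length > 0 && inner[$ - 1] == '/') {
            kind = TokenKind.selfClosingTag;
            inner = inner[0 .. $ - 1];
        }
        tokens ~= Token(kind, inner);
        pos += gt + 1;
    }
    return tokens;
}

unittest {
    auto toks = tokenize("<a>hi<b/></a>");
    assert(toks.length == 4);
    assert(toks[1] == Token(TokenKind.text, "hi"));
}
```

The syntax-analysis phase would then consume these tokens and check that every close tag matches the most recent unclosed open tag, typically with a stack. The `unittest` block can be run with `dmd -unittest -main -run file.d`.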

One word of warning: if you want to write a fully conformant XML parser, then 90% of your effort will be spent getting edge cases right in obscure corners of the spec, dealing with things such as parameter entities that most XML users aren't even aware of.
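To give a feel for where that effort goes, here is a small sketch in D of only the easiest part of entity handling: the five predefined entities plus numeric character references. The function name `decodeEntities` is hypothetical, and the sketch deliberately leaves out everything hard (entities declared in a DTD, parameter entities, validation of the resulting code points), so treat it as an illustration rather than conformant behaviour.

```d
import std.algorithm.searching : startsWith;
import std.conv : to;
import std.string : indexOf;

/// Replaces &lt; &gt; &amp; &quot; &apos; and numeric references such as
/// &#65; or &#x41;. Assumption: no DTD-declared entities exist in the input.
string decodeEntities(string s) {
    string result;
    size_t pos = 0;
    while (pos < s.length) {
        if (s[pos] != '&') {
            result ~= s[pos++]; // copy ordinary code units unchanged
            continue;
        }
        auto semi = s[pos .. $].indexOf(';');
        if (semi < 0) throw new Exception("unterminated entity reference");
        string name = s[pos + 1 .. pos + semi];
        switch (name) {
            case "lt":   result ~= '<';  break;
            case "gt":   result ~= '>';  break;
            case "amp":  result ~= '&';  break;
            case "quot": result ~= '"';  break;
            case "apos": result ~= '\''; break;
            default:
                if (name.startsWith("#x"))
                    result ~= cast(dchar) name[2 .. $].to!uint(16); // hexadecimal reference
                else if (name.startsWith("#"))
                    result ~= cast(dchar) name[1 .. $].to!uint;     // decimal reference
                else
                    throw new Exception("unknown entity: " ~ name);
                break;
        }
        pos += semi + 1;
    }
    return result;
}

unittest {
    assert(decodeEntities("1 &lt; 2") == "1 < 2");
}
```

Everything this sketch rejects with "unknown entity" is where the real work of a conformant parser begins.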

answered Sep 18 '22 17:09

Michael Kay

For an event-based parser, the user needs to pass it some functions (startNode(name,attrs), endNode(name) and someText(txt), likely through an interface), and the parser calls them when needed as it passes over the file.

The parser will have a while loop that alternates between reading until < and reading until >, and does the proper conversions to the parameter types:

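    import std.algorithm.searching : endsWith, startsWith;
    import std.array : split;
    import std.stdio : File;
    import std.string : chomp, strip;

    // EventParser is the interface described above;
    // declarations, comments and CDATA sections are not handled
    void parse(EventParser p, File file){
        string str;
        while((str = file.readln('<')).length != 0){
            // not using a rewritable buffer, to take advantage of slicing,
            // but it's a quick conversion to an implementation with a rewritable buffer
            str = str.chomp("<");
            if(str.length > 0) p.someText(str);

            str = file.readln('>').chomp(">");
            if(str.length == 0) break; // end of input (or an empty "<>")

            // work out the tag type, then drop the '/' markers so they don't end up in the name
            bool closing = str.startsWith("/");
            bool selfClosing = !closing && str.endsWith("/");
            if(closing) str = str[1 .. $];
            if(selfClosing) str = str[0 .. $ - 1];

            // split str into name and attrs
            auto parts = str.split();
            if(parts.length == 0) continue; // no tag name, nothing to report
            string name = parts[0];
            string[string] attrs;
            foreach(attribute; parts[1 .. $]){
                // naive: breaks on values that contain whitespace or '='
                auto splitAttr = attribute.split("=");
                if(splitAttr.length == 2)
                    attrs[splitAttr[0]] = splitAttr[1].strip("\"'");
            }

            if(closing) p.endNode(name);
            else {
                p.startNode(name, attrs);
                if(selfClosing) p.endNode(name); // self-closing tag
            }
        }
    }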

You can build a DOM parser on top of an event-based parser. The basic functionality you'll need for each node is getChildren, getParent, getName and getAttributes (with setters when building ;) ).

The object for the DOM parser, using the methods described above:

    class DOMEventParser : EventParser{
        DOMNode current;

        // create the root in the constructor rather than in the field
        // initializer, which is evaluated once at compile time
        this(){
            current = new RootNode();
        }

        override void startNode(string name, string[string] attrs){
            DOMNode tmp = new ElementNode(current, name, attrs);
            current.appendChild(tmp);
            current = tmp; // descend into the new element
        }
        override void endNode(string name){
            assert(name == current.name);
            current = current.parent; // climb back up
        }
        override void someText(string txt){
            current.appendChild(new TextNode(txt));
        }
    }

When the parsing ends, the root node will hold the root of the DOM tree.
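The node classes used by DOMEventParser are not spelled out above. One possible shape for them, as a minimal sketch: the class names match the ones the parser code refers to, but the fields and constructors are assumptions made for illustration, and public fields stand in for the getters and setters. Children are kept in a plain dynamic array rather than a linked list, which is usually sufficient in D.

```d
// Assumed base class: every node knows its parent, its name and its children.
abstract class DOMNode {
    DOMNode parent;
    DOMNode[] children; // dynamic array instead of a linked list
    string name;

    this(DOMNode parent, string name) {
        this.parent = parent;
        this.name = name;
    }

    void appendChild(DOMNode child) {
        child.parent = this;
        children ~= child;
    }
}

// The document root: no parent and a placeholder name.
class RootNode : DOMNode {
    this() { super(null, "#document"); }
}

// An element with its attributes stored in an associative array.
class ElementNode : DOMNode {
    string[string] attrs;

    this(DOMNode parent, string name, string[string] attrs) {
        super(parent, name);
        this.attrs = attrs;
    }
}

// A run of character data; its parent is set by appendChild.
class TextNode : DOMNode {
    string text;

    this(string text) {
        super(null, "#text");
        this.text = text;
    }
}

unittest {
    auto root = new RootNode();
    auto a = new ElementNode(root, "a", ["id": "1"]);
    root.appendChild(a);
    a.appendChild(new TextNode("hi"));
    assert(a.parent is root);
    assert(a.attrs["id"] == "1");
}
```

An associative array for attributes gives direct lookup by name, and the `parent` reference is what lets endNode climb back up the tree.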

Note: I didn't put any verification code in there to ensure the correctness of the XML.

Edit: the parsing of the attributes has a bug in it; instead of splitting on whitespace, a regex is better for that.
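A regex-based version of the attribute step might look like the following sketch, which uses `regex` and `matchAll` from `std.regex`. The function name `parseAttributes` is hypothetical, and the pattern is an assumption that covers only quoted values in the form name="value" or name='value'; it does not decode entities or reject malformed input.

```d
import std.regex : matchAll, regex;

/// Extracts name="value" / name='value' pairs from the text between '<' and '>'.
/// The tag name itself is skipped because it is not followed by '='.
string[string] parseAttributes(string tagBody) {
    // group 1: attribute name, group 2: double-quoted value, group 3: single-quoted value
    auto re = regex(`([^\s=/]+)\s*=\s*(?:"([^"]*)"|'([^']*)')`);
    string[string] attrs;
    foreach (m; tagBody.matchAll(re)) {
        // only one of the two value groups takes part in a given match
        attrs[m[1]] = m[2].length ? m[2] : m[3];
    }
    return attrs;
}

unittest {
    auto attrs = parseAttributes(`img src="a b.png" alt='x=y'`);
    assert(attrs["src"] == "a b.png");
    assert(attrs["alt"] == "x=y");
}
```

This is about the extent of the role regex plays in such a parser: it works well on small, flat pieces like a single attribute list, while the nesting of elements is left to the loop and the node stack.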

answered Sep 22 '22 17:09

ratchet freak

Tags:

xml

xml-parsing

d

XP1
