How does a parser (for example, HTML) work?

1 Answer

Tokenizing can consist of several steps. For example, given this HTML code:

    <html>
        <head>
            <title>My HTML Page</title>
        </head>
        <body>
            <p style="special">
                This paragraph has special style
            </p>
            <p>
                This paragraph is not special
            </p>
        </body>
    </html>

the tokenizer may convert that string into a flat list of significant tokens, ~~discarding whitespace~~ (thanks, SasQ, for the correction):

    ["<", "html", ">",
         "<", "head", ">",
             "<", "title", ">", "My HTML Page", "</", "title", ">",
         "</", "head", ">",
         "<", "body", ">",
             "<", "p", "style", "=", "\"", "special", "\"", ">",
                 "This paragraph has special style",
             "</", "p", ">",
             "<", "p", ">",
                 "This paragraph is not special",
             "</", "p", ">",
         "</", "body", ">",
     "</", "html", ">"
    ]
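As a rough illustration of this first pass, here is a minimal Python sketch that produces a token list of the same shape. The function name `tokenize` and both regular expressions are assumptions made for this example, not part of any real HTML library, and the sketch ignores comments, self-closing tags, unquoted or single-quoted attribute values, and `>` inside attribute values. It also drops whitespace-only text, which matches the list shown above but is not what a real HTML tokenizer is required to do.

```python
import re

def tokenize(source):
    """Split an HTML-like string into a flat list of low-level tokens."""
    tokens = []
    # First cut the input into tag chunks ("<...>") and the text between them.
    for chunk in re.findall(r'<[^>]*>|[^<]+', source):
        if chunk.startswith('<'):
            # Inside a tag: punctuation, quoted values and names become tokens.
            for m in re.finditer(r'</|<|>|=|"([^"]*)"|[^\s<>="]+', chunk):
                if m.group(1) is not None:
                    # A quoted value yields three tokens: quote, value, quote.
                    tokens += ['"', m.group(1), '"']
                else:
                    tokens.append(m.group(0))
        elif chunk.strip():
            # Text between tags is kept as one token, trimmed (an assumption).
            tokens.append(chunk.strip())
    return tokens

print(tokenize('<p style="special">This paragraph has special style</p>'))
```

Splitting into tag chunks first keeps the inner regular expression simple, at the cost of mishandling any `>` that appears inside a quoted attribute value.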

There may be multiple tokenizing passes, each converting a list of tokens into a list of higher-level tokens, as a hypothetical HTML parser might do here (the result is still a flat list):

    [("<html>", {}),
         ("<head>", {}),
             ("<title>", {}), "My HTML Page", "</title>",
         "</head>",
         ("<body>", {}),
             ("<p>", {"style": "special"}),
                 "This paragraph has special style",
             "</p>",
             ("<p>", {}),
                 "This paragraph is not special",
             "</p>",
         "</body>",
     "</html>"
    ]
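A second pass like this could be sketched in Python as below. The function name `group_tags` is hypothetical, and the sketch assumes the low-level tokens are well formed: every attribute arrives as exactly five tokens (name, `=`, quote, value, quote) and every tag is closed by `>`. A real implementation would need error handling for anything else.

```python
def group_tags(tokens):
    """Merge low-level tokens into tag-level tokens; the result is still flat."""
    result = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "</":
            # "</", name, ">"  becomes the single token "</name>".
            result.append("</" + tokens[i + 1] + ">")
            i += 3
        elif tok == "<":
            name = tokens[i + 1]
            attrs = {}
            i += 2
            # Each attribute is five tokens: name, "=", quote, value, quote.
            while tokens[i] != ">":
                attrs[tokens[i]] = tokens[i + 3]
                i += 5
            result.append(("<" + name + ">", attrs))
            i += 1  # skip the closing ">"
        else:
            # Anything else is plain text and passes through unchanged.
            result.append(tok)
            i += 1
    return result

low_level = ["<", "p", "style", "=", '"', "special", '"', ">",
             "This paragraph has special style", "</", "p", ">"]
print(group_tags(low_level))
```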

Then the parser converts that list of tokens into a tree or graph that represents the source text in a form that is more convenient for the program to access and manipulate:

    ("<html>", {}, [
        ("<head>", {}, [
            ("<title>", {}, ["My HTML Page"]),
        ]),
        ("<body>", {}, [
            ("<p>", {"style": "special"}, ["This paragraph has special style"]),
            ("<p>", {}, ["This paragraph is not special"]),
        ]),
    ])
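One common way to turn such a flat list into a nested structure is to keep a stack of currently open elements. The Python sketch below follows that idea under some assumptions: the name `build_tree` and the `<root>` sentinel are invented for this example, the input uses the tag-level token shapes shown earlier, and mismatched tags simply raise an error instead of being recovered from the way real browsers do.

```python
def build_tree(tokens):
    """Nest tag-level tokens into (name, attrs, children) tuples."""
    top_level = []
    # Sentinel root (hypothetical) so every real node has a parent to attach to.
    stack = [("<root>", {}, top_level)]
    for tok in tokens:
        if isinstance(tok, tuple):
            # Opening tag: create a node, attach it, and descend into it.
            name, attrs = tok
            node = (name, attrs, [])
            stack[-1][2].append(node)
            stack.append(node)
        elif tok.startswith("</"):
            # Closing tag: it must match the innermost open element.
            expected = "</" + stack[-1][0][1:]
            if len(stack) == 1 or tok != expected:
                raise ValueError(f"unexpected closing tag {tok}")
            stack.pop()
        else:
            # Plain text becomes a child of the innermost open element.
            stack[-1][2].append(tok)
    if len(stack) != 1:
        raise ValueError(f"unclosed tag {stack[-1][0]}")
    if len(top_level) != 1:
        raise ValueError("expected exactly one top-level element")
    return top_level[0]

flat = [("<html>", {}), ("<p>", {"style": "special"}),
        "This paragraph has special style", "</p>", "</html>"]
print(build_tree(flat))
```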

At this point, parsing is complete; it is then up to the user to interpret the tree, modify it, and so on.
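As one example of interpreting the tree, the Python sketch below walks it recursively and gathers all the text it contains. The function name `collect_text` is made up for this illustration, and it assumes the `(name, attrs, children)` tuple layout used above, with plain strings as text nodes.

```python
def collect_text(node):
    """Return every text string found in the tree, in document order."""
    if isinstance(node, str):
        return [node]
    _name, _attrs, children = node
    texts = []
    for child in children:
        texts.extend(collect_text(child))
    return texts

tree = ("<html>", {}, [
    ("<head>", {}, [
        ("<title>", {}, ["My HTML Page"]),
    ]),
    ("<body>", {}, [
        ("<p>", {"style": "special"}, ["This paragraph has special style"]),
        ("<p>", {}, ["This paragraph is not special"]),
    ]),
])
print(collect_text(tree))
```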


answered Sep 19 '22 04:09

Lie Ryan


Tags:

html

browser

parsing

html-parsing

tokenize

alex
