 

Loading DOM from HTML: How does an HTML parser know when an empty element ends?


In XML, empty elements are written with a tag ending in />. HTML does not require that marker. So does an HTML parser have a finite list of elements that can be empty? And what if such an element has an end tag?

user877329 asked Jul 31 '15 08:07




1 Answer

There are tags in HTML which have a closing tag and ones which don't, and it got more confusing after the introduction of HTML5. After a lot of research, here's what I found so far. I hope you'll understand :)

Does an HTML parser have a finite list of elements that can be empty?

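Answer: Yes, HTML parsers hard-code a finite list of empty (void) elements. When the parser sees the start tag of one of them, it inserts the element and immediately closes it without waiting for an end tag, so whatever follows becomes a sibling of the element rather than its child.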

These are the elements that can be empty (source: Mozilla documentation):

<link>
<track>
<param>
<area>
<command>
<col>
<base>
<meta>
<hr>
<source>
<img>
<keygen>
<br>
<wbr>
<colgroup> when the span is present
<input>
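As a rough illustration of how a parser can use such a hard-coded list, here is a minimal sketch in Python. It is a toy, not the HTML tree-construction algorithm: it relies on the standard library's `html.parser` for tokenizing, and the `VOID_ELEMENTS` set, the `Node` class and the `VoidAwareTreeBuilder` name are all hypothetical. The set follows the list above, adds `<embed>` (also a void element) and leaves out `<colgroup>`, which is a normal element whose end tag is merely optional. The important part is that a void element is attached to its parent but never pushed onto the stack of open elements, so nothing can be nested inside it.

```python
from html.parser import HTMLParser

# Hypothetical set for this sketch: the list above plus <embed>, minus <colgroup>.
# <command> and <keygen> are obsolete but kept so old markup still parses here.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
    "command", "keygen", "param",
})


class Node:
    def __init__(self, tag):
        self.tag = tag        # "#root" or an element name
        self.children = []    # child Node objects or text strings

    def __repr__(self):
        return f"{self.tag}{self.children}"


class VoidAwareTreeBuilder(HTMLParser):
    """Toy tree builder; it models only the void-element rule."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#root")
        self.stack = [self.root]  # stack of open elements
        self.errors = []          # simplified stand-in for HTML parse errors

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        # Void elements are inserted but never pushed, i.e. closed immediately.
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # Unlike HTMLParser's default, emit no implied end tag: in HTML the
        # "/" in "<br/>" or "<div/>" is ignored, so treat it as a start tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Pop back to the nearest open element with this name, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return
        # Nothing to close (e.g. "</input>"): record the error, drop the token.
        self.errors.append(f"stray end tag </{tag}> ignored")

    def handle_data(self, data):
        self.stack[-1].children.append(data)


builder = VoidAwareTreeBuilder()
builder.feed("<p>one<br>two<input type='text'></input>three</p>")
builder.close()
print(builder.root)    # #root[p['one', br[], 'two', input[], 'three']]
print(builder.errors)  # ['stray end tag </input> ignored']
```

The real algorithm also handles implied end tags, misnested markup and many other recovery cases; the sketch deliberately keeps only the stack logic that explains why `<br>` never swallows the text after it.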

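In HTML, using a closing tag on an empty element is invalid. For example, <input type="text"></input> is invalid HTML: the parser reports a parse error and ignores the stray </input> end tag. (One historical quirk: a stray </br> is treated as if it were <br>.)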
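For contrast with the XML rule mentioned in the question, here is a short Python sketch using the standard library's `xml.etree.ElementTree`. It is only an XML parser, not part of any HTML parser, and it shows the opposite behaviour: in XML an explicit end tag and the `/>` shorthand are equivalent, while an unclosed `<br>` is rejected because XML has no built-in list of empty elements.

```python
import xml.etree.ElementTree as ET

# In XML, an explicit end tag and the "/>" shorthand produce the same element.
explicit = ET.fromstring('<input type="text"></input>')
shorthand = ET.fromstring('<input type="text"/>')
print(ET.tostring(explicit))   # b'<input type="text" />'
print(ET.tostring(shorthand))  # b'<input type="text" />'

# XML knows nothing about void elements, so an unclosed <br> is not well-formed.
try:
    ET.fromstring("<p>one<br>two</p>")
except ET.ParseError as err:
    print("XML parser rejects it:", err)
```

This is the HTML/XHTML discrepancy in miniature: the same markup is an error in one syntax and normal in the other.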

"Empty elements (void elements)" were introduced to HTML by mistake: presentational markup crept into the language, contrary to the spirit of SGML, and with some strange syntactic implications. This fundamental error has caused some technical problems like an unintended discrepancy between HTML and XHTML, causing surprises in validation. More importantly, it illustrates the implications of the decision to make HTML formally, and only formally, an "SGML application". "Empty elements" are more than they look like.

Source (worth reading): cs.tut.fi empty elements research paper

What if such an element has an end tag?

The parser keeps the element but ignores the stray end tag and records a parse error. Parsing is not aborted: browsers recover silently and continue with the next token, so anything after the end tag simply follows the empty element as a sibling.
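In the HTML specification, the list of void elements is applied during tree construction, not tokenization: `</input>` reaches the tree builder as an ordinary end-tag token, and that later stage is what discards it. Python's `html.parser` module works at roughly that token level, so it is a convenient way to see this, with two caveats: it is not a spec-compliant HTML5 tokenizer, and the `TokenRecorder` class below is a hypothetical helper written for this sketch.

```python
from html.parser import HTMLParser


class TokenRecorder(HTMLParser):
    """Records raw tag tokens; it builds no tree and knows no void elements."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag))

    def handle_startendtag(self, tag, attrs):
        # Called for tags written as "<tag/>".
        self.tokens.append(("start/", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))


rec = TokenRecorder()
rec.feed('<input type="text"></input><br/>')
rec.close()
print(rec.tokens)  # [('start', 'input'), ('end', 'input'), ('start/', 'br')]
```

The end tag shows up like any other token; deciding that it closes nothing is left to the tree-building step.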

Read this W3C article about HTML empty elements (void elements): W3C Link

Article about empty elements by 456bereastreet

Color glare article on empty elements Colorglare link

Jijo John answered Oct 01 '22 01:10