I'm trying to render some simple HTML documents (containing mostly div and br tags) to plain text, but I'm struggling with when to add new lines. I assumed it would be quite simple, with <div> and <br/> generating new lines, but it looks like there are various subtle rules. For example:
<div>one line</div>
<div>two lines</div>
<hr/>
<div>one line</div>
<div></div>
<div>still two lines because the empty div doesn't count</div>
<hr/>
<div>one line<br/></div>
<div></div>
<div>still two lines because the br tag is ignored</div>
<hr/>
<div>one line<br/></div>
<div><br/></div>
<div>three lines this time because the second br tag is not ignored</div>
<hr/>
<div><div>Wrapped tags generate only one new line<br/></div></div>
<div><br/></div>
<div>three lines this time because the second br tag is not ignored</div>
So I'm looking for a specification on how new lines should be rendered in HTML documents (when no CSS is applied). Any idea where I could find this kind of document?
If you are looking for the specification for <div> and <br>, you won't find it in one place, because each of them follows separate rules.
DIV elements follow the block formatting rules, while BR elements follow the text flow rules.
I believe that the cause of your confusion is the assumption that they follow the same new-line rule. Let me explain.
The BR element.
BR is defined in HTML4 Specification Section 9.3 regarding Lines and Paragraphs:
The BR element forcibly breaks (ends) the current line of text.
And in HTML5 Specification Section 4.5 regarding Text-level semantics:
The <br> element represents a line break.
The specification explains the result of your third example:
<div>one line<br/></div>
<div></div>
<div>still two lines because the br tag is ignored</div>
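There, the BR element is not ignored at all: it marks that the line must be broken at that point. In other words, it ends the current line of text ("one line"); it does not create a new line. Because nothing follows the BR inside that DIV, there is no further content to start another line, so the DIV still renders as a single line.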
In your fourth example:
<div>one line<br/></div>
<div><br/></div>
<div>three lines this time because the second br tag is not ignored</div>
the BR element also marks the end of the line. Because that line has zero characters, it is rendered as an empty line.
Therefore, the rule is the same in your third and fourth examples. Nothing is ignored.
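To make the BR rule concrete, here is a minimal Python sketch of how a plain-text renderer could split one DIV's inline content into lines. It is a simplification, assuming the content is only text and BR elements (no whitespace collapsing, wrapping or nested inline elements); `BR` and `line_boxes` are hypothetical names, not part of any library.

```python
BR = object()  # hypothetical sentinel standing in for a <br> element


def line_boxes(inline_items):
    """Split one block's inline content (text strings and BR) into lines.

    A BR ends the line it sits on; it does not open a new one. A line is
    produced only if it contains something, either text or a BR.
    """
    lines = []
    current = None  # text of the open line, or None when no line is open
    for item in inline_items:
        if item is BR:
            lines.append(current or "")  # close the line, even if it is empty
            current = None
        else:
            current = (current or "") + item
    if current is not None:
        lines.append(current)  # text after the last BR forms a final line
    return lines


print(line_boxes(["one line", BR]))  # third example, first DIV: ['one line']
print(line_boxes([BR]))              # fourth example, second DIV: ['']
print(line_boxes([]))                # second example, empty DIV: []
```

The trailing BR and the lone BR go through exactly the same branch; the only difference is whether the line they close already held text.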
The DIV element.
In the absence of an explicit style sheet, the default style applies. A DIV element is by default a block-level element, which means it is laid out in a block formatting context, as defined in CSS 2.1 Specification Section 9.4.1:
In a block formatting context, boxes are laid out one after the other, vertically, beginning at the top of a containing block.
Therefore, this is also not about creating new lines, because in a block formatting context there is no notion of lines: it is about placing block elements one after another from top to bottom.
In your second example:
<div>one line</div>
<div></div>
<div>still two lines because the empty div doesn't count</div>
the empty DIV has zero height, so it has no effect on where the next block-level element is placed: the third DIV starts immediately below the first one.
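The vertical stacking can be illustrated with a toy layout in Python. It is only a sketch: it assumes every line box has the same fixed height (the hypothetical `LINE_HEIGHT`) and that DIVs have no margins, borders or padding; `Block` and `layout` are illustrative names, not a real layout engine.

```python
from dataclasses import dataclass

LINE_HEIGHT = 18  # hypothetical height of one line box, in px


@dataclass
class Block:
    name: str
    line_count: int  # line boxes inside the block; 0 for an empty DIV


def layout(blocks, top=0):
    """Stack blocks vertically, each starting where the previous one ends.

    Assumes the DIV defaults: no margins, borders or padding.
    Returns (name, y, height) for each block.
    """
    placed = []
    y = top
    for block in blocks:
        height = block.line_count * LINE_HEIGHT
        placed.append((block.name, y, height))
        y += height  # the next block starts right below this one
    return placed


# The question's second example: the empty DIV is 0px tall, so the third
# DIV starts at the same y it would have without the empty DIV.
print(layout([Block("first", 1), Block("empty", 0), Block("third", 1)]))
# [('first', 0, 18), ('empty', 18, 0), ('third', 18, 18)]
```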
In your fifth example:
<div><div>Wrapped tags generate only one new line<br/></div></div>
<div><br/></div>
<div>three lines this time because the second br tag is not ignored</div>
the outer DIV acts as the containing block (as defined in Section 9.1.2) for the inner DIV, which is laid out according to Section 9.4.1, quoted above. Because no CSS is applied, a DIV element has zero margin and zero padding by default, so every edge of the inner DIV touches the corresponding edge of the outer DIV. In other words, the inner DIV is rendered at exactly the same place as the outer DIV, and together they produce only the one line of text they contain.
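Putting both rules together, a small HTML-to-text converter for documents made only of DIV, BR and text could look like the Python sketch below. It uses the standard library's `html.parser.HTMLParser` to build a tree; `Node`, `TreeBuilder`, `render_lines` and `html_to_text` are hypothetical names. It assumes every element other than BR behaves like a DIV, collapses whitespace roughly as `white-space: normal` would, and ignores CSS entirely.

```python
import re
from html.parser import HTMLParser


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []  # child Nodes and text strings, in document order


class TreeBuilder(HTMLParser):
    """Builds a minimal element tree; BR is treated as a void element."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag != "br":  # BR never has children, so it is not pushed
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag == "br":
            return  # "<br/>" also triggers an end tag; nothing to close
        # Close the nearest open element with this tag name.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def render_lines(node):
    """Return the text lines produced by a block-level node."""
    lines = []
    current = None  # open line (collapsed text), or None
    for child in node.children:
        if isinstance(child, str):
            text = re.sub(r"\s+", " ", child)
            if current is None and not text.strip():
                continue  # whitespace between blocks produces no line
            current = (current or "") + text
        elif child.tag == "br":
            # BR ends the current line; on its own it yields an empty line.
            lines.append((current or "").strip())
            current = None
        else:
            # A child block ends any open line, then stacks its own lines.
            # An empty DIV adds nothing; a wrapper DIV adds no extra line.
            if current is not None:
                lines.append(current.strip())
                current = None
            lines.extend(render_lines(child))
    if current is not None:
        lines.append(current.strip())
    return lines


def html_to_text(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return "\n".join(render_lines(builder.root))
```

For example, `html_to_text("<div>one line<br/></div><div><br/></div><div>three lines</div>")` returns `"one line\n\nthree lines"`. Inline elements such as SPAN or B would need their own branch, since this sketch treats them as blocks.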
I believe that's everything.