Uses for the '&quot;' entity in HTML

Tags: html, escaping, xhtml, html-entities, linq-to-xml

I am revising some XHTML files authored by another party. As part of this effort, I am doing some bulk editing via Linq to XML.

I've just noticed that some of the original source XHTML files contain the &quot; HTML entity in text nodes within those files. For instance:

<p>Greeting: &quot;Hello, World!&quot;</p>

When I recover the XHTML text via XElement.ToString(), the &quot; entities are replaced by plain double quotes:

<p>Greeting: "Hello, World!"</p>

Question: Can anyone tell me what the motivation might have been for the original author to use the &quot; entities instead of plain double quotes? Did those entities serve a purpose which I don't fully appreciate? Or were they truly unnecessary, as I suspect?

I do understand that &quot; would be necessary in certain contexts, such as when there is a need to place a double quote within an HTML attribute. For instance:

<a href="/images/hello_world.jpg" alt="Greeting: &quot;Hello, World!&quot;">
  Greeting</a>


asked Sep 18 '14 15:09

DavidRR

3 Answers

It is impossible, and unnecessary, to know the motivation for using &quot; in element content, but possible motives include: misunderstanding of HTML rules; use of software that generates such code (probably because its author thought it was “safer”); and misunderstanding of the meaning of &quot;: many people seem to think it produces “smart quotes” (they apparently never looked at the actual results).

Anyway, there is never any need to use &quot; in element content in HTML (XHTML or any other HTML version). There is nothing in any HTML specification that would assign any special meaning to the plain character " there.

As the question says, it has its role in attribute values, but even in them, it is mostly simpler to just use single quotes as delimiters if the value contains a double quote, e.g. alt='Greeting: "Hello, World!"' or, if you are allowed to correct errors in natural language texts, to use proper quotation marks, e.g. alt="Greeting: “Hello, World!”"
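
To see the equivalence concretely, here is a minimal sketch in Python (not the asker's LINQ to XML; the standard-library `xml.etree.ElementTree` parser is used only as a stand-in for a conforming XML parser). It parses both spellings and checks that they yield identical text and attribute values. The `img` element is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Element content: the entity and the plain character parse to the same text.
escaped = ET.fromstring('<p>Greeting: &quot;Hello, World!&quot;</p>')
plain = ET.fromstring('<p>Greeting: "Hello, World!"</p>')
same_text = escaped.text == plain.text

# Attribute values: &quot; inside double-quoted delimiters and a plain "
# inside single-quoted delimiters also parse to the same value.
# (The img element here is a hypothetical example.)
with_entity = ET.fromstring('<img alt="Greeting: &quot;Hello, World!&quot;" />')
with_single = ET.fromstring("<img alt='Greeting: \"Hello, World!\"' />")
same_attr = with_entity.get("alt") == with_single.get("alt")

# Serializing the parsed element writes plain quotes in element content,
# which matches what the question observed with XElement.ToString().
round_tripped = ET.tostring(escaped, encoding="unicode")

print(same_text, same_attr)
print(round_tripped)
```

Because the parser discards the distinction, a serializer has no way to know which spelling the source used, so it emits the shorter one wherever that is legal.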

answered Oct 20 '22 18:10

Jukka K. Korpela

Reason #1

There was a point where buggy/lazy implementations of HTML/XHTML renderers were more common than those that got it right. Many years ago, I regularly encountered rendering problems in mainstream browsers resulting from the use of unencoded quote chars in regular text content of HTML/XHTML documents. Though the HTML spec has never disallowed use of these chars in text content, it became fairly standard practice to encode them anyway, so that non-spec-compliant browsers and other processors would handle them more gracefully. As a result, many "old-timers" may still do this reflexively. It is not incorrect, though it is now probably unnecessary, unless you're targeting some very archaic platforms.

Reason #2

When HTML content is generated dynamically, for example, by populating an HTML template with simple string values from a database, it's necessary to encode each value before embedding it in the generated content. Some common server-side languages provided a single function for this purpose, which simply encoded all chars that might be invalid in some context within an HTML document. PHP's htmlspecialchars() function is a notable example. Though there are optional arguments to htmlspecialchars() that will cause it to ignore quotes (such as the ENT_NOQUOTES flag), those arguments were (and are) rarely used by authors of basic template-driven systems. The result is that all "special chars" are encoded everywhere they occur in the generated HTML, without regard for the context in which they occur. Again, this is not incorrect; it's simply unnecessary.
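
The same escape-everything default exists outside PHP. As a rough illustration (Python's standard `html.escape`, used here as an analogue and not as a description of htmlspecialchars() itself), the default call encodes quotes everywhere, while `quote=False` leaves them alone for element content. The `fill_template` helper and its tiny template are hypothetical.

```python
import html

def fill_template(value: str, escape_quotes: bool = True) -> str:
    """Embed a value in the element content of a tiny, hypothetical template."""
    # quote=True (the default of html.escape) also encodes " and ',
    # which is only required inside attribute values.
    return "<p>" + html.escape(value, quote=escape_quotes) + "</p>"

greeting = 'Greeting: "Hello, World!" <ok> & done'

everything_escaped = fill_template(greeting)
quotes_left_alone = fill_template(greeting, escape_quotes=False)

print(everything_escaped)
print(quotes_left_alone)
```

Both results render identically in a browser; the first simply carries the extra &quot; entities that the question noticed.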

answered Oct 20 '22 18:10

Lee

In my experience, it may be the result of auto-generation by string-based tools whose author did not understand the rules of HTML.

When some developers generate HTML without the use of special XML-oriented tools, they may try to be sure the resulting HTML is valid by taking the approach that everything must be escaped.

Referring to your example, the reason why every occurrence of " is represented by &quot; could be that, with this approach, you can safely use such "special" characters in both attribute values and element content.

Another motivation I've seen is the belief that "we must explicitly show that our symbols are not part of the syntax." In fact, valid HTML can be created with proper string-manipulation tools, as noted above.

Here is a simplified example, loosely based on C#, of this escape-everything approach, although it is preferable to use proper methods and tools:

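public class HtmlAndXmlWriter
{
    // Naive "escape everything" helper: encodes all five XML special characters,
    // regardless of whether the value ends up in an attribute or in element content.
    private static string Escape(string badString)
    {
        return badString
            .Replace("&", "&amp;")   // must run first so the entities added below are not re-escaped
            .Replace("\"", "&quot;")
            .Replace("'", "&apos;")  // defined in XML/XHTML and HTML5, but not in HTML 4
            .Replace(">", "&gt;")
            .Replace("<", "&lt;");
    }

    // Takes the two values as strings; a plain Object has no Type or Value members.
    public static string GetHtmlFromOutObject(string type, string value)
    {
        return "<div class='type_" + Escape(type) + "'>" + Escape(value) + "</div>";
    }
}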

It's really very common to see such approaches taken to generate HTML.
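
By contrast, a context-aware helper escapes only what each position requires. The following is a minimal Python sketch, not a translation of the C# above: it relies on the standard-library `xml.sax.saxutils.escape` for element content and `quoteattr` for attribute values, and the `render_div` function name and its parameters are assumptions made for the example.

```python
from xml.sax.saxutils import escape, quoteattr

def render_div(type_name: str, value: str) -> str:
    """Build a <div> with a class attribute and text content."""
    # quoteattr escapes &, < and > and picks the delimiters itself: it uses
    # single quotes when the value contains only double quotes, and falls
    # back to &quot; only when both kinds of quote are present.
    class_attr = quoteattr("type_" + type_name)
    # escape() handles &, < and > but leaves quotes alone, which is all
    # that element content needs.
    return "<div class=" + class_attr + ">" + escape(value) + "</div>"

result = render_div("greeting", 'Greeting: "Hello, World!" & more')
print(result)
```

With this split, &quot; shows up in the output only in the one case where no simpler spelling is available.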

answered Oct 20 '22 18:10

comdiv
