
Do I encode ampersands in <a href...>?

Tags: html

I'm writing code that automatically generates HTML, and I want it to encode things properly.

Say I'm generating a link to the following URL:

http://www.google.com/search?rls=en&q=stack+overflow 

I'm assuming that all attribute values should be HTML-encoded. (Please correct me if I'm wrong.) So that means if I'm putting the above URL into an anchor tag, I should encode the ampersand as &amp;, like this:

<a href="http://www.google.com/search?rls=en&amp;q=stack+overflow"> 

Is that correct?

JW. asked Sep 14 '10 01:09



1 Answer

Yes, that is correct. Character references (entities) are parsed inside HTML attribute values, so a stray & is ambiguous: the parser may try to read the text that follows it as an entity. That's why you should always write &amp; instead of a bare & inside HTML attributes.

That said, only & and the quote character that delimits the attribute value need to be encoded. If you have special characters like é in your attribute, you don't need to encode those to satisfy the HTML parser.
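As a rough illustration of the rule above, here is a minimal Python sketch that builds the link from the question. The helper name `anchor` is made up for this example; `html.escape` is the standard-library function that encodes &, <, > and (with `quote=True`) both quote characters.

```python
from html import escape


def anchor(url: str, text: str) -> str:
    """Hypothetical helper: build an <a> element with an HTML-encoded href."""
    # quote=True also encodes " and ', so the value is safe inside a quoted attribute.
    href = escape(url, quote=True)
    # The link text is encoded too; quotes are harmless there, so quote=False.
    return f'<a href="{href}">{escape(text, quote=False)}</a>'


link = anchor("http://www.google.com/search?rls=en&q=stack+overflow", "search")
print(link)
# <a href="http://www.google.com/search?rls=en&amp;q=stack+overflow">search</a>
```

Note that a character such as é passes through `escape` unchanged, which matches the point that only & and quotes need encoding in an attribute value.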

It used to be the case that non-ASCII characters such as é needed special treatment in URLs. Under RFC 1738, you had to encode them using percent-escapes; é, for example, becomes %C3%A9 when encoded as UTF-8. However, RFC 1738 has been superseded by RFC 3986 (URIs, Uniform Resource Identifiers) and RFC 3987 (IRIs, Internationalized Resource Identifiers), on which the WHATWG based its work to define how browsers should behave, as of HTML5, when they see a URL with non-ASCII characters in it. It's therefore now safe to include non-ASCII characters in URLs, percent-encoded or not.
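For anyone who still wants to produce the percent-escaped form, a small Python sketch using the standard `urllib.parse` module shows where %C3%A9 comes from. The path `/café` is just an illustrative value, not something from the question.

```python
from urllib.parse import quote, unquote

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character.
encoded = quote("é")
print(encoded)  # %C3%A9

# "/" is in the default safe set, so path separators are left alone.
encoded_path = quote("/café")
print(encoded_path)  # /caf%C3%A9

# unquote() reverses the encoding, decoding the bytes as UTF-8 by default.
decoded = unquote(encoded_path)
print(decoded)  # /café
```

Percent-encoding and HTML-encoding are separate steps: the URL is percent-encoded first (if at all), and the result is then HTML-encoded when it is written into the href attribute.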

zneak answered Sep 20 '22 03:09