Why doesn't encodeURIComponent encode single quotes/apostrophes?

The escape() function was deprecated and replaced by encodeURIComponent, but encodeURIComponent doesn't encode the single quote/apostrophe character, which I need in order to escape the apostrophes in a person's surname (e.g. 'O'Neill') in an AJAX form. Why would they remove an ability from something they were trying to improve?


Here is a code example to explain the problem more thoroughly. As you can see, the surname 'O'Neill' contains an apostrophe that needs to be escaped when passing the variable in the URL. The same would happen in other places in the form, for instance if an address entered was 'Billy's Tavern'.

    <input id='surname' value="O'Neill">
    <script>
    var get_url = '?surname=' + encodeURIComponent($('#surname').val());
    $.ajax({
        url: get_url
    });
    </script>

My current solution uses a custom function. My question is simply why a custom function is needed.

    <script>
    function customEncodeURIComponent(URI) {
        return encodeURIComponent(URI).replace(/'/g, "%27");
    }
    </script>

    <input id='surname' value="O'Neill">
    <script>
    var get_url = '?surname=' + customEncodeURIComponent($('#surname').val());
    $.ajax({
        url: get_url
    });
    </script>
2 Answers

encodeURIComponent escapes all characters except the following:

alphabetic, decimal digits, - _ . ! ~ * ' ( )
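
As a quick check of the list above, the following minimal sketch runs those punctuation marks through encodeURIComponent. The variable names and the sample string are only illustrative.

```javascript
// The punctuation marks that encodeURIComponent leaves untouched
// (letters and digits are also left as-is).
const unescapedMarks = "-_.!~*'()";
console.log(encodeURIComponent(unescapedMarks) === unescapedMarks); // true

// Other characters (space, &) are encoded, but the apostrophe is not.
const encodedSurname = encodeURIComponent("O'Neill & Sons");
console.log(encodedSurname); // O'Neill%20%26%20Sons
```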

If you wish to use an encoding compatible with RFC 3986 (which reserves !, ', (, ), and *), you can use:

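    function rfc3986EncodeURIComponent(str) {
        // escape() leaves "*" untouched, so map each character to its hex code instead
        return encodeURIComponent(str).replace(/[!'()*]/g,
            c => '%' + c.charCodeAt(0).toString(16).toUpperCase());
    }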

You can get more information on this on MDN.
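
To illustrate the RFC 3986 style of encoding, here is a small self-contained sketch with a round trip through decodeURIComponent. The helper name encodeRfc3986Component and the sample address are made up for this example. It maps each character to its hex code directly rather than relying on the deprecated escape(), which does not encode `*`.

```javascript
// Hypothetical helper: percent-encodes the five marks ! ' ( ) *
// that encodeURIComponent skips.
function encodeRfc3986Component(str) {
  return encodeURIComponent(str).replace(/[!'()*]/g, (c) =>
    "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

const address = "Billy's Tavern (rear)!*";
const encodedAddress = encodeRfc3986Component(address);
console.log(encodedAddress); // Billy%27s%20Tavern%20%28rear%29%21%2A

// decodeURIComponent restores the original value, so nothing is lost.
console.log(decodeURIComponent(encodedAddress) === address); // true
```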


To answer your question of why ' and the other characters mentioned above are not encoded by encodeURIComponent: the short answer is that they only need to be encoded in certain URI schemes, and the decision to encode them depends on the scheme you're using.

To quote RFC 3986:

URI producing applications should percent-encode data octets that correspond to characters in the reserved set unless these characters are specifically allowed by the URI scheme to represent data in that component. If a reserved character is found in a URI component and no delimiting role is known for that character, then it must be interpreted as representing the data octet corresponding to that character's encoding in US-ASCII.

Where "reserved set" is defined as

    reserved    = gen-delims / sub-delims
    gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
    sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                / "*" / "+" / "," / ";" / "="

The apostrophe is in the sub-delims group. In other words, you must leave these characters unencoded, especially if you are sure that consuming applications will know what to do with them: for example, if you mistakenly encoded ? and &, they would no longer delimit query parts. Historically there were also proposals for path segment parameters delimited with ; and , (these did not gain wide adoption), so these characters are still allowed as well. It is not that the apostrophe is "free to use" (i.e. unreserved) in URI data, but that it was assumed it would have some special meaning in the URI context, for example in the segment part:

    segment       = *pchar
    pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
    unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
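
The point about delimiters can be sketched as follows: encoding a whole query string destroys the ? and & separators, while encoding only the values keeps them working. The parameter names (surname, pub) and values are illustrative, and URLSearchParams is used here only as a convenient stand-in for a consuming application.

```javascript
const query = "?surname=O'Neill&pub=Billy's Tavern";

// Encoding the whole string also encodes ?, = and &, so nothing delimits the parts.
const broken = encodeURIComponent(query);
console.log(broken); // %3Fsurname%3DO'Neill%26pub%3DBilly's%20Tavern
console.log(new URLSearchParams(broken).get("surname")); // null

// Encoding only the values leaves the delimiters intact.
const working =
  "?surname=" + encodeURIComponent("O'Neill") +
  "&pub=" + encodeURIComponent("Billy's Tavern");
const params = new URLSearchParams(working);
console.log(params.get("surname")); // O'Neill
console.log(params.get("pub"));     // Billy's Tavern
```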

Try this:

encodeURIComponent(str).replace(/'/g, "%27"); 

The g (global) flag in the /char/g syntax tells JavaScript to replace all occurrences in your string rather than only the first one.
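
A minimal sketch of the difference the g flag makes, using an illustrative value with two apostrophes:

```javascript
const value = "O'Neill's";

// Without the g flag only the first apostrophe is replaced.
const firstOnly = value.replace(/'/, "%27");
console.log(firstOnly); // O%27Neill's

// With the g flag every apostrophe is replaced.
const allReplaced = value.replace(/'/g, "%27");
console.log(allReplaced); // O%27Neill%27s
```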

